eldentyrell

Process-invariant hardware metric: hash-meters per second (η-factor)
October 21, 2012, 10:25:38 PM

Last edit: July 22, 2013, 03:56:08 AM by eldentyrell

UPDATED 21-Jul-2013: added a column showing delivery/verification status. "Verified" means verified by an independent third party. "Delivered" means at least a few have been sold in arm's-length transactions (i.e. not special favors to developers or reviewers).

UPDATED 22-Jun-2013: changed BFL numbers from the post-tapeout claim (7.5 GH/s) to the actual measurement (4 GH/s).

UPDATED 21-Jun-2013: added Bitfury 55nm figures (and fixed an arithmetic error).



Known Figures



Design             Hashrate    Device                        Process node, λ  Area        η (H*pm/s)  Status

Bitfury 55nm       2 GH/s      Custom                        55nm, 27.5nm     14.44 mm²   2,880.45    verified
Avalon             275 MH/s    Custom                        110nm, 55nm      16.13 mm²   2,836.52    verified, delivered
BFL SC             4.0 GH/s    Custom                        65nm, 32.5nm     56.25 mm²   2,441.11    verified, delivered
Bitfury Spartan-6  300 MH/s    Spartan-6                     45nm, 22.5nm     120 mm²     28.47       delivered
Tricone            255 MH/s    Spartan-6                     45nm, 22.5nm     120 mm²     24.20       verified, delivered
Ztex               210 MH/s    Spartan-6                     45nm, 22.5nm     120 mm²     19.75       verified, delivered
BFL_MiniRig_1Card  1.388 GH/s  2x Altera Arria II EP2AGX260  40nm, 20nm       306.25 mm²  18.14       verified, delivered
ATI 5870           393 MH/s    Evergreen                     40nm, 20nm       334 mm²     9.39        verified, delivered
BFL_Single         832 MH/s    2x EP3SL150F780               65nm, 32.5nm     ?           ?           verified, delivered
Block Eruptor      ?           Custom                        ?, ?             ?           ?           conflicting data, announced
Reclaimer          ?           Custom                        ?, ?             ?           ?           announced
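As a sanity check, the η values in the table can be recomputed from the hashrate, die area, and feature size columns. A short Python sketch (inputs transcribed from the table above; results match the listed η to within rounding):

```python
# Recompute the eta-factor (H*pm/s) for three table rows from their
# hashrate (H/s), die area (m^2), and feature size (m).
rows = {
    "Bitfury 55nm": (2e9,   14.44e-6, 27.5e-9),
    "Avalon":       (275e6, 16.13e-6, 55e-9),
    "BFL SC":       (4.0e9, 56.25e-6, 32.5e-9),
}
for name, (hashrate, area, fsize) in rows.items():
    eta = hashrate / area * fsize**3 * 1e12   # 1e12 converts H*m/s to H*pm/s
    print(f"{name}: {eta:,.2f} H*pm/s")
```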

I will list a chip in the table above when we have all of the following data:

- Hashrate, either from a manufacturer claim or a third-party measurement

- Die size, either from an unambiguous manufacturer claim or a third-party die photo

- Process node, from an unambiguous manufacturer claim

- A plausible date by which independent verification will be possible

Summary



As more and more announcements about bitcoin-specific chips come out, it would be useful to have a metric that compares the quality of the underlying design. I recommend "hash-meters per second" as a metric. This is calculated by dividing the hashrate (in H/s) by the die area in square meters and then multiplying by the cube of the process's feature size in meters (half of the process node's "name", so a 90nm process has a 45nm feature size). If you use hash-picometers instead of hash-meters you wind up with reasonable-sized numbers.
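A minimal Python sketch of this calculation, using the Bitfury Spartan-6 figures from the table as sample inputs:

```python
def eta_hpm(hashrate, die_area_m2, feature_size_m):
    """Proposed eta-factor: (H/s / m^2) * feature_size^3, scaled to H*pm/s."""
    return hashrate / die_area_m2 * feature_size_m**3 * 1e12

# Bitfury Spartan-6 row: 300 MH/s, 120 mm^2, 45nm node -> 22.5nm feature size
print(eta_hpm(300e6, 120e-6, 22.5e-9))   # ~28.48, vs. 28.47 in the table
```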



Current GPUs and FPGAs get roughly 9-28 H*pm/s; the three ASICs we have numbers for have η-factors around 2,400-2,900 H*pm/s -- about 100 times more efficient use of silicon than FPGAs and GPUs.



Migrating a design from one process to another by direct scaling -- when possible -- will not change this metric. Therefore it gives you a good idea of how the "rising tide" of semiconductor process technology will lift the various "boats".







Details



Process-invariant metrics factor out the contribution of capital to the end product, since the expenditure of capital can overwhelm the quality of the actual IP and give misleading projections of its future potential. A 28nm mask set costs at least 1000 times as much as a 350nm mask set, but migrating a design from 350nm to 28nm is not going to give you anywhere near 1000 times as much hashpower.



This metric probably does not matter for immediate end-user purchasing decisions -- MH/$ and MH/J matter more for that -- but for investors, designers, and long-range planning purposes it gives a better idea of how much "headroom" a given design has to improve simply by throwing more money at it and using a more-expensive IC process. Alternatively, this can be seen as a measure of how much of its performance is due to money having been thrown at it. That is important for investors -- and the line between presale-customers and investors is a bit blurry these days with all the recent announcements.



As semiconductor processes become more advanced, two important things happen:



1. The transistors get smaller (area).



2. The time required for transistors to turn on gets shorter (speed).





Area



Generally #1 (area) is indicated by the process name. For example, in a 90nm process the smallest transistor gates are 90nm long.



Chip designers refer to half of this length (i.e. 45nm on a 90nm process) as the feature size. The feature size is half of a gate length because you can always place transistors on a grid whose squares are at least half the length of the smallest gate. Usually you get an even finer grid than that, but it's not universally guaranteed.



Therefore, to get an area-independent measure of the size of a circuit, measure the circuit's area (units: square meters) and divide that by the square of the feature size (units: square meters) to get a unitless quantity. Well, almost unitless. Technically the units for a process's feature size are "meters per lambda" rather than meters, meaning the units for the final quantity should be (hash-meters) per (second*lambda-cubed).
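To make the arithmetic concrete, here is that normalization step in Python (a sketch; the die area and feature size are the Spartan-6 example figures used later in this post):

```python
# Express a die area in "square lambda" by dividing square meters by
# feature_size^2 (feature size has units of meters per lambda).
area_m2 = 300e-6        # a 300 mm^2 die
feature_size = 22.5e-9  # m/lambda on a 45nm process
area_sq_lambda = area_m2 / feature_size**2
print(f"{area_sq_lambda:.3e} square lambda")   # ~5.926e+11
```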





Speed



Semiconductor processes are also characterized by a measure called "tau", which is the RC time constant of the process. This is the time it takes a symmetric inverter to drive a wire high or low, assuming the wire has no load.



The raw tau factor ignores the load presented by wires and other gates, so instead some designers prefer to use the FO4 delay. This is also called the normalized gate delay; FO4 is the same measurement, but each gate drives four copies of itself.



Unfortunately the tau and FO4 numbers can be hard to come by, and they frequently get mixed up with each other (one is listed where the other ought to be). Also, there is a bit of "wiggle room" in exactly how the RC circuit or loading is done, so it's common to see inconsistent numbers cited by different sources for the same process. Because of this, using tau or FO4 directly in a competitive metric is a bad idea: people will fight over which tau or FO4 numbers to use. A previous proposal used gate delays as part of the metric, but I no longer recommend it: if it were to gain popularity it would inevitably lead to people playing games with the tau/FO4 numbers, picking and choosing whichever number cast their favorite product in the best light.



Fortunately, there is a fix. All we need here is a relative comparison of two circuits. It turns out that both tau and FO4 scale more or less linearly with the gate length (and therefore with the feature size). So instead of converting hashes/sec into hashes/tau or hashes/FO4 we can use the feature size as a proxy for the gate delay time and multiply the measure of hashes/sec by the feature size instead of multiplying by the tau/FO4 time. The resulting number will be totally meaningless as an absolute quantity, but the ratio of this metric for two different circuits will still give the ratio of their performance on equivalent processes.





Formula



So the formula is:



(hashrate / area_in_square_lambda) * gate_switching_time



The units for this number are "hashes per square lambda" (the seconds cancel against the switching time), or simply "hashes" if you treat lambda as dimensionless.



However remember that we're using feature_size (measured in meters per lambda) as a proxy for gate_switching_time since there is less wiggle room in how feature_size is measured and the two values tend to scale proportionally. This substitution gives us:



(hashrate / area_in_square_lambda) * feature_size



Since area_in_square_lambda is (area_in_square_meters / feature_size^2) we can substitute to get:



(hashrate / (area_in_square_meters / feature_size^2)) * feature_size



which is equivalent to



((hashrate * feature_size^2) / area_in_square_meters) * feature_size



collecting the occurrences of feature_size gives us:



(hashrate * feature_size^3) / area_in_square_meters



or alternatively:



(hashrate / area_in_square_meters) * feature_size^3
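The algebra above is easy to check numerically: all three forms of the formula agree. A quick Python check, using the Spartan-6 example values from the next section:

```python
# Check that the substituted forms of the formula are equivalent.
hashrate, area_m2, fsize = 300e6, 300e-6, 22.5e-9   # example values

area_sq_lambda = area_m2 / fsize**2
form1 = (hashrate / area_sq_lambda) * fsize           # original form
form2 = ((hashrate * fsize**2) / area_m2) * fsize     # after substitution
form3 = (hashrate / area_m2) * fsize**3               # collected form

assert abs(form1 - form2) < 1e-22 and abs(form2 - form3) < 1e-22
print(form3)   # eta in H*m/s
```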







Example



The Bitfury hasher gets 300 MH/s:

300*10^6 H/s

It runs on a Spartan-6, which has a 300 mm² (300*10^-6 m²) die. Dividing the hashrate by the area in square meters gives:

1*10^12 H/(s*m²)

This is why the Bitfury hasher is a convenient example -- by coincidence its hashrate in MH/s happens to equal its die area in square millimeters, which keeps the numbers simple.

Multiplying the number above by the feature size (22.5*10^-9 m) cubed (11390.625*10^-27 m³) gives

11390.625*10^-15 H*m/s

which is:

11.390625*10^-12 H*m/s

The SI prefix for 10^-12 is "pico", so the Bitfury hasher gets

11.39 H*pm/s





Summary



To compute the metric, take the overall throughput of the device (hashes/sec), divide by the chip area measured in square meters, and multiply by the cube of the process's feature size. Shortcut: take the hashrate in gigahashes per second, divide by the area in mm², and multiply by the feature size (half the minimum gate length) in nanometers three times; the result comes out directly in H*pm/s.
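The shortcut works because the unit conversion factors (10^9 for GH/s, 10^-6 for mm², 10^-27 for nm³) combine to exactly the 10^-12 of the "pico" prefix. A quick Python check, using the Bitfury 55nm row of the table:

```python
# Full formula in SI units vs. the GH/s / mm^2 * nm^3 shortcut.
full     = 2e9 / 14.44e-6 * (27.5e-9)**3 * 1e12   # H/s, m^2, m -> H*pm/s
shortcut = 2.0 / 14.44 * 27.5**3                  # GH/s, mm^2, nm
assert abs(full - shortcut) < 1e-6
print(round(shortcut, 2))   # table row: 2,880.45
```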



This number can then be used to project the performance of the same design under the huge assumption that the layout won't have to be changed radically. This assumption is almost always false, but assuming the design is ported with the same level of skill and same amount of time as the original layout, it's unlikely to be wrong by a factor of two or more. So I would consider this metric to be useful for projecting the results of porting a design up to roughly a factor of 2x. That might sound bad, but at the moment we don't have anything better. It also gives you an idea of how efficiently you're utilizing the transistors; once I get the numbers I'm looking forward to seeing how huge the divergence is between CPUs/GPUs/FPGAs/ASICs.
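To illustrate the projection (the target process and the resulting figure below are hypothetical, not a vendor claim): holding η and die area fixed, a directly scaled port's hashrate grows as 1/feature_size³.

```python
# Naive port projection under the constant-eta assumption.
eta = 2880e-12      # H*m/s, roughly the Bitfury 55nm row
area = 14.44e-6     # m^2; assumes the same die area after the port
fs_new = 14e-9      # hypothetical 28nm process -> 14nm feature size

projected_hashrate = eta * area / fs_new**3    # H/s
print(f"{projected_hashrate / 1e9:.1f} GH/s")  # ~15.2 GH/s, illustrative only
```

Remember the 2x error bars discussed above: this kind of number is a planning estimate, not a spec.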



I propose to denote this metric by the Greek letter η, from which the Latin letter "H" arose. "H" is for hashpower, of course. The table at the top of this post lists some existing designs and their η-factors (I will update it periodically).



This metric does not take power consumption into account in any way. I believe there ought to be a separate process-independent metric for that.



If anybody can add information to the table, please post below. Getting die sizes can be difficult; I know the Spartan-6 die size above is a conservative estimate (it definitely isn't any bigger or it wouldn't fit in the csg484).