Sharding centralizes Ethereum by selling you Scaling-In disguised as Scaling-Out


StopAndDecrypt (@stopanddecrypt), Byzantine Fault Tolerance Abstractionist

The differences between Light-Clients & Fully-Validating Nodes

Are you wondering what that even means? Don’t worry, you will. Go make a cup of coffee first, it’s another long one. If you haven’t read this yet, start here:

The following is an exchange I had with Vitalik following the publication of the article above. Although initiated by his response to my article, the purpose of the exchange from my perspective became to extract the underlying reason he and many other Ethereum fans don’t see sharding as diminishing the integrity of the network it’s being applied to. Fortunately he was cooperatively replying to the questions I asked, which were intentionally framed to break down the logic so we could ultimately arrive at this semi-expected response:

I meant nodes. He knew what I meant and responded accordingly.

I’m highlighting this because it sums up the entirety of this article pretty well.

Vitalik: “This is the way Ethereum is now, and Sharding will be no different when we switch to it, except that some things will be easier for some people.”

Me: “I don’t agree with that at all, and here’s a full-length counter-response.”

The aim of this article will be to explain the following points and their nuances in a digestible format (in no specific order):

Differences in network architecture between Bitcoin, Old-Ethereum, and New-Ethereum (2.0). Very basic math, no fancy equations. Just a bit of logic.

Blockchain-based networks, depending on their architecture, will inherently centralize or decentralize over enough time. This excludes bootstrapping & hype-cycles periods (however long they may be), and assumes no changes to the protocol are made. We can refer to this as the network’s “inherent direction”.

How changes to the protocol can have immediate or intermediate effects on the network’s [de]centralization, but not necessarily change the “inherent direction” of the network over the long-term.

Ethereum is inherently centralizing as it stands now, switching to Proof of Stake doesn’t change that, and sharding only delays the inevitable by selling you a faulty scaling solution.

It’s going to be long, but easy. If you care about understanding this, read it.

Index

Conceptual Primer

Network Differences

Sharding

Counter Replies

Scaling-What? Scaling-Who? Scaling-Where?

Scaling… …When?

Here is Vitalik saying Sharding is a Layer 1 scaling solution:

Note that scaling is used here without context, so let’s provide some. We can’t keep going around using the term scalability without an agreed-upon definition of what the word even means. Even if the definition is different for different people, the one making an argument should at least provide theirs.

When I talk about scaling in this article, I’m talking about one thing and one thing only: increasing functionality without sacrificing decentralization. The total set of validating nodes is one of the most direct representations of how decentralized the network is. Focusing on anything else when discussing scaling in regards to blockchain networks is either a result of not properly understanding this, disagreeing with it, or an act of intentionally misleading for whatever reasons one may have to do so.

It’s important to make that clear first, because understanding the differences between the following types of scaling, and how they apply to decentralized networks, requires that baseline.

Scaling Out: Increasing the number of fully-validating nodes on the base layer (L1) of the network, or increasing functionality at the base layer without affecting the “inherent direction” of the network. Increasing functionality increases incentives to use the network, thus increasing the set of validating nodes. So in both cases, the validating node count increases by some degree.

Scaling Up: Adding layers to the network above the base layer. Scaling up can be done correctly or incorrectly. The right way to do it is to have little to no effect on the layers below it. Payment channel networks like Lightning are an example of good scaling-up. Sidechains are arguably an example of bad scaling-up, because they can potentially break the mining security model, but this article isn’t about that.

Conversely there’s:

Scaling In: Decreasing the number of fully-validating nodes. Adding functionality at the base layer that reduces the number of full-nodes is scaling-in. Sharding does this, and it’s the reason I’m writing this article.

Scaling Down: Decreasing layer complexity. This is unnecessary in a decentralized network. The term is usually used for centralized systems, like companies trying to downsize or reorganize their internal structure.

The term “node” gets thrown around a lot but more often than not remains undefined to an outsider trying to follow along. I have a friend who tried telling me that Nano lets everyone run a node and have their own blockchain. People just don’t know what they’re talking about, and it’s because the right information isn’t readily available for them. Increasing the node count is a meaningless endeavor if they aren’t the right kinds of nodes. So when I say you’re being sold a scaling-in solution, it’s because the important kinds of nodes are going to go down as a result of the change. Not necessarily right away, but over time, and I’ll touch on why that’s important as well.

The importance of nodes that fully-validate

https://twitter.com/lopp/status/1001903809004736513 /// Read my BCash article if you’re interested in why the “Bitcoin” Twitter account blocked me.

The first thing I want to do is make a simple case for why these are important, then I’ll present you with the common arguments against it. I’ll respond to those arguments briefly, and then go in depth into how the Bitcoin network actually works later in this article, and then you’ll be able to see just from my explanation where these anti-node arguments fall short.

Just a heads-up: If you came here to read about Ethereum and you don’t want to learn about how Bitcoin’s network functions, then you deserve to lose any money you’ve invested, or any time you’ll waste developing for Ethereum.

My case here is pretty simple:

In Bitcoin, all nodes fully validate. If someone wants to submit anything that’s not valid, it’s not going to circulate around the network. Every node is a white blood cell. There is no better security than this for a blockchain network.

Some common arguments used against the pro-node stance are:

Your nodes don’t enforce consensus unless they are “mining-nodes”, and your nodes get in the way of “mining-nodes”.

Edges matter more than nodes for decentralization so increasing node count doesn’t always net you more decentralization.

How many nodes are needed to be secure? Surely there’s a “good enough” quantity of nodes that can be reached?

1: “Nodes don’t enforce consensus unless they are mining.”

This one I’ll respond to by diving straight into how Bitcoin’s network works. You’ll see very easily at the end of the “Bitcoin Network Topology” section how it answers this. I also published that section as a standalone article because of how rampant misinformation is spread on this subject alone. (This will all circle back to Sharding and Ethereum, trust me.)

2: “Edges are more important than nodes.”

Just to make this clear for the rest of the article, when I say decentralized I mean (c) in the following diagram (but in “4D” not 2D) that I stole from Vitalik, who used it to write about how bad this picture is:

Edges are just the connections from one node to another. The following diagrams are networks of 16 nodes each. Same number of nodes, but one has far fewer edges. The other one has every node connected to every other node.

The difference between these?

The first one has enough edges to propagate (we’ll get further into this later) to every node on the network in a sufficient number of “hops”, and none of the nodes are censorable because the connections (edges) are properly distributed.

The other network?

Everyone knows who you are.

Everyone knows if the transaction you sent was created by you or not.

Everyone knows whether the block a node relays was created by that node, or relayed through it from a potential miner they don’t like.

Now they know that IP address holds an address with a lot of Bitcoin on it.

They know it was a miner, or they know what IP address is allowing an unwanted miner to relay those blocks.

Anyone who controls the limited amount of validators can censor you, or any block your transactions are included in.

It’s the opposite of private, and it’s the opposite of secure.

With respect to edges, there is definitely an “enough” amount, as you’ll see later. I also talk about propagation in my prior article, and it’s only getting better despite this “limited” number of edges. It’s mostly an argument used by scam artists (whom even Vitalik frequently calls out) who want to sound smart while they’re scamming:

He’s also comparing a Layer 1 network to a Layer 2 network here, like an idiot.

3: “How many nodes are needed?”

The “how many is enough to be secure” argument is probably one of the biggest signs, in my opinion, that someone isn’t getting the bigger picture. By asking this question they’ve already accepted the premise that some quantity of nodes is needed to be considered secure, and now they’re just steering the argument toward figuring that value out. Sometimes they’re genuine, other times they’re diverting the subject.

Here are some simple questions I don’t necessarily expect you to have answers to. Let’s say we both determine and agree that the number is 20,000 full-nodes.

How do we code the protocol to ensure the node count stays at or above 20,000 nodes?

Even if we figured that out, how does the network know how big or small it is without an external oracle telling it that it’s at that number?

We agreed it’s 20,000 nodes, but why did we agree on 20,000? Did we measure something in “real life” and determine we needed 20,000 “people” to prevent government takeover? It’s a silly example, but what happens if the population multiplies by 10? Is that 20,000 number still accurate?

If we decide on some value now, and determine that was a bad value in the future, we then have to go through the process (and bear the risk) of trying to change the protocol to adjust to the new value. Is that worth it?

My point here is, there is no way to do this. You can’t code the network to “hover” at a certain node count. I’m very inclined to believe the following, and I’ve seen no good arguments against it: any blockchain network, if the protocol is left unchanged and demand continues to grow, will decentralize or centralize over time depending on how it is built. So if decentralization is a feature you want, the protocol needs to be inherently decentralizing. This means the protocol needs to be designed in a way that ensures the validating node count will grow over time. If growth is ensured, shrinkage is ruled out, which brings me to the next section.

Inherent Centralization/Decentralization

I want to begin this section by clearing up some of the confusion regarding how these chains compare. There’s some admitted salt about the methods of comparison I and others have used in the past, so I aim to show you a fairer comparison that actually makes it all look worse.

The Bitcoin blockchain “size” grows linearly because the data a node needs to process at max block capacity is static over time. So when you see the first chart below, it’s equal to the second chart.

A single blocksize increase would result in a node needing to process more data (but static) over time, and faster chain growth (but linear) over time. Never-ending blocksize increases make both of them exponential.

Hopefully the following is clear, but I’ll elaborate further:

Inherent Decentralization: Validators go up

Blocksize cap = Linear chain growth

Linear chain growth = Static node requirements

Static node requirements = Node count goes up over time

Inherent Centralization: Validators go down

No cap = Exponential chain growth

Exponential chain growth = Growing node requirements

Growing node requirements = Node count goes down over time
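The arithmetic behind that chain of reasoning can be sketched with rough, illustrative numbers (I’m assuming Bitcoin-like parameters here: ~144 blocks per day and a 1 MB cap; the exact figures matter far less than the shape of the growth curve):

```python
# Illustrative only: chain growth under a fixed blocksize cap vs. an
# uncapped blocksize that grows 50% per year. Parameters are assumptions,
# not exact protocol values.

BLOCKS_PER_DAY = 144   # roughly one block every 10 minutes
CAP_MB = 1.0           # hard cap: no block may exceed this size

def capped_chain_size_gb(years):
    """Worst-case chain size with a hard cap: grows linearly forever."""
    return BLOCKS_PER_DAY * 365 * years * CAP_MB / 1024

def uncapped_chain_size_gb(years, growth_per_year=0.5):
    """If the average blocksize grows each year, chain growth compounds."""
    total_mb, size_mb = 0.0, CAP_MB
    for _ in range(years):
        total_mb += BLOCKS_PER_DAY * 365 * size_mb
        size_mb *= 1 + growth_per_year
    return total_mb / 1024

for years in (1, 5, 10):
    print(years, "yr:",
          round(capped_chain_size_gb(years), 1), "GB capped vs",
          round(uncapped_chain_size_gb(years), 1), "GB uncapped")
```

With the cap, the data a node must store and process sits on the same straight line no matter how far out you go; without it, the burden compounds, which is the whole point of the list above.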

With a set blocksize that never goes up, as technology grows it becomes easier and easier to run a node, thus the total node count will go up over time. This is what I mean when I say “inherently decentralizing”. When Bitcoin upgraded to Segwit, the requirements to run a node that does full-validation did go up, but only marginally. It didn’t kick anyone else off the network or make it harder for people running pre-Segwit nodes, but most importantly, it remained inherently decentralizing:

When the size was changed for Segwit, it was done for reasons other than arbitrarily “adding space”. Right now the Bitcoin blocksize is regulated, the cap is set, and it’s not being changed. If a block is too large it’s invalid. This is ideal because it ensures a static volume of data over time. Nobody votes on this, it’s not a 51% vs. 49% situation. It’s always invalid if it’s too big.

The network has no way to tell if your node mined a block, so the protocol enforces privacy equality in that sense, but the blocksize cap enforces physical equality, in the sense that there is zero differentiation between validators whether they mine or not (more on this later). Their ability to process transactions doesn’t segregate them because it’s easy for everyone’s node. Removing the blocksize cap separates these nodes into tiers, where one group has the power to cut the others off with force by creating blocks that shut down the other half of the network, destroying the network in the long-run.

Changing Bitcoin’s design to allow a variable blocksize would result in:

Undesirable sizes are now valid.

Miners start creating blocks without worrying about the effect it’ll have on other validators that may or may not mine.

The requirements to run a node keep growing, so the count keeps shrinking.

More variables means more ways to differentiate & segregate types of nodes.

I also want to point out that it doesn’t matter if the cap is set at 2 MB, or 8 MB. At some point technology will allow for that blocksize to be viable, but that’s another debate, because we could set the blocksize to 50 terabytes now and “just wait for it to catch up”, but Bitcoin will have become centralized and ruined by that point. Where the cap should be is a different debate; my only argument right now is that there needs to be a hard limit that doesn’t change over time, above which blocks are invalid.

This leads me into the next example: Ethereum and its arbitrary “gas limit”:

Without a proper cap like the one in the chart above, as is the case in Ethereum, the blocksize keeps growing, and technology can never grow at a fast enough rate to catch up so that you’ll be able to continue running your node. This is what I mean when I say “inherently centralizing”. Unbounded growth requirements determined by a small group of centralized actors are not good. Even with Sharding, this limit will increase over time. Sharding might be temporarily successful in splitting up the work, but Ethereum’s inherent direction is south:

An Ethereum block’s size is determined by the miners, who set the gas limit for that block. If you don’t understand Ethereum’s gas limit, here’s a very simple explanation that may rustle the jimmies of some technical people:

In Ethereum, instead of bytes, a block is limited to how many units of gas it can have. For this example, let’s say it’s 1000 units of gas.

When you want to create a transaction, or a “contract”, it costs gas to process. Let’s say your transaction costs 2 gas and mine costs 5. We can both fit into a single block, along with 993 more units of gas worth of transactions.

So when a miner makes a block, they’re limited to only including 1000 units of gas worth of transactions or the network deems it invalid.

Except they’re not limited to 1000 units…

They can make a block with 1200 units. Then they can make a block with 1500 units. The consensus rules let them increase it gradually without it being invalid. Other miners can make smaller blocks, which helps bring the average (and thus the limit) down, but these are only other miners. If you’re operating a fully-validating node under these kinds of consensus rules, you have no ability to decide this metric. Miners are a tier above all other nodes on this network, and Vitalik doesn’t even deny that.
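That ratchet can be sketched in a few lines. In Ethereum’s consensus rules a block’s gas limit may differ from its parent’s by roughly parent/1024 per block; the simulation below assumes that bound, with an illustrative starting limit and miner voting split:

```python
# Sketch of Ethereum-style gas-limit drift: each miner may nudge the limit
# up or down by up to ~parent_limit/1024 per block (the rough bound in the
# protocol). If most miners vote "up", the limit ratchets upward, and
# non-mining validators have no say in the matter.

import random

def next_gas_limit(parent_limit, miner_wants_up):
    step = parent_limit // 1024            # max per-block adjustment
    return parent_limit + step if miner_wants_up else parent_limit - step

def simulate(blocks=10_000, start=8_000_000, frac_voting_up=0.7, seed=1):
    random.seed(seed)
    limit = start
    for _ in range(blocks):
        limit = next_gas_limit(limit, random.random() < frac_voting_up)
    return limit

# With 70% of miners voting "up", the limit only ever drifts higher.
print(simulate())
```

Run it with `frac_voting_up=0.3` instead and the limit shrinks: the only “vote” on this metric is the miners’ own block production.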

Because of this differentiation in nodes by code within the Ethereum network, Ethereum is inherently centralizing. This is the fundamental difference between Bitcoin and Ethereum’s network properties as they currently exist. Ethereum’s set of fully validating nodes doesn’t have equal voting rights because their external abilities allow them to change the protocol, which affects other nodes. In Bitcoin there are no voting rights that affect your ability to run your node.

If you leave Bitcoin alone (and that’s the plan), its fully-validating node count will increase over time.

If you leave Ethereum alone, its fully-validating node count will decrease over time.

I think it’s important to note here that without state-channel network technology like Lightning, my statement wouldn’t hold true for Bitcoin. Inherent properties like this aren’t limited to just node growth/shrinkage. I also think that even though state-channel networks can exist for Ethereum as well, Ethereum has a much deeper issue at hand.

The next two sections will be detailing Bitcoin’s network topology to elaborate further on this concept, and then the differences in Ethereum’s network directly due to this hierarchy of who can and can’t vote.

Bitcoin’s Network Topology

Bitcoin is more than just a chain of blocks, and I want to help you understand how Bitcoin’s blockchain network is designed first, because it’s the simplest one of the bunch, and there are fundamental attributes to its simplicity that you need to understand for the rest of this article. I say blockchain network because Bitcoin also has a payment channel network (Lightning) layered on top of it that doesn’t affect the structure of the blockchain network. I won’t be discussing Bitcoin’s Lightning network in this article though, as it’s not that relevant to the points I’ll make.

Below is a rough example of the Bitcoin network scaled down to 1000 fully validating nodes (there are really about 115,000 currently). Each node here has 8 connections to other nodes, because this is the default number of connections the client makes without any changes made to it. My node is in here somewhere, and if you’re running one, it’s in there too. Coinbase’s nodes are in there, Bitmain’s nodes are in there, and if Satoshi is still around, Satoshi’s node is in there too.

Please note that this is just a diagram, and that the real network topology can (and probably does) vary from this. Some nodes have more than the default amount of connections while others may opt to connect to a limited number or stay behind just one other node. There’s no way to know what it actually looks like because it’s designed with privacy in mind (although some monitoring companies certainly try to get very close approximations) and nodes can and do routinely change who their peers are.

I started with that diagram because I want you to understand that there are no differences in these nodes because they all fully validate. The ones on the inside are no different than the ones on the outside, they all have the same amount of connections. When you start up a brand new node, it finds peers and becomes one of the hive. The longest distance in this graph from any of these nodes to another is 6. In real life there are some deviations to this distance because finding new peers isn’t a perfectly automated process that distributes everyone evenly, but generally, adding more nodes to the network doesn’t change this. There are 6 degrees of Kevin Bacon, and in 6 hops my transaction is in the hands of (almost) every node, if it’s valid.
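The hop-count claim is easy to sanity-check with a toy model (illustrative only; the real topology is unknowable by design): 1,000 nodes each opening 8 connections to random peers, and a breadth-first search counting how many hops it takes to reach every node from one starting point:

```python
# Toy model of Bitcoin-like peering: each of 1,000 nodes opens connections
# until it has at least 8 peers (8 outbound is the client default), and all
# connections are two-way. BFS then measures propagation distance.

import random
from collections import deque

def build_network(n=1000, min_peers=8, seed=42):
    random.seed(seed)
    peers = {i: set() for i in range(n)}
    for node in range(n):
        while len(peers[node]) < min_peers:
            candidate = random.randrange(n)
            if candidate != node:
                peers[node].add(candidate)
                peers[candidate].add(node)   # connections are bidirectional
    return peers

def max_hops_from(peers, start=0):
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for peer in peers[node]:
            if peer not in dist:
                dist[peer] = dist[node] + 1
                queue.append(peer)
    return max(dist.values()), len(dist)     # (worst-case hops, nodes reached)

peers = build_network()
hops, reached = max_hops_from(peers)
print(hops, reached)   # every node is reachable within a handful of hops
```

A random graph with this degree keeps its diameter tiny no matter how many nodes join, which is why adding more nodes doesn’t meaningfully slow propagation.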

I’m going to select “my” node from this group and drag it out, so I can demonstrate what happens when I create a transaction and announce it to the network. Below you’ll see my node all the way to the right, and then you’ll see the 8 other nodes (peers) that mine is connected to.

When I create a transaction and “send it out to the world”, it actually only goes to these 8 peers. Since Bitcoin is designed from the ground up to make every node a fully validating node, when these 8 nodes receive my transaction they check whether it’s valid before sending it out to their 8 peers. If my transaction is invalid it will never break the “surface” of the network. My peers will never send that bad transaction to their peers. They don’t even know that I created it; there’s no way for them to tell, and they treat all data as equal. But if I were to keep sending invalid transactions to any of my 8 peers, they would all eventually block me. This is done automatically to prevent me from spamming my connection to them. No matter who you are, or how big your company is, your transaction won’t propagate if it’s invalid.
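That relay rule can be sketched on a tiny hand-built topology (the node IDs and connections are made up for illustration): every node validates before forwarding, so an invalid transaction never gets past the sender’s direct peers:

```python
# Flood relay with full validation: every node checks a transaction before
# forwarding it to its own peers. An invalid transaction reaches only the
# sender's direct peers and stops there.

def relay(peers, origin, tx_is_valid):
    """Return the set of nodes that ever *received* the transaction."""
    received = set(peers[origin])          # the origin announces to its peers
    frontier = list(peers[origin])
    while frontier:
        node = frontier.pop()
        if not tx_is_valid:                # a validating node drops invalid
            continue                       # data instead of forwarding it
        for peer in peers[node]:
            if peer not in received and peer != origin:
                received.add(peer)
                frontier.append(peer)
    return received

# Tiny hypothetical topology: node 0 is "my" node with two peers.
peers = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 4], 4: [2, 3]}
print(len(relay(peers, 0, tx_is_valid=True)))   # every other node receives it
print(len(relay(peers, 0, tx_is_valid=False)))  # only the two direct peers do
```

Note that the direct peers still *see* the invalid transaction; they just refuse to pass it along, which is exactly why it never breaks the surface.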

Now let’s say you’re not running a full-node, but you’re using a light-client instead. Various light-clients exist for the desktop, and for your mobile phone. Some of them are Electrum, Armory, Bread, and Samourai Wallet. Light-clients tether to a specific node. Some can be set up to change the one they connect to over time, but they are still ultimately tethered. This is what tethering looks like:

The reason I’m showing you this will become more apparent further on in this article, but I want you to note that this is just a diagram, and it’s easy to demonstrate tethering using a node that happens to be on the rim, but there is no real rim, and tethering is tethering wherever that node happens to be within this diagram. I’ve highlighted this in yellow. The nodes being tethered to are green, and the blue dots are light-clients. All information going to or coming from the light-client goes through the node they’re tethered to. They depend on that node. They are not part of the network. They’re not nodes. Remember this, because in Ethereum their behavior is slightly different, but their effect on the network is the same: nothing.

Here’s where it gets fun, and where other people try to misrepresent how the network actually works: What if I wanted to start mining?

Mining a block is the act of creating a block. Much like a transaction you want to send, you must create the block and announce it to the network. Any node can announce a new block, there’s nothing special about that process, you just need a new block. Mining has gotten increasingly difficult, but if you want you can purchase specialized hardware and connect it to your personal node.

Remember that bit about invalid transactions? Same goes for blocks, but you need to understand something very specific about how blocks are created.

First watch this video. I skipped to the important part about hashing, using nonces (random value) and appending the chain with that new block header:

Please watch the entire thing if you have time. It’s personally my favorite video explaining how mining works.

When you get to the following part in the video where the labels “Prev hash” are applied, those are the block headers:

What’s not mentioned in this video is that you can create valid block headers even if all the transactions inside the block are invalid. It takes the same amount of time to mine a block full of invalid transactions as it does to mine one with valid transactions. The incentive to spend all that time and energy creating such a block would be to push through a transaction that rewards you with Bitcoin that isn’t yours. This is why it’s important that all nodes check not just the block headers, but the transactions as well; it’s what stops miners from wasting that time. Because all nodes check, no miner can cheat the system. If all nodes didn’t check, you’d have to rely on the ones that do. This would separate nodes into “types”, and the only type that would matter would be the ones that check. Ethereum does this currently, and I’ll touch on that in the next section.
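Here’s a toy sketch of that point, using string hashing as a stand-in for real block headers: grinding a header that meets the difficulty target costs exactly the same work whether the transactions it commits to are valid or not, so the proof-of-work alone proves nothing about transaction validity.

```python
# Toy proof-of-work: the hash grind is done on the block *header*, which
# commits to the transactions via a merkle-root-like digest. The grind
# itself cannot distinguish valid transactions from invalid ones; only
# full validation of the transactions catches the cheat.

import hashlib

def mine_header(prev_hash, merkle_root, difficulty_prefix="000"):
    nonce = 0
    while True:
        header = f"{prev_hash}{merkle_root}{nonce}".encode()
        digest = hashlib.sha256(header).hexdigest()
        if digest.startswith(difficulty_prefix):
            return nonce, digest           # a "valid" proof-of-work header
        nonce += 1

valid_txs   = ["alice pays bob 1 BTC (correctly signed)"]
invalid_txs = ["mallory awards herself 1000 BTC (no valid signature)"]

for txs in (valid_txs, invalid_txs):
    root = hashlib.sha256("".join(txs).encode()).hexdigest()
    nonce, digest = mine_header("prev_block_hash", root)
    # Both headers meet the difficulty target; PoW says nothing about
    # whether the underlying transactions are valid.
    print(digest.startswith("000"))
```

This is why a node that checks only headers is trusting someone else to have checked the rest.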

So what if you join a mining pool? You might do this because mining is too difficult to do alone, or, if you’re a slightly larger entity, because you prefer a steady income over a sporadic one. Many miners do this, and they connect their specialized hardware directly to a mining pool using an entirely different protocol called the Stratum mining protocol. Just like creating a transaction with your non-node cellphone, you don’t have to run a node to connect your hardware to a mining pool. You can mine without running a node, and many miners do exactly that. Here’s what that looks like below in blue. I’ve used Slush Pool for this example:

Remember, I dragged these pool-run nodes out of the diagram for demonstration purposes. Just like any other node, these pool-run nodes need peers. They need peers to receive transactions & blocks, and they need peers to announce blocks they create. Allow me to reiterate again: all nodes validate all blocks and transactions. If any of these pools announce an invalid block, their peers will know because they fully-validate, and they won’t send it out to other nodes. Just like transactions, invalid blocks do not enter the network.

Here’s another way to look at this without pulling these nodes out from the diagram. Below is a private miner who doesn’t want to be known, it has 8 random peers, and none of those peers knows that it’s a miner. Again, this is intentionally designed this way for privacy reasons. There’s no way for any node in the network to know that the block they received was created by their peer, or relayed by their peer. All they know is if it’s valid or not, and if it is they send it along, if it’s not, they don’t.

Hopefully you’re getting the picture, and I don’t believe I used any fancy math or equations to get here. I’d like to move on because I feel like this is complete coverage, but there is one final thing I’d like to address because it’s this final aspect that is used to confuse others who don’t fully understand everything I just explained. It’s so rampantly used that I need to address it.

My original comment was talking about light-clients, also called SPV clients, and how they aren’t part of the network. I demonstrated this above with the blue tethered dots. His follow-up comment tries to imply that nodes that mine are the only nodes whose rejection matters. Remember: nodes have no way of knowing which other nodes mined a block versus who relayed a block, this was designed intentionally.

Now for a final diagram so I can try and explain the logic that’s used when people say “only mining nodes matter”. Some miners connect directly to other miners so that out of their peer list with the network, some of them are also other miners. Not all miners do this. Some of these miners that connect directly also use optional relay networks like the FIBRE network being designed by Bitcoin Core developer Matt Corallo, but even this side-network isn’t exclusive to miners, anyone can join including you or me and it’s just there to help block relay across the network. Either way, people try to argue that this interconnectivity of nodes that mine (whether using something like FIBRE or not) implies they’re the only ones that matter, and it’s absurd:

In this example I left the node’s peers inside the diagram. You should get the point by now. They reject invalid blocks. That group of nodes inside the green circles are most definitely not the only set of nodes that matter in this network, and with that being said, I think I’ve covered everything you’ll need to know about Bitcoin’s blockchain network for me to move on to Ethereum’s.

Ethereum with Proof of Work

This one’s going to be relatively similar, with a few key differences. The biggest takeaway out of all of this is that your fully-validating node can’t reject blocks based on their size or the gas limit. Having no throttle on this external procedure puts pressure on these fully-validating nodes to process that information at a pace they may not be able to keep up with, reducing the number of nodes over time and skewing the node set towards much larger entities.

Much like Bitcoin, Ethereum currently uses a Proof of Work system for its blockchain appending process & token distribution process. Since the intended function of the Ethereum blockchain network is different, the data that is put inside a block is also different. This won’t be about the kind of data, “smart contracts”, or anything of that sort. This will just be about the volume of that data, and the network topology.

The following diagram, like the Bitcoin one, is just a visual and not the actual topology. Instead of every node having an even distribution of peers, I’ve put the number of peers per node on a curve, because it’s well known and admitted that Ethereum is having peer issues since the node count keeps dropping, and “good” peers that serve sufficient amounts of data are hard to come by these days.

That’s what a “decentralized” network looks like when the good peers are limited in number, and this becomes problematic when people trying to sync up a new Ethereum node can’t, because there just aren’t enough peers seeding the data they’re asking for. You get a small group of highly connected peers serving the blockchain to all the other ones. This is very bad for a broadcast network. What’s even worse is that the gas limit (and in turn the total blocksize) keeps going up because there’s no restriction on it, putting more strain on these limited nodes and shrinking the number that exist, despite claims that “the gas limit hasn’t moved in X months”:

The gas “limit” isn’t a limit: as I mentioned earlier, miners choose it at their leisure. The important takeaway is that Ethereum nodes don’t reject blocks no matter what the gas limit is. This is one of the fundamental differences between Bitcoin and Ethereum. Nodes aren’t set up to prevent external pressures from centralizing them with data that has no regulation. Right now, miners are refraining from increasing this limit only out of altruism, and because Vitalik is telling them not to. Sounds decentralized, right? This is not how you want a blockchain to function. What’s going to happen when the fees get too high?

Take Vitalik’s response, and the following blocksize chart, as you will:

Ethereum has 2 options:

If the gas limit doesn’t go up there won’t be enough room for all of the contracts, a bidding war will begin so fees go up. If fees go up too much, basic contracts start becoming too expensive, and Dapps that function with low fees stop functioning. The last time this happened they were forced to raise the limit, because they need Dapps to work.

If the gas limit is raised, then the already peer centralized node count goes down even more.

Fortunately, Ethereum has light-clients just like Bitcoin, and they’re here to save the day if you can’t run your own validating node…

Remember how I demonstrated earlier that SPV clients (that only sync headers) are tethered to a specific node and not actually part of the network? In Ethereum they took that a step further and created a “sub-network” for these light-clients, where they can share block headers. If you didn’t know, it turns out most people don’t actually run fully-validating nodes in Ethereum (for various reasons), they actually run light-clients.

https://github.com/ethereum/go-ethereum/issues/15454 /// You can also refer to my prior article.

They’ve been having some issues getting enough full-nodes to supply the light-clients with the block headers they need. Light-clients can’t stay peered with each other because people are too casual about turning them off and on, so they become even more dependent on the full nodes that voluntarily give them that data. In Bitcoin there’s no volunteering: all full-nodes perform the same relaying functions, and it’s easy to do. All in all, I actually don’t think there’s anything wrong with having a subnet for light-clients. I think anyone who wants to run one should be able to. I think having a subnet of them is a good thing: at best, more people wouldn’t have to trust specific nodes for headers; at worst, they can’t adequately meet the demand they themselves create. The issue is when the developers start calling these “nodes” and the community is led to believe that they’re contributing to a network. They’re not “nodes” and they do nothing for the network.

And the Ethereum developers do call them nodes. The following is about sharding, which I’ll get into next, but they shouldn’t go around telling the community that the light-clients they’re running are nodes. Then they get node counts that keep going up, but all that’s really happening is the light-client count goes up while the full-node count slowly drops. Calling these nodes disguises the issue.

Hopefully I’ve drilled it in by now. Verifying block headers does nothing for the network.

So this is a more accurate model of what the network looks like:

Seeing that, what do you think now when you see this total “node” count? Are they discerning between these nodes? Did you understand the difference prior to these articles? Even if they aren’t including the light-nodes, what’s going to happen over time?

Over time, even though that total “node” count might be going up, this is what happens to a network that is inherently centralized and doesn’t pay mind to its fully-validating set of nodes: It gets worse.

Not only does the network start dropping in validator count, the miners begin to connect directly to each other out of necessity, to avoid bad uncle/orphan rates. Uncles/orphans are dropped blocks that occur when blocks are found too close together: as different miners make valid blocks at the same time, you end up with two valid chains. Eventually one of those blocks is built on top of and the other is orphaned.

In this diagram the purple blocks are orphans.

Do you know who loses out the most when their blocks are dropped because the network selected a different branch to follow? The smaller miners, further centralizing the network because they can’t handle the income volatility.
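To put rough numbers on why this pushes miners to peer directly with each other, here’s a back-of-the-envelope sketch. The exponential model and the delay figures are my own assumptions, not measurements: with Poisson block arrivals, the chance a competing block appears while yours is still propagating is roughly 1 − e^(−delay/interval).

```python
import math

# Rough orphan-rate estimate: probability that another block is found
# during the time your block takes to reach the rest of the network.
def orphan_rate(propagation_s: float, block_interval_s: float) -> float:
    return 1 - math.exp(-propagation_s / block_interval_s)

# Ethereum-ish numbers (~15s blocks): propagation delay matters a lot.
print(f"well-connected (0.5s delay): {orphan_rate(0.5, 15):.1%}")
print(f"poorly connected (4s delay): {orphan_rate(4.0, 15):.1%}")
# Bitcoin-ish numbers (600s blocks): even slow propagation is cheap.
print(f"bitcoin, 10s delay:          {orphan_rate(10, 600):.1%}")
```

Under these assumed delays, a poorly connected miner on 15-second blocks orphans an order of magnitude more of its blocks than a well-connected one, which is exactly the income volatility that drives small miners out.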

So now you have:

A consistently shrinking validator total

A community turning to light-nodes because they can’t run validating ones

A development group that tells the community this is okay because they never were “meaningful validators” to begin with

An upcoming fundamental change to the network structure that shrinks this validating set even further, where you need 32 ETH just to be one

The responses to raising this subject are either agreement with the concern, or complete dismissal of the issue. When people dismiss this issue, they typically use the “non-mining” tactic we already went over. They say “all those nodes in the middle that are shrinking, they were never doing anything anyway unless they were mining/staking.”

Is it really the least comprehensible argument you’ve heard all this month?

Recap:

In Bitcoin all nodes validate, and none have any greater say because the blocksize is capped and enforced by all of them.

In Ethereum nodes are split into full & light versions, and only the full nodes validate. Full nodes don’t hard-cap the gas limit, which results in them having to process more data over time, shutting many of them down.

Ethereum with Proof of Stake

Much like Ethereum with Proof of Work, all of the above applies with Proof of Stake. The intention is to launch PoS along with Sharding at the same time so I don’t think Ethereum with just PoS will occur. This is just to highlight the stepping stones of centralization as we get closer to sharding. In addition to everything in the last section, PoS brings about these additional issues:

No mining means no external cost. Validators that stake just need to stake their coins and host their server. As they earn money for doing nothing they can continue to upgrade their servers to compensate for the ever increasing node requirements, while everyone else gets left behind.

Staking requires 32 ETH ($16,000 as of now), so not only is the set of validators continuously decreasing, but those who have $16,000 at their disposal to stake don’t care about the data processing requirements. So this will only accelerate the data throughput growth.

In this network, it’s structurally the same as the Ethereum diagram I provided earlier, but this time I’ve highlighted the nodes that stake, so you can see the progression of the validating set of nodes over time in ratio to the validators that are also staking. Remember, Vitalik said it was always like this to begin with, and above I explained how that’s technically incorrect. Either way, with PoS this process is accelerated:

At its peak price, you would have hypothetically needed $45,000 to be one of those nodes. Pooling funds doesn’t change anything, the pool runs the node, not you. Fortunately PoS is coming with Sharding bundled in so we can end this section here.

Ethereum (2.0) with Proof of Stake + Sharding

As the title states, Sharding introduces scaling-in while making you believe it’s helping Ethereum scale-out. As you can imagine it has much to do with the validating node count, but with a twist. Validating responsibilities are split up among various groups, each with their own shard. The intent is to relieve the amount of work a single validating node must do so there can be more of them, but it only results in prolonging the issue, and not fixing the problem. Furthermore, there’s now a huge cost for some of these nodes, as staking is required to be one of them.

We won’t be having Bitcoin full-nodes processing 1 gigabyte every 3 seconds because we aren’t increasing the blocksize cap. We won’t have 10 full nodes, we’ll have millions, because we aren’t increasing the blocksize cap. Clearly our definitions of scaling-“out” are different here, and they’re okay with full-node counts getting traded-off for throughput at the base layer.

Since I believe I laid out a pretty good argument for full-nodes already, our definitions differing won’t matter and I’ll focus on theirs.

Reworded:

They want to increase transaction throughput at the base layer.

They know this will shrink the node count no matter what they do.

Their solution is to split up responsibilities, so it doesn’t shrink “as bad”.

They think that the shrunk total is “good enough” for light-client security.

They justify that shrunken total by comparing it to a hypothetical Bitcoin with 10 full-nodes based on the “mining-node” fallacy.

Their solution is Sharding, which they call scaling-out, and they’re okay with sacrificing “some” nodes to get it, but claim it will result in more nodes in the future than Bitcoin’s 10 nodes.

I think the most important takeaway out of this entire article is the section on Bitcoin’s network of fully validating nodes, inherent decentralization, and how this compares to everything else out there when people try to sell you “scaling solutions”. So let’s compare this to Sharding. This is where it gets fun because even Vitalik hasn’t clearly outlined what the topology is going to look like. I’m going to try my best because Sharding takes the concept of “all nodes are equal and do the same job” and completely abolishes it. Pinpointing where the centralization exists is going to be…fun.

For starters, here’s what the usual explanation of Sharding sounds like, and here’s what the typical article looks like:

This is just a blockchain news website, so it’s expected to have buzzwording and zero technical information. I’m highlighting it because it’s littered with a bunch of words and terms that seem to get glanced over by the uninformed crypto-community. “Scalability” remains undefined, “processing” needs further clarification, and every single mention of the word “node” doesn’t apply to you or your light-node. All of these mentions:

“single node”

“individual node”

“every node”

Can be replaced with:

$16,000 node that Stakes

$16,000 node that Stakes

$16,000 node that Stakes

Everywhere you read or hear about Sharding, the explanation appears to be saying “things will be easier on the nodes”, but the nodes that can afford $16,000 to stake don’t need things to be easier. They can already process much larger blocks. Datacenters don’t need shards, and you won’t be running one of the important nodes on a laptop. There are many kinds of nodes in this system, and it’s still unclear which ones will actually exist when the protocol is finalized. I’ll start by explaining the basic structure, and then defining the main kinds of nodes within this system so we can highlight the ones that matter and the ones that don’t.

Sharding takes a single blockchain, turns it into multiple blockchains called Collations, then puts a twist tie on top and hopes mold doesn’t grow. Joking aside, this diagram of a single collation should help you understand:

Here’s a good one that took way too long to make look nice:

Let’s break down what you’re looking at:

Collation [Purple Blocks]: A Collation is just a shard-specific block. They form collation-chains, similar to a block-chain.

Executor Nodes [Blue]: Executor nodes validate the transaction data inside each collation. They compute the contracts, and they delegate Collator nodes to specific shards.

Collator Nodes [Red]: Collator nodes “gather” the data for that shard, make the collation (block), and then present it to the Executor nodes to “execute”.

Light Nodes [Pink]: These are the nodes that you’ll be running. They contribute nothing to the network, they just “watch”. They have the ability to check transactions, but since they aren’t relaying transactions or blocks around, they have no power to withhold relaying if data isn’t valid. Again, this is the fundamental difference between Bitcoin nodes and all other blockchains. Every Bitcoin node does the same thing and there’s no way to tell them apart.

Within each shard, the only nodes that matter are the Executor & Collator nodes. Both require 32 ETH to run. Every light-node can pick a shard they “care” about (if they want to), and sync that shard plus the block headers of the main chain. They probably won’t need to unless they are an application or service that depends on validating that shard because their contract sits on it.

Above you’ll see multiple Collation-chains, sets of Executor/Collator nodes that do the work on those chains (32 ETH), the “Main chain” (green), and of course your light node at the top if you selected a specific shard to “validate”.

Few things you should note:

You won’t be one of the Executor/Collator nodes unless you have 32 ETH.

Your light-node doesn’t relay blocks. It doesn’t enforce consensus code, and it can’t do anything with invalid data besides scream at the top of its lungs. (A network of validating nodes gets its power from the ability to deny propagation of invalid transactions/blocks. When you aren’t propagating to begin with you can’t withhold them.)

The Main-chain doesn’t have transaction data in it. It only stores the headers of the Collation-chains.
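A minimal sketch of the structure just described, using my own simplified class names (`Collation`, `MainChainBlock`) rather than anything from a real client or the actual spec. The point it encodes: the main chain commits only collation headers, so anyone syncing just the main chain never sees, and can never validate, the shard transaction data.

```python
from dataclasses import dataclass, field
from hashlib import sha256
from typing import List

def h(data: bytes) -> str:
    return sha256(data).hexdigest()

@dataclass
class Collation:
    shard_id: int
    parent_header: str
    transactions: List[bytes]  # full tx data lives only inside the shard

    def header(self) -> str:   # the only thing the main chain ever sees
        tx_root = h(b"".join(self.transactions))
        return h(f"{self.shard_id}:{self.parent_header}:{tx_root}".encode())

@dataclass
class MainChainBlock:
    collation_headers: List[str] = field(default_factory=list)  # headers only

shard_3 = Collation(shard_id=3, parent_header="0" * 64,
                    transactions=[b"tx1", b"tx2"])
block = MainChainBlock(collation_headers=[shard_3.header()])

# A light node syncing the main chain sees a hash per shard: it can check
# that a shard committed *something*, but cannot validate the transactions.
print(block.collation_headers[0])
```

Note what the usage shows: the transactions never appear anywhere in the main-chain block, only a digest of them does.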

But that’s not all, there’s more to this network. All of your light-nodes that just sync the headers of the main chain can just be grouped together in this big box below. They do nothing, but they’ll be the largest in number. Generally, if you don’t have 32 ETH you’ll be one of those nodes.

My point here was even though “full-validation” is divided into sub-jobs, the group of nodes that do those jobs is still limited. He says these nodes would be processing less than current Ethereum nodes, but again that was never the issue. The issue is that the difficulty to do so grows over time, and the amount of nodes shrinks over time because of it. It’s inherently centralizing. Vitalik even agreed that this number would shrink over time if the gas limits kept going up, and there’s nothing stopping that from happening. Right now miners are being altruistic, but what happens when mining doesn’t even exist? What happens when it’s just staking and the people doing it don’t care about other people’s blocks getting orphaned? Why would they keep the gas limit down? Remember they can manually adjust this, so why would they intentionally keep it low if they’re hyper-connected to each other and fully capable of processing that data? What happens when they start compounding their staking earnings, setting up more nodes, and gain more control of the network?

What happens when people don’t think it’s a shitcoin though? Most are going to fail, but what happens when one of them is convincingly decentralized just enough for the time being to keep people using it?

I said all of this was going to be easy. I don’t feel like it turned out that way, but I tried to keep it as simple as I could. I mention this because I’d like to close this article with a link to the Sharding FAQ, where there’s a long list of admitted issues with Sharding, how they plan to address them, and how each one of those introduces a new complexity with its own issue, and another solution to resolve that new issue. It’s convoluted; it took me too long just to decipher the verbiage for the node types, but it’s only fair to provide it.

My issue was never with whether it “worked”, but whether it remains decentralized. It was kind of bad prior to Sharding, but I don’t think it could be any clearer that there’s only one path for this network on a long enough timescale. If you don’t mind having centralized validators, you might as well buy EOS. They skipped the whole pretending part and went straight to being centralized. They don’t even need Sharding because they just handle the blockchain data in a centralized fashion.

Google can process everyone’s payments.

We don’t want Google to process everyone’s payments.

We don’t want the Fortune 500 or the Forbes 400 processing it either.

So what did we learn?

No.

Use a 2nd Layer.

Breaking Down Vitalik’s Response

1:

This is *severely* uninformed. Ethereum already has a block size limit in the form of its gas limit, and this gas limit is at 8 million and has been there for the last six months.

I addressed this above. You will raise the gas limit if Sharding isn’t ready soon enough to come in and stall this issue.

2:

Fast sync datadir growth has flatlined at 10GB per month for the last six months and it’s not going to go much higher, if only because increasing the gaslimit much further would lead to uncle rate centralization issues. So we *already are* experiencing the worst of it and have been for half a year.

If you don’t increase the gas limit the fees will disable Dapps and cause outrage among the community because they have expectations and demands. I went over this above as well. Uncle rate won’t matter when you have no other solution, right now the miners are just being altruistic by listening to you. That’s an issue in and of itself.

3:

Also, focusing on archive node size is highly fallacious because (i) you can have a much lower datadir size by either resyncing even once per year, or just running a Parity node, which prunes for you, and (ii) it includes a bunch of extraneous data (technically, all historical states, plus Patricia tree nodes) that could be recalculated from the blockchain (under 50 GB) anyway, so you’re not even “throwing away history” in any significant information-theoretic sense by doing so. And if you *are* ok with throwing away history, you could run Parity in state-only mode and the disk requirements drop to under 10 GB.

I addressed this “conflict” when I showed the data throughput over time graph. The directory size is analogous to the same exponential growth that’s occurring with the nodes processing requirements. The only counter-response to this is that you won’t raise the gas limit. You will.

4:

The whole point of sharding is that the network can theoretically survive with ZERO of what you call “full nodes”. And if there are five full nodes, those five full nodes don’t have any extra special power to decide consensus; they just verify more stuff and so will find their way to the correct chain more quickly, that’s all. Consensus-forming nodes need only be shard nodes.

I addressed sharding above.

5 (lol):

Finally, you’re using the term “BCash” wrong; it’s an implementation, not a blockchain/cryptocurrency.

Breaking Down Gustav Simonsson’s Response:

1: “(…) the incentive structure of the base layer is completely broken because there is no cap on Ethereum’s blocksize (…)”

This is misleading at best and false at worst. Ethereum has a block size limit due to the block gas limit enforced by the consensus protocol.

they are unlikely to vote for a blocksize increase that would break the network

I addressed the gas limit further in this article. Thanks for motivating me. The network doesn’t break because validators drop off and peers are lost. The network functions with two datacenters. What breaks is decentralization. The connected nodes have no incentive to care about less-connected nodes’ ability to validate.

2: “Even if one [blocksize cap] was put in place it would have to be reasonable, and then these Dapps wouldn’t even work because they’re barely working now with no cap.”

Nobody goes to the beach anymore — it’s too crowded.

If a blockchain network is at capacity, with all blocks filled with transaction, then all the tx senders found utility in sending their txs with their particular fees.

This completely misses the point because this is the same argument we make in Bitcoin. It’s very popular, but Bitcoin doesn’t promise low fees and usability to Dapp developers and users. When those Dapps get priced out by basic transactions from mixers (lol) using 90% of the block space to dump their hacked/stolen coins, because mixing is worth paying more for than using silly Dapps no one really uses, what marketing do you have left? That’s the point. You can argue this down all you want, but at some point you start justifying Ethereum’s existence with only the matching properties of Bitcoin without the fancy bells and whistles, and Bitcoin does Bitcoin better.

The author continues their fallacy by directly contradicting themselves by arguing for apps on Ethereum to move over to Bitcoin. If apps become useless on Ethereum due to increased tx fees, then they would be useless on Bitcoin too if crowded out by other users who pay higher tx fees.

I suggested developers develop on top of Bitcoin. I didn’t say take the same program ideas you have that may never actually work and build them on Bitcoin. Almost all Dapps are centralized to begin with and aren’t actually “dapps”. They can all be built on Lightning. You won’t have fee issues, no matter the blocksize, on a payment channel network. Again, you could argue Ethereum can do this too, but that doesn’t give its base layer promises any extra reason to exist.

There is no such thing as apps crippling Ethereum due to high load

3: “The [Bitcoin] blocksize doesn’t restrict transaction flow, it regulates the amount of broadcast-to-all data being sent over the network.”

This is false for any sane definition of “transaction flow”. An arbitrary limit on tx/s does restrict transaction flow, as more transactions cannot flow within a given time period… And if we’re including off-chain solutions such as the lightning network as an argument that L1 tx/s limits does not decrease flow, then we should include in such discussions already live solutions on Ethereum too. Or recognize that the cost to setup e.g. payment channels increases as the L1 fees goes up…

We are including them, which is why I said it doesn’t restrict flow. The blocksize is a dam that generates power in the form of fees. The overflow spills into the Lightning Network, which has no upper limitations on transaction throughput outside the volume of Lightning nodes and payment channels, which have no limit themselves.

Also, any transaction where you receive Bitcoin, can be received straight to a newly opened channel. This isn’t a two-step process.

4: “I am saying that this information needs to stop being obscured. I’m also saying that if/when it ever is unobscured, it’ll be too late and nothing can be done about it anyway.”

This information is not obscured. You can simply run a full node and query it

Just because you haven’t found a website doing this

the argument that it “it’ll be too late” when it is unobscured is at best a faulty generalization and at worst the post hoc fallacy.

It’s obscured. It’s not a matter of me “not being able to locate” sites that track this. The sites that did track it stopped tracking it.

You forgot to include the next sentence: “It’s already too late.”

It was a quip, not a fallacy. Take it or leave it.

5: “Keep in mind, none of this information [block propagation times and transaction times] is available for Ethereum”

This is false. Block propagation times can easily be measured by running a few, geographically distributed full nodes, connect them to each other and measure when they see and relay new blocks and transactions.

too lazy to spend a few hours learning how to deploy, use and even add debugging to Ethereum clients, in order to gather such information, they can always check propagation times for nodes connected to https://ethstats.net/

First, that’s opposite of easy. Again this isn’t about me, because I’m clearly able to discern the differences in these networks and gather the information together. I’m the one sharing it because I did so.

Second, I don’t need to set up nodes around the globe to check this, and all the complaints online plus the data to the left of this from the very website you suggested only solidifies the consensus online. When half the nodes that volunteer their data to this website have terrible latency it’s indicative of an issue.

Thirdly, a lazy person wouldn’t go through the effort I am, nor am I falsely stating this isn’t publicly available. Network data in general is not publicly available; it’s literally not there for the public to see, and some of it once was. You need more than common knowledge to access it.

6: [vague rant about using the blockchain the “right” way and hatin’ on CryptoKitties]

The author presumes there is a “right” way to use a public, permissionless blockchain. The beauty of blockchains such as Bitcoin and Ethereum is that users can use them for whatever they want as long as they can convince a miner to accept their tx.

For example, a lot of people actually _enjoy_ CryptoKitties, to the extent of bidding $140,000 worth of ETH for one cat at a recent auction.

This isn’t about what transactions miners accept. I’m saying that even though they are being accepted now, in the future any Dapps that can only function using low fees won’t be usable unless the limit is raised, or decentralization is sacrificed. You might want to start looking elsewhere for this functionality. If you don’t care about decentralization then this just doesn’t apply to you, that’s totally fine. But this is literally Ethereum’s selling point right now:

Putting money laundering aside, idiots exist. CryptoKitties is a great tool to demonstrate this. I actually like CryptoKitties because of this valuable publicly available litmus test, and I don’t hate cats:

7: “The Bitcoin network has about 115,000 nodes, of which about 12,000 are listening-nodes.”

This appears to contradict several other sources on Bitcoin node counts

If all these sources are wrong, they would probably love to know exactly how nodes are counted by the site the author links.

Moreover, who has audited the scripts calculating these larger node count numbers?

All it does is count non-listening nodes as well as the listening nodes. Counting both is harder to do, so websites don’t do it. Likewise, segregating light-node and validating nodes in Ethereum is harder, so websites don’t do it.

8: “That Ethereum node count? Guarantee you those are mostly Light-Nodes doing absolutely zero validation work (checking headers isn’t validation). Don’t agree with that? Prove me wrong. Show me data.”

How about the author provides some data supporting their speculative claims? “Guarantee you” implies an appeal to authority, and given the above false claims and misunderstandings, the author has in my mind lost enough credibility to be taken seriously on matters of (Ethereum) protocols and networks.

I admit to presuming, but to say my credibility is lost is a bit far-fetched. My concerns are legitimate and shouldn’t be ignored. You can disagree, but you need to make a case for why you disagree, and this and my prior article laid out a pretty clear case: Validating nodes are important and Ethereum neglects them at a protocol level.

9: When your node can’t stay in sync it downgrades to a light client.

False. Even if a node is behind a number of blocks when syncing, it can still answer queries for past blocks and transactions and service other nodes that are syncing. The author would do good to examine the concurrency and state handling of clients such as parity and go-ethereum to understand more how nodes currently implement syncing and will work with new sharding proposals.

It’s not false, you just took it literally. All the comments online about people’s nodes falling out of sync end with the person deciding to use fast sync, usually after being compelled by someone else telling them “it’s fine”. From a zoomed-out perspective, this results in validating nodes going offline & light-nodes coming online, like the diagram I showed above.

10: “How would you even know how many fully validating nodes there are in this set up? You can’t even tell now because the only sites tracking it count the light clients in the total. How would you ever know that the full-nodes centralized to let’s say, 10 datacenters? You’ll never know. You. Will. Never. Know.”

OK, so right now we are able to know, with full certainty, that there are 115000 correctly verifying full Bitcoin nodes, but in this hypothetical future the author imagines we are unable to know how many correctly verifying full nodes there are in the Ethereum network?

Clearly there is some network engineering design magic currently present in Bitcoin that this future Ethereum network could leverage. Given that both Bitcoin and Ethereum clients are open source, I expect this magic to soon be discovered by Ethereum developers and then merged in, enabling us all to know exactly how many full nodes are present at any given time.

The reason you can be sure for Bitcoin is because all nodes validate. Every participating actor in the network validates the chain, it’s the only way you can know the next block is valid without trusting anyone else. There are no light-nodes in Bitcoin.

In Ethereum there are so many ambiguous ways nodes interact with each other that the only way to reasonably detect which nodes are fully validating would be to request random blocks from the past to see if they have that full block, but most Ethereum nodes typically don’t keep the history because Ethereum is state based.

The networks are fundamentally different, which is why it’s easy to poll the network for Bitcoin, and problematic at best for Ethereum.
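The probe described above can be sketched like this. Here `get_block` is a hypothetical stand-in for a peer-to-peer block request, not a real client API, and the whole thing is a heuristic, not a reliable census:

```python
import random

def probably_full_node(get_block, chain_height: int, samples: int = 20) -> bool:
    """Heuristic: sample random historical heights and require a hit on all.
    A node serving arbitrary old blocks probably has the full chain; a
    fast-synced or light node will miss most historical requests."""
    heights = random.sample(range(chain_height), samples)
    hits = sum(1 for height in heights if get_block(height) is not None)
    return hits == samples

# Toy peers: one keeps everything, one pruned all but the last 1024 blocks.
full_peer   = lambda height: f"block-{height}"
pruned_peer = lambda height: f"block-{height}" if height > 6_000_000 - 1024 else None

print(probably_full_node(full_peer, 6_000_000))    # True
print(probably_full_node(pruned_peer, 6_000_000))  # almost certainly False
```

Even this sketch shows the problem: the probe only distinguishes nodes that keep history, and as noted above, most Ethereum nodes don’t, so it can’t tell you who is actually validating.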

— — —

Naturally, it requires more to run an Ethereum full node. And it can strain especially older laptops, and definitely requires an SSD. However, it does not require a beefy server by any reasonable measure. In fact, any dedicated machine with a CPU from the last 6 years, 8 GB of RAM and a modern SSD can process an Ethereum full node just fine (or several full nodes as run on my pretty modest server). The bandwidth usage of tx and block relay is something to consider but is generally not a problem on well connected networks.

1: It’s getting more difficult with time.

2: It’s also kind of moot given the $45,000 validating nodes.

3: Bandwidth on non-$45,000 validating node networks is most certainly important because “well-connected” is dangerous for privacy, as I’ve described above.

— — —

Miners are aware of the current block size (gas) limit and actively take part in discussions with other parts of the community around what an ideal block size limit is

Miners have historically acted both to lower and to increase the limit before

None of this matters in a PoS centralized network. It’s very dangerous in a PoW system over the long term, though. There’s no incentive to keep “other” nodes connected or in sync “when they can just sync the headers”. They might be acting altruistically now, but there’s no reason to expect this behavior in the future. It’s a dangerous proposition to start trusting the honesty of those in power as these networks scale up.

— — —

Overall, as clients have continuously improved performance since the launch of the network, miners have gradually increased the limit towards the current value of 8M (Ethereum launched with 3.14M). Generally, if syncing issues become significant enough to affect the ETH price, miners become incentivized to lower the limit to regulate the network.

I have no reason to believe the limit will be lowered, as I’ve made clear throughout these articles.

— — —

As others have already discussed the various sync modes supported by Ethereum clients and their varying resource requirements, another thing worth talking about as an emergency remedy — if the Ethereum network does indeed grow so fast that it becomes hard for most full nodes to keep up — are checkpoints.

Someone like StopAndDecrypt probably panics at the very mention of something as unholy and sinful as blockchain checkpoints. How can a blockchain be decentralized if clients implementing the consensus protocol agree on a checkpoint block rather than syncing from the genesis block?!

Checkpoints have their functions, but you’re presuming a bit in regards to what I think about them. Regardless, sync modes don’t matter, they’re fine for you if that’s what you want to do. My concern, again, is the validating node set, and checkpoints only address history data, not data processing requirements after getting synced.

In practice, a reorg in either Bitcoin or Ethereum deeper than a few hours is extremely unlikely

I agree.

— — —

Epilogue

No one actually knows how many full nodes are required for a network to be “secure”.

Until then, we cannot know if 1K, 5K, 10K or some other number is the minimum required to keep a network reasonable secure.

See above.

— — —

That said, we should continue to encourage individuals and projects working on Ethereum apps — or anyone interested in contributing to the network — to run their own full node.

I hope they have upwards of $45,000 once PoS+Sharding comes. If ETH goes up, that’s even worse. DASH requires 1000 coins; at one point it was $1,000,000 to run a masternode.

— — —

For those who read this far —

I did, and I don’t hate you. People tend to take my writing as hostile. It’s not.

Tags