There was a recent post comparing online nodes for Ethereum and Bitcoin, and I mentioned in the comments that the Ethereum numbers in it are not representative enough.

I’ve been tracking online Ethereum nodes for some time now, so I have the data needed for further analysis. I publish some of the results for the Ethereum Classic network on the Gastracker page, but I also have more detailed data, with deeper insights into all Ethereum-based networks.

So what is the problem with crawling the Ethereum network? The first and most obvious obstacle is the coexistence of Ethereum and Ethereum Classic: both share the same protocol, the same identifiers, and the same initial history, including the genesis block. Nodes from both networks sporadically connect to each other, exchanging new transactions and data from the shared history. When you connect to an ETH node, you can’t be sure that all of its peers are ETH nodes as well. The initial handshake doesn’t provide enough information to tell the chains apart, so a node keeps its connection even with a peer from another network, tells other ETH nodes about that peer, and so on.
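
To make the handshake problem concrete, here is a minimal Python sketch. It assumes the handshake announces a network id and a genesis hash, as the Status message of the eth wire protocol did at the time; the Status class and the handshake_compatible function are hypothetical names for illustration and not part of any client, and the best_hash values are dummies.

```python
from dataclasses import dataclass

# Genesis hash of the original chain. ETH and ETC both start from this block,
# so both announce the same value.
MAINNET_GENESIS = "0xd4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3"


@dataclass(frozen=True)
class Status:
    """Simplified, hypothetical model of what a peer announces in the handshake."""
    network_id: int
    genesis_hash: str
    best_hash: str  # head of the peer's chain; differs per node, so it identifies nothing


def handshake_compatible(ours: Status, theirs: Status) -> bool:
    """Accept the peer if it claims the same network id and genesis block."""
    return (ours.network_id == theirs.network_id
            and ours.genesis_hash == theirs.genesis_hash)


# Dummy head hashes, for illustration only.
eth_peer = Status(network_id=1, genesis_hash=MAINNET_GENESIS, best_hash="0x" + "11" * 32)
etc_peer = Status(network_id=1, genesis_hash=MAINNET_GENESIS, best_hash="0x" + "22" * 32)

# Both peers pass the check against each other, although they follow different chains.
accepted = handshake_compatible(eth_peer, etc_peer)
```

Under these assumptions the check accepts the ETC peer, because nothing it compares differs between the two chains.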

Network ID 1 is not Ethereum

That’s the main issue with data provided by services like Ethernodes. It just shows all nodes, regardless of the actual chain. In fact, it doesn’t say “ETH nodes” anywhere, only “nodes with network id 1”, and that id is shared by several different blockchains, not only ETH and ETC.

The only way to distinguish them is to connect to each node and download part of its blockchain history, because different forks have different blocks at the same height. In my experience, fewer than about 70% of such nodes are actual Ethereum nodes.
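
A minimal Python sketch of such a check, under a few assumptions: it looks only at block 1,920,000, where ETH and ETC diverged, and it expects the hash of that block to have been fetched from the peer already (the network code is left out). The two hashes are the commonly published values for that block on each chain and should be verified against a trusted node; classify_chain and chain_distribution are hypothetical helper names, not the crawler’s actual code.

```python
from collections import Counter
from typing import Iterable, Optional

# Height at which ETH and ETC diverged (the DAO fork).
DAO_FORK_BLOCK = 1_920_000

# Assumed hashes of block 1,920,000 on each chain; verify before relying on them.
FORK_BLOCK_HASHES = {
    "0x4985f5ca3d2afbec36529aa96f74de3cc10a2a4a6c44f2157a57d2c6059a11bb": "ETH",
    "0x94365e3a8c0b35089c1d1195081fe7489b528a84b22199c916180db8b28ade7f": "ETC",
}


def classify_chain(hash_at_fork_block: Optional[str]) -> str:
    """Map the hash a peer returned for DAO_FORK_BLOCK to a chain label."""
    if hash_at_fork_block is None:
        # The peer did not answer or has not synced that far yet.
        return "unknown"
    # Any unrecognised hash means some other chain that reuses network id 1.
    return FORK_BLOCK_HASHES.get(hash_at_fork_block.lower(), "other")


def chain_distribution(hashes: Iterable[Optional[str]]) -> Counter:
    """Count crawled peers per chain label."""
    return Counter(classify_chain(h) for h in hashes)
```

A peer that has not yet synced past the fork block cannot be classified this way, which is why the sketch keeps a separate "unknown" label instead of guessing.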

Here is a distribution of nodes per chain with network id = 1, i.e., all of them are in the same bucket on the Ethernodes page.