I spend a lot of time looking at historical Ethereum transactional data. I do this by scanning the chain using QuickBlocks.

If you’ve ever done this, you will be familiar with a certain set of transactions that take a very long time to process. These transactions happened between blocks 2,286,910 and 2,717,576. They are a pain in my a$$. See here.

In a surprisingly effective attack, some evil genius took advantage of an underpriced opcode to create millions of dead Ethereum accounts. This had the effect of significantly bloating the state database, but more importantly for our purposes it created tons of transaction traces.

Until recently, if we were scanning the Ethereum database (especially if we were scanning this section and looking at traces, which QuickBlocks does all the time), we had to wait many hours (perhaps even days) while Parity delivered the traces. We could have cached these traces, but our goal has always been to have minimal impact on the target machine (this helps us stay decentralized), and we never write data without thinking about it carefully. With the solution we present below, we can now choose whether to scan, skip or cache these transactions. This post discusses how we did that and how we now routinely scan very quickly through this difficult portion of the data.

Short History

The morning of DevCon 2, there was a hack against the Geth client. The Ethereum developers responded quickly and fixed that hack, but a few days later another attack occurred. This second attack went on for more than a month and is described here. In response, the Ethereum devs conducted two hard forks: Tangerine Whistle (EIP 150) and Spurious Dragon (EIP 161). As one would expect, the hard forks did not change existing data. Instead, they changed the way the client code works so that, at the end of a transaction, if an account has been “touched” during that transaction, and the account would otherwise end up dead, the account is now removed.

The attacker created millions of useless accounts across thousands of transactions prior to the forks. Because the hard forks do not actually remove the dead accounts directly, an off-chain process was initiated to ‘touch’ these accounts so they would be removed. This worked, but it created a second huge batch of ‘cleanup’ transactions, each with its own large set of traces. Needless to say, this entire section of the blockchain, from the start of the hack to the end of the cleanup, is ugly and very bloated (which in our world translates to “slow”, which we hate!).

Let’s Go To The Data

As I said, these troublesome transactions are especially annoying if you want to view the traces (which QuickBlocks does all the time). We struggled with this problem long enough. We needed a solution.

The first thing we did was to gather some data. We scanned the first 3,500,000 blocks. At each block, we looked at every transaction and counted the number of traces generated during that transaction. (This took a very long time.)

From this data, we created a heat map showing how frequently certain numbers of traces were generated per transaction for each 100,000-block section of the chain. The columns represent rising block numbers starting from zero and going up to 3,500,000. (There are 35 columns in the chart below.) The rows range from zero traces to 150 or more traces in a given transaction, and each cell in the table represents the count of transactions with that many traces.
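As a rough illustration of the binning described above (a minimal sketch, not the code QuickBlocks actually uses), each transaction can be mapped to a cell by dividing its block number by 100,000 and capping its trace count at 150. The function names, the constants and the sample data are all hypothetical, chosen only for this example.

```python
from collections import Counter

BLOCKS_PER_COLUMN = 100_000  # width of each heat-map column, taken from the text
MAX_TRACE_ROW = 150          # the last row collects "150 or more" traces


def heat_map_cell(block_number: int, trace_count: int) -> tuple[int, int]:
    """Return the (column, row) cell a single transaction falls into."""
    column = block_number // BLOCKS_PER_COLUMN
    row = min(trace_count, MAX_TRACE_ROW)
    return column, row


def build_heat_map(transactions):
    """Count transactions per cell from (block_number, trace_count) pairs."""
    cells = Counter()
    for block_number, trace_count in transactions:
        cells[heat_map_cell(block_number, trace_count)] += 1
    return cells


# Made-up sample data, for illustration only (not real chain data).
sample = [(46_147, 1), (46_147, 1), (2_300_000, 4_000), (3_499_999, 7)]
heat_map = build_heat_map(sample)
```

Each value in `heat_map` is the number of transactions in that cell, which is what the shade of blue in the chart encodes.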

Light blue represents a small number of transactions in that portion of the chain with the given number of traces. Darker blue represents an increasing number of transactions in that section with the corresponding number of traces. At the top of the chart the dark blue rows represent transactions with anywhere between zero and ten traces. You can see that most transactions, across the entire history of the chain, have few traces. While early blocks seem to have more transactions with two traces, transactions in more recent blocks appear to have a growing number of traces. We think this indicates growing use of smart contracts compared to the early chain.

Heat Map of Ethereum Trace Counts per 100,000 Blocks

We suppose the early blocks with two (and the very early blocks with around 35) traces reflect experimentation on the early chain, when the price of ether was minuscule.

Do you notice anything else of interest? Do you see that dark blue box at the bottom-middle of the chart?

The Fall 2016 DDoS attack stands out like a sore thumb (i.e. there are 1,000s of transactions with many 1,000s of traces in that region). Here was our solution. We could very clearly and easily box in the troublesome transactions. Now that we had identified them, we needed a way to skip over them.

Writing Code that Skips Ugly Transactions

It turns out the solution to our problem was relatively simple. We identify any transaction between blocks 2,286,910 and 2,717,576 that has more than 1,500 traces. First, we needed a way to figure out how many traces a transaction had without querying all of its traces (querying the traces was the problem, after all).
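The filter itself can be sketched in a few lines. This is only an illustration under stated assumptions: the block range is treated as inclusive, `is_ugly_transaction` is a made-up name, and `has_trace_at` is a hypothetical callable standing in for any cheap way of asking "does this transaction have a trace at zero-based position n?" without fetching every trace.

```python
from typing import Callable

ATTACK_FIRST_BLOCK = 2_286_910  # block range from the text, assumed inclusive
ATTACK_LAST_BLOCK = 2_717_576
TRACE_THRESHOLD = 1_500         # "more than 1,500 traces"


def is_ugly_transaction(
    block_number: int,
    tx_hash: str,
    has_trace_at: Callable[[str, int], bool],
) -> bool:
    """Return True if the transaction should be skipped (or cached)."""
    if not (ATTACK_FIRST_BLOCK <= block_number <= ATTACK_LAST_BLOCK):
        return False  # outside the attack window, so no probe is needed
    # A trace at zero-based index 1,500 means there are at least 1,501 traces.
    return has_trace_at(tx_hash, TRACE_THRESHOLD)


# Stand-in probe backed by made-up trace counts, for illustration only.
fake_trace_counts = {"0xaaa": 12, "0xbbb": 4_000}


def fake_probe(tx_hash: str, index: int) -> bool:
    return index < fake_trace_counts[tx_hash]
```

With the stand-in probe, `is_ugly_transaction(2_300_000, "0xbbb", fake_probe)` is true, while the same hash outside the block range is never probed at all, which keeps the common case cheap.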

Luckily, Parity’s RPC provides a function to query a single trace. We use that to decide if the transaction has or does not have a trace at a given location: