3 engineers spent months building, monitoring, and maintaining their in-house solution

Fluidity initially spent around 1,200 hours building and tweaking 10+ geth instances with varying hardware and software configurations. Their solution used brute-force polling, load-balanced nodes, clustering, health checks, and peer checks for the latest block head. In mid-2018, while running 6 nodes split into 2 clusters, they experienced memory leaks in the underlying geth software itself under extremely high traffic: up to 800 requests per second, or about 70 million requests per day. The memory pressure was so severe they could not even SSH in to fix the nodes, and had to restart the entire server instead. The team was initially waking up every 3–4 hours to check memory consumption, and ran a cron job every 4 hours to restart geth and clear memory. After the first two weeks, they were checking every 8–12 hours to ensure the script was still working. The operations team was entirely consumed by geth node work while getting poor-quality sleep and constant PagerDuty alerts.
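The periodic-restart workaround described above can be expressed as a crontab entry along these lines; this is a hypothetical sketch, not Fluidity's actual configuration, and the service name and path are placeholders that would differ per deployment.

```shell
# Restart geth every 4 hours to reclaim leaked memory (sketch only).
# Assumes geth runs as a systemd service named "geth".
0 */4 * * * /usr/bin/systemctl restart geth
```

A blunt instrument like this keeps the node alive, but it is exactly the kind of operational duct tape the team wanted to stop maintaining.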

Based on the data available at the time, Fluidity was one of the few projects operating geth at such scale, and they ended up devoting developer resources to improving the geth software itself to fix the problems they found. Fixing the underlying node software took time away from developing their core products.

The team eventually improved the solution to the point where they spent only about one hour per day maintaining in-house nodes, but they also noticed that data was sometimes out of sync or behind the latest head.

Problems often arose, sometimes requiring a full rebuild of the state trie

During network forks or geth software updates, one node in the fleet would often fail to pick up the update. On one occasion, an old version of geth was pushed out to some of the nodes, causing a subset of the fleet to revert to a pre-fork state. This meant about 200,000,000 state trie records had to be recreated. To resolve that issue, as on other occasions, the state trie had to be rebuilt, a process that took 6–7 hours. Because the maximum size of the state trie is not visible, it was impossible to accurately estimate when a rebuild would complete, so engineering resources were continually diverted during the rebuild to check on the state of the node.

Most tested providers returned the latest block as pending

After a year of running in-house, Fluidity began testing external solutions and comparing providers side by side. In one test, Fluidity pulled the latest and pending blocks from various infrastructure providers, including Alchemy. The test demonstrated the differences between providers with regard to block height and pending transactions. Many providers returned the latest block as the pending block, which would cause unwanted behavior in Fluidity’s applications.
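A side-by-side check like the one described can be sketched as follows. This is a hypothetical reconstruction, not Fluidity's actual test harness: the function names are placeholders, and it assumes each provider exposes a standard Ethereum JSON-RPC endpoint.

```python
import json
import urllib.request

def fetch_block(rpc_url, tag):
    """Fetch a block header via eth_getBlockByNumber (tx bodies omitted).

    `tag` is "latest" or "pending"; `rpc_url` is the provider's JSON-RPC
    endpoint (placeholder -- supply a real URL to run this).
    """
    payload = json.dumps({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "eth_getBlockByNumber",
        "params": [tag, False],
    }).encode()
    req = urllib.request.Request(
        rpc_url, data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["result"]

def pending_block_suspect(latest, pending):
    """Flag providers that mirror 'latest' back as 'pending'.

    A genuine pending block sits one height above latest (its hash is
    typically null, since it is not yet mined). A missing pending block,
    or one with the same number/hash as latest, is a red flag.
    """
    if pending is None:
        return True
    return (pending.get("hash") == latest.get("hash")
            or pending.get("number") == latest.get("number"))

# Usage sketch (not executed here):
#   latest = fetch_block("https://provider.example/rpc", "latest")
#   pending = fetch_block("https://provider.example/rpc", "pending")
#   print(pending_block_suspect(latest, pending))
```

Running the same pair of requests against each provider and comparing results is enough to expose the block-height and pending-block discrepancies the test found.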

Comparing providers: long lags from transaction submission to being queryable

The most common issue related to the pending transaction state. Many geth providers had a large lag between a transaction being submitted to the node and the transaction details being returned by a call such as web3.eth.getTransaction(transactionHash). Sometimes the transaction could not be found until after it had already been mined. For Fluidity this was unacceptable, as they rely on this functionality to deliver a snappy and responsive UX in their frontend applications.
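The lag in question can be measured with a small polling helper along these lines; a minimal sketch, assuming the caller supplies a lookup callable (for example, a wrapper around a provider's getTransaction call that returns None while the node cannot yet see the transaction).

```python
import time

def time_until_queryable(lookup, tx_hash, timeout=30.0, interval=0.1):
    """Poll `lookup(tx_hash)` until it returns the transaction, or time out.

    `lookup` is any callable returning the transaction details, or None
    while the node cannot yet see it. Returns the elapsed seconds once the
    transaction is queryable, or None if the timeout is reached -- which
    is exactly the provider behavior Fluidity found unacceptable.
    """
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if lookup(tx_hash) is not None:
            return time.monotonic() - start
        time.sleep(interval)
    return None
```

Submitting a transaction and immediately calling this helper against each provider gives a directly comparable submission-to-queryable latency figure.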

Long-term solution: Alchemy is reliable and consistent

In testing and production, Alchemy was the only infrastructure provider that was able to show the right block height, the correct pending block information, and a large pending transaction pool. These are the most critical pieces of blockchain information Fluidity needs from their node infrastructure.

“While we save a few hours a week, the biggest value add of using Alchemy has been giving our developers time to focus on the big picture instead of running geth nodes.”

Fluidity also values Alchemy’s fast, high-quality customer support: they can reach the Alchemy team at a moment’s notice in critical situations. For example, Fluidity recently had a question about the transaction pool; Alchemy’s technical team responded immediately, and was the only provider able to do so.

Empowered by Alchemy’s expert solution, Fluidity now focuses on the big picture and on advancing their core products. They can study current events in the Ethereum network, view forks as a consumer rather than a producer, and consider how changes in the ecosystem affect Fluidity.