There are many cases when you want to export Ethereum data to the csv format and run some queries, perform analysis, draw diagrams/charts/graphs etc. Some of the questions you might be interested in:

When did the first in history ERC20 transfer happen? How many ERC20 compatible contracts are there on the blockchain? What are the most popular tokens? What fraction of ERC20 transfers are sent to new addresses, i.e. addresses that had 0 balance before? In what order are transactions included in a block in relation to their gas price? What fraction of all transactions are calls to contracts? What was the highest transaction throughput? What is the total Ether volume? Token balance for any address on any date? What is the total gas used in all transactions? …

In this article I will walk you through the process of exporting Ethereum blocks, transactions and ERC20 transfers using Ethereum ETL, running SQL queries with AWS Athena and visualizing the data in AWS QuickSight.

Read until the end and you will know the answers to the questions above.

Exporting Data to CSV

I exported the first 5 million blocks and the whole process took me:

2 EC2-instances with 200 GB SSD disks. Both memory and CPU are important for this task so I chose t2.medium . You can check other types here https://aws.amazon.com/ec2/instance-types/. As Yann-Aël Le Borgne noted in the comments, using m5.4xlarge can reduce the total export time to less than 24 hours.

. You can check other types here https://aws.amazon.com/ec2/instance-types/. As Yann-Aël Le Borgne noted in the comments, using can reduce the total export time to less than 24 hours. 3 days; $14 (depends on the region). Using Auto Scaling Group can reduce the time to a few hours, read How to Export the Entire Ethereum Blockchain to CSV in 2 hours for $10

Below is the step-by-step guide:

Create an AWS account if you don’t have one and sign in to the EC2 console https://console.aws.amazon.com/ec2/v2/home

Launch Ubuntu 16.04 t2.medium instance

instance ssh and install geth following these instructions https://github.com/ethereum/go-ethereum/wiki/Installation-Instructions-for-Ubuntu

start geth

> nohup geth --cache=1024 &

Wait until geth downloads the blocks that you need. You can check it by running:

> geth attach

> eth.syncing

{

currentBlock: 5583296,

highestBlock: 5583377,

knownStates: 65750401,

pulledStates: 65729512,

startingBlock: 5268399

}

Normally geth will download the latest state after the blocks are loaded. You don’t need to wait until the full sync though. As soon as geth downloads the blocks you can start exporting the data.

Clone Ethereum ETL

Install dependencies

> cd ethereum-etl

> sudo apt-get install python3-pip

> pip3 install -r requirements.txt

Run Ethereum ETL (you may want to comment out the parts that you don’t need in export_all.sh before running it):

> nohup bash export_all.sh -s 0 -e 4999999 -b 100000 -p file://$HOME/.ethereum/geth.ipc -o output &

Wait until it finishes by inspecting nohup.out. The output will be partitioned in Hive style, every partition containing 100k blocks. Totally 146 files (no ERC20 transfers in first 400k blocks).

Blocks — 4.8 GB:

Transactions — 51.5 GB:

ERC20 transfers — 7.9 GB:

Sign in to S3 console https://console.aws.amazon.com/s3/home. Create a new bucket and grant appropriate permissions to the EC2 instances.

Install aws cli https://aws.amazon.com/cli. Configure it by running

> aws configure

AWS Access Key ID:

...

You can generate a new access and secret keys in IAM console https://console.aws.amazon.com/iam/home

Upload the files to S3:

> cd output

> aws s3 sync . s3://<your_bucket>/ethereumetl/export

Here is the price breakdown for Singapore:

EC2 instances: $0.0584 per hour * ~144 hours = $8.40 https://aws.amazon.com/ec2/pricing/on-demand/. You can save some money if you use Spot instances https://aws.amazon.com/ec2/spot/pricing/.

https://aws.amazon.com/ec2/pricing/on-demand/. You can save some money if you use Spot instances https://aws.amazon.com/ec2/spot/pricing/. EBS storage: $0.12 per GB-month * 400 GB * ~1/10 month = $4.80 https://aws.amazon.com/ebs/pricing/.

https://aws.amazon.com/ebs/pricing/. S3 storage: $0.025 per GB-month * 70 GB * ~1/4 month (depends on how long you will keep the files) = $0.44 https://aws.amazon.com/s3/pricing/. You can move the files to Glacier when you don’t use them to save money, which will cost you 5 times less. Alternatively, use Infrequent Access or One Zone-Infrequent Access which will cost you 20–40% less.

You don’t pay data transfer costs into EC2 when synchronizing with the Ethereum network. Data transfers between EC2 and S3 are also free within the same region.

Before you proceed you might want to convert the CSV files into a columnar format: Converting Ethereum ETL files to Parquet. This step is not necessary but it will significantly reduce the running time and the cost of your SQLs.

Running SQLs in AWS Athena

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. You can also connect to it with the JDBC driver.

Sign in to AWS Athena https://console.aws.amazon.com/athena/home

Create a database

CREATE DATABASE ethereumetl;

Create tables for blocks, transactions and ERC20 transfers (check up-to-date schema code in this repository https://github.com/medvedev1088/ethereum-etl#querying-in-amazon-athena):