Google Dives into the Ethereum Blockchain with its Big Data Analytics Platform

CPost · Sep 4, 2018

Google BigQuery, Google Cloud’s petabyte-scale data warehousing solution, has made the Ethereum dataset available to enable the exploration of smart contract analytics, the company announced in a blog post.

BigQuery has made it possible to explore all of Ethereum’s historical data. The Ethereum ETL project on GitHub contains the source code used to extract data from the blockchain and load it into BigQuery. Google is seeking new contributors and additional blockchains to support.

Making Blockchain Data Accessible

The purpose of putting Ethereum blockchain data on Google Cloud is to make all data stored on the blockchain easy to access. Ethereum’s software exposes APIs for random-access functions, such as checking a wallet balance, but its API endpoints do not give easy access to all the data stored on the blockchain.

While API endpoints do not enable viewing blockchain data in aggregate, BigQuery’s OLAP features enable such analysis. The blog displayed a chart showing Ether transfers and transaction costs year to date, aggregated by day. Such visualization supports tasks like prioritizing changes in the Ethereum architecture, should an upgrade be needed.

Google Cloud synchronizes the Ethereum blockchain to computers running Parity, an Ethereum client for building applications, the blog noted.

It also extracts data daily from the Ethereum blockchain ledger, such as token transfers, and stores partitioned data for efficient exploration on BigQuery.

In addition, the BigQuery Python library allows clients to query data tables in Kernels, a free in-browser coding platform on the public data science platform Kaggle.
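As a rough illustration, the daily aggregation described above could be run from Python along the following lines. This is a minimal sketch, not the blog's own query: the table name `bigquery-public-data.ethereum_blockchain.transactions` and the columns `block_timestamp`, `value`, `gas_price` and `receipt_gas_used` are assumptions about the public dataset's schema, and the helper names are hypothetical. Running it requires the `google-cloud-bigquery` package and valid credentials.

```python
def daily_transfer_sql(table="bigquery-public-data.ethereum_blockchain.transactions"):
    """Build a query that sums Ether transferred and fees paid per day.

    The default table name and all column names are assumptions about
    the public dataset's schema; adjust them to the schema you see.
    """
    # `value` and `gas_price` are assumed to be in wei (1 Ether = 1e18 wei).
    return f"""
        SELECT
          DATE(block_timestamp) AS day,
          SUM(value) / 1e18 AS ether_transferred,
          SUM(gas_price * receipt_gas_used) / 1e18 AS ether_fees
        FROM `{table}`
        GROUP BY day
        ORDER BY day
    """


def run_query(sql):
    """Run a query with the BigQuery client and return rows as dicts."""
    # Imported lazily so the SQL builder above works without the package.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses the environment's default credentials
    return [dict(row.items()) for row in client.query(sql).result()]
```

Calling `run_query(daily_transfer_sql())` would return one row per day, which can then be plotted with any charting library.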

Smart Contract Analytics

Google BigQuery already supports analysis of smart contract function calls and transaction times.

The blog demonstrated querying the dataset’s contracts and transactions tables to identify the most-used smart contracts by transaction count. The accompanying chart shows the 10 most popular Ethereum ERC-721 contracts by transactions.

The smart contract for the CryptoKitties game is the most popular ERC-721 smart contract. Because the contract source code logs a birth event to the Ethereum blockchain, the table allows users to query instances of this event.

Someone who wants to discover games similar to CryptoKitties can measure similarity with the Jaccard similarity coefficient, a statistic for comparing the similarity and diversity of sample sets. The blog computes it in BigQuery with a JavaScript UDF.
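The Jaccard coefficient of two sets is the size of their intersection divided by the size of their union. The sketch below shows the calculation in Python rather than in a JavaScript UDF, assuming each contract is represented by the set of function names it exposes. The contract names and function sets are made up for illustration.

```python
def jaccard(a, b):
    """Return |A ∩ B| / |A ∪ B| for two collections, or 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical function-name sets; real data would come from the dataset.
reference = {"transfer", "approve", "ownerOf", "breedWith", "giveBirth"}
candidates = {
    "game_a": {"transfer", "approve", "ownerOf", "breedWith", "createAuction"},
    "token_b": {"transfer", "approve", "balanceOf"},
}

# Rank candidate contracts from most to least similar to the reference.
ranked = sorted(
    candidates,
    key=lambda name: jaccard(reference, candidates[name]),
    reverse=True,
)
print(ranked)
```

A score of 1.0 means the two contracts expose exactly the same functions, and 0.0 means they share none.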

Another query measures the 10 most popular tokens by transaction volume.

A token’s activity can also be measured over a time window, for example the daily number of transfers of a particular token, and visualized for a specific period, as shown in the accompanying chart.
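Both measurements reduce to counting rows, which the following Python sketch illustrates on a handful of made-up transfer records. It takes "volume" to mean the number of transfers, which is an assumption, and the token addresses and timestamps are placeholders rather than real data.

```python
from collections import Counter
from datetime import datetime

# Hypothetical (token_address, ISO timestamp) pairs standing in for query results.
transfers = [
    ("0xaaa", "2018-08-01T10:00:00"),
    ("0xaaa", "2018-08-01T15:30:00"),
    ("0xbbb", "2018-08-01T16:00:00"),
    ("0xaaa", "2018-08-02T09:00:00"),
]


def top_tokens(rows, n=10):
    """Return the n tokens with the most transfers as (token, count) pairs."""
    return Counter(token for token, _ in rows).most_common(n)


def daily_counts(rows, token):
    """Return {day: number of transfers} for one token, ordered by day."""
    counts = Counter(
        datetime.fromisoformat(ts).date().isoformat()
        for tok, ts in rows
        if tok == token
    )
    return dict(sorted(counts.items()))


print(top_tokens(transfers))
print(daily_counts(transfers, "0xaaa"))
```

The per-day dictionary maps directly onto a bar or line chart with the day on the horizontal axis.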

More Visualizations Possible

It is also possible to use a directed graph data structure to glean insights from the data, since it includes a set of transfers among wallet addresses.

In one example, the first 40,000 transactions contained at least two trading partners. The blog gives an example of a graphic made with Gephi, a visualization tool, in which nodes are color-coded by groups of addresses that frequently transfer to each other. The Modularity algorithm was used to compute these groups.
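To make the graph idea concrete, the sketch below builds a directed graph from (sender, receiver) pairs and keeps only the addresses with at least two distinct trading partners. It is a simplified illustration with placeholder addresses. It does not reproduce the blog's query or the Modularity grouping, which was done in Gephi.

```python
from collections import defaultdict


def build_graph(transfers):
    """Build a directed graph: sender -> set of receivers."""
    graph = defaultdict(set)
    for sender, receiver in transfers:
        graph[sender].add(receiver)
    return graph


def partners(graph):
    """Map each address to the distinct addresses it traded with, in either direction."""
    result = defaultdict(set)
    for sender, receivers in graph.items():
        for receiver in receivers:
            result[sender].add(receiver)
            result[receiver].add(sender)
    return result


def addresses_with_min_partners(graph, minimum=2):
    """Return the sorted addresses that have at least `minimum` distinct partners."""
    return sorted(a for a, p in partners(graph).items() if len(p) >= minimum)


# Placeholder addresses; real ones would be wallet addresses from the dataset.
sample = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "A")]
print(addresses_with_min_partners(build_graph(sample)))
```

The filtered edge list could then be exported to a tool such as Gephi for layout and community detection.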

Much smart contract source code is freely available, so Google users can infer what a contract’s functions do from their names. This works even for functions whose source is not published, because common function names carry a common signature.

Google Cloud has given momentum to smart contract analytics through BigQuery.