Mining the wealth of information in the BitCoin blockchain is nothing new, but BitCluster goes a long way to make sense of the information you’ll find there. The tool was released by Mathieu Lavoie and David Decary-Hetu, PH.D. on Friday following their talk at HOPE XI.

I greatly enjoyed sitting in on the talk which began with some BitCoin basics. The cryptocurrency uses user generated “wallets” which are essentially addresses that identify transactions. Each is established using key pairs and there are roughly 146 million of these wallets in existence now

If you’re a thrifty person you might think you can get one wallet and use it for years. That might be true of the sweaty alligator-skin nightmare you’ve had in your back pocket for a decade now. It’s not true when it comes to digital bits — they’re cheap (some would say free). People who don’t generate a new wallet for every transaction weaken their BitCoin anonymity and this weakness is the core of BitCluster’s approach.

Every time you transfer BitCoin (BTC) you send the network the address of the transaction when you acquired the BTCs and sign it with your key to validate the data. If you reuse the same wallet address on subsequent transactions — maybe because you didn’t spend all of the wallet’s coins in one transaction or you overpaid and have the change routed back to your wallet. The uniqueness of that signed address can be tracked across those multiple transactions. This alone won’t dox you, but does allow a clever piece of software to build a database of nodes by associating transactions together.

Mathieu’s description of first attempts at mapping the blockchain were amusing. The demonstration showed a Python script called from the command line which started off analyzing a little more than a block a second but by the fourth or fifth blocks hit the process had slowed to a standstill that would never progress. This reminds me of some of the puzzles from Project Euler.

After a rabbit hole of optimizations the problem has been solved. All you need to recreate the work is a pair of machines (one for Python one for mondoDB) with the fastest processors you can afford, a 500 GB SSD, 32 GB of RAM (but would be 64 better), Python 64-bit, and at least a week of time. The good news is that you don’t have to recreate this. The 200GB database is available for download through a torrent and the code to navigate it is up on GitHub. Like I said, this type of blockchain sleuthing isn’t new but a powerful open source tool like this is.

Both Ransomware and illicit markets can be observed using this technique. Successful, yet not-so-cautious ransomers sometimes use the same BitCoin address for all payments. For example, research into a 2014 data sample turned up a ransomware instance that pulled in $611k (averaging $10k per day but actually pulling in most of the money during one three-week period). If you’re paying attention you know using the same wallet address is a bad move and this ransomware was eventually shut down.

Illicit markets like Silk Road are another application for BitCluster. Prior research methods relied on mining comments left by customers to estimate revenue. Imagine if you had to guess at how well Amazon was doing reading customer reviews and hoping they mentioned the price? The ability to observe BTC payment nodes is a much more powerful method.

A good illicit market won’t use just one wallet address. But to protect customers they use escrow address and these do get reused making cluster analysis possible. Silk Road was doing about $800k per month in revenue at its height. The bulk of purchases were for less than $500 with only a tiny percentage above $1000. But those large purchases were likely to be drug purchases of a kilo or more. That small sliver of total transactions actually added up to about a third of the total revenue.

It’s fascinating to peer into transactions in this manner. And the good news is that there’s plenty of interesting stuff just waiting to be discovered. After all, the blockchain is a historical record so the data isn’t going anywhere. BitCluster is intriguing and worth playing with. Currently you can search for a BTC address and see total BTC in and out, then sift through income and expense sorted by date, amount, etc. But the tool can be truly great with more development. On the top of the wishlist are automated database updates, labeling of nodes (so you can search “Silk Road” instead of a numerical address), visual graphs of flows, and a hosted version of the query tool (but computing power becomes prohibitive.)