An efficient and extensible way to scan the Blockchain

Please Sign up or sign in to vote.

WARNING !!! The Scanner is now Obsolete in NBitcoin making the part "A simple and extensible design for Blockchain Scanning" obsolete

Introduction

The previous article I wrote about bitcoin (Introduction article and one about stealth address and two factor), I did not invent anything special, I just wanted to explain in simpler term how things work.

On the other hand, this article propose a simple and scalable design to scan a blockchain, that work for full Bitcoin node and (soon) SPV.

I entirely made up this design, and it is experimental for now, and expect some bugs lurking around despite the 100+ unit tests I wrote. I expect you will use it and give me feedback about it.

Being experimental, the code of this article might have changed since I wrote it. But the general principle stays the same.

For people that knows how bitcoin works, you can skip the “Bitcoin Basics” part.

Bitcoin basics

What is a transaction

For people not familiar with Bitcoin, I will explain how it works internally so you can understand the rest of this article, for other, skip it.

Bitcoin, as you have heard, wants to be a decentralized currency.

Decentralized means that anybody can verify, confirm, and spread a transaction on the network, without any permission and authentication.

But what is a transaction ?

A transaction is an atomic operation that move bitcoin, from sender(s ) to receiver(s ). (One transaction can involve multiple parties)

The senders fill the inputs part of the transaction (by convention at the top)

The receivers fill the outputs part of the transaction (by convention at the bottom)

The input contains a reference to a previous output called an OutPoint, and a proof of ownership from the sender called a scriptSig (typically, but not limited to, a ECDSA signature, as I explain in this article)

The output contains a scriptPubKey that specify the way the receiver can prove its ownership to spend the funds. The output also contains an amount of BTC to spend.

Here is an example of Yellow paying Green (1 BTC), Blue (10 BTC), Orange (5 BTC).

Now imagine that this transaction is confirmed on the network, and Orange wants to pay 4 BTC to Green. In Bitcoin you can’t consume partially an output. You consume it all at once.

So, the previous Orange output becomes referenced as an input.

And outputs contains the 4BTC to Green, and 1 BTC back to orange.

Now, if this transaction is confirmed on the network, and Green wants to pay 9 BTC to Blue.

In the same way, green, will reference its two outputs as input in the new transaction.

And send back the surplus to himself.



How transactions are verified

Being decentralized means that anybody can verify the validity of a transaction by himself.

What do you need to verify ?

Each input references an unspent output

references an output Each input correctly proves ownership of the referenced output . (as I explain in this article)

correctly proves ownership of the referenced . (as I explain in this article) The sum of spent outputs (output referenced by the outpoint of an input) is equals or more than the sum of outputs. The difference between these two amounts is called a fee.

So, if you want to verify the validity of a transaction, then you need all the confirmed transactions from the beginning of time (currently approximately 20 GO), and index all unspent output, to efficiently check that the referenced output are not already spent.

How transactions are confirmed

A transaction is confirmed by miners. A miner is an agent that is economically incentivized to confirm transactions that are spreading into the network.

What motivate the miner ?

The miner receives all the fees of all transaction he confirms

The miner receive a newly created amount of BTC, called a reward. The amount is currently 25 BTC, but will decrease in the future until a maximum of 21 million BTC is emitted. (See this page)

A miner gathers transactions he wants to confirm into a block, and the hash of such block need to start with a certain number of zero, for the transaction to be “confirmed” by the network.

The number of zero is called “difficulty”, the difficulty is adjusted every 2 weeks depending on how fast blocks where mined, so you can expect one block discovered in the world every 10 minutes.

How to instantly confirm a transaction

Some people argues that bitcoin takes too much time to confirm a transaction compared to say, visa.

This is simply not true. You don’t need to wait so much time to be sure the sender of the money does not try to spend multiple time his outputs.

As explained in previous article, you can create an output that can be spent only if 2 different keys sign it correctly.

If the buyer and the merchant have a shared trusted party that will co-sign an input with the buyer, then the merchant can just verify that the transaction is signed by the trusted party and trust such trusted party will not sign another transaction to spend the same output.

What is the block chain

The block chain is the ordered list of all blocks ever discovered in the bitcoin network since its creation. So it is the set of all transactions ever confirmed.

How to get your balance

You need to scan the whole blockchain for unspent Outputs for which you know how to prove ownership. How to scan simply ? this is what my article talks about next.

(Obsolete) A simple and extensible design for Blockchain Scanning

Scanner

As said earlier, a scanner is an object that process transactions from the blockchain for unspent output you are interested to track.

Here are some idea of scanner, some of which I developed, some of which I will :

Colored Coins Scanner

PubKeyHashScanner (classic bitcoin address)

PubKeyScanner (same thing except the pubkey is used in the scriptPubKey instead of its hash)

StealthPaymentScanner (Get your funds with the scan key, Dark wallet compliant, as explained in this article)

A scanner provide two things : If it can, a pattern of data you want to find in a transaction with GetScannedPushData , thanks to that, you will considerably lower scan time, because you won’t have to download most of the Block you are not interested (thanks to BIP 37)

ScanCoins which take a block and its height, and output a set of ScannedCoin (Which is nothing but a transaction stripped from irrelevant data for the scanner)

A scanner is stateless, so we will talk about how to save a scanner’s progression.

ScanState

If you don’t want to scan the blockchain everytime you restart your program, you need to store two kind of data, that I encapsulated into the ScanState class :

The Chain you scanned so far.

you scanned so far. The Account that give you the set on unspent outputs and your balance so far.

You program have a full mainChain that will reflect the current BlockChain.

However, each ScanState have their own chain that can be partial (If you create a Key today, you don’t want to scan the whole blockchain from 2008 to now)

Everytimes you want to scan the mainChain you call the Process method, and pass to it a IBlockProvider .

The method will :

Cancel some AccountEntry in Account if Chain is a fork.

in Account if is a fork. Traverse the chain from the latest fork between mainChain and ScanState.Chain, updating both Account and Chain while doing it.

However Account and Chain might become very big, so in order to be efficient, one need to save Account and Chain incrementally.

The approach use to do so is to save Account and Chain as a stream of change. (Some people call such design Event Sourcing)

Every time the state of Account or Chain is changed, it is saved as a change, that will be replayed during deserialization.

The class responsible for saving such incremental changes is called ObjectStream .

And it is used by both, Chain and Account , (ObjectStream<ChainChange> and ObjectStream<AccountEntry> respectively)

The only current implementation is StreamObjectStream<T> which save your changes into a Stream .(File stream, or in memory stream)

Let’s go back to ScanState.Process

A Chain is a set of BlockHeader , so it does not include the transactions of a Block , this mean that the scanner need a way to get a Block from the BlockHeader. This is the goal of IBlockProvider .

You can searchedData to IBlockProvider.GetBlock , so, when I will implement SPV, the block provider will only download block that contains pushes with the searchedData.

Currently, it exists only one implementation that require the full blockchain on your computer : IndexedBlockStore .

IndexedBlockStore requires two things : a key value store called NoSqlRepository (only existing implementation is SQLiteNoSqlRepository )

And a BlockStore which is a raw, enumerable set of downloaded block. The format it uses is the same as bitcoinq. (blkXXX.dat)

In summary, here is how to Index your own IndexedBlockStore .

var store = new BlockStore( " bitcoin/datadir" , Network.Main); var index = new SQLiteNoSqlRepository( " myindex.db" , createNew: true ); var indexedStore = new IndexedBlockStore(index, store); indexedStore.ReIndex();

ScanState.Process need a second thing : mainChain which is the current full chain of BlockHeader .

You can get this one very easily from the Bitcoin network by using NodeServer .

NodeServer is misnamed, because it is a server only if you call Listen() , else it is a simple client Node on the Bitcoin Network.

NodeServer client = new NodeServer(Network.Main); var inFile = new StreamObjectStream<ChainChange>(File.Open( " MainChain.dat" ,FileMode.OpenOrCreate)); var mainChain = client.BuildChain(infile);

In this example, NodeServer.BuildChain will

Build the chain from the given ObjectStream<ChainChange>, by replaying the changes.

Download all the BlockHeader from the last fork of the chain and the downloaded chain. (Takes 1 minutes if you download the current 300 000 block headers)

Manual scanning

All the previous scan are good if you want to track interesting transfers on the BlockChain.

But imagine you want to make a simple analytic application ?

For example, is some case, I needed to make a query to retrieve all transactions between december 2012, and january 2013, with an amount around 1100 BTC.

First, let's get the chain of all transactions (1 minutes), you only need to do one time, because the chain is saved to MainChain.dat

var node = Node.ConnectToLocal(Network.Main); node.VersionHandshake(); var chain = node.GetChain();

Then, let's build an index of all blocks, so we can query blocks by blockId after.

BlockStore store = new BlockStore("E:\\Bitcoin\\blocks", Network.Main); //the folder is the blocks folder of all blocks saved by Bitcoin-QT, the original bitcoin client application, this "store" can only browse blocks sequencially IndexedBlockStore index = new IndexedBlockStore(new SQLiteNoSqlRepository("PlayIndex"), store); //Save a SqlLite index in a file called PlayIndex index.ReIndex(); //Index, to do one time only to fill the index

Once you have both, the chain and the index, you can run the analysis.

var headers = chain.ToEnumerable( false ) .SkipWhile(c = > c.Header.BlockTime < from ) .TakeWhile(c = > c.Header.BlockTime < to) .ToArray(); var target = Money.Parse( " 1100" ); var margin = Money.Parse( " 1" ); var en = new CultureInfo( " en-US" ); FileStream fs = File.Open( " logs" , FileMode.Create); var writer = new StreamWriter(fs); writer.WriteLine( " time,height,txid,value" ); var lines = from header in headers let block = index.Get(header.HashBlock) from tx in block.Transactions from txout in tx.Outputs where target - margin < txout.Value && txout.Value < target + margin select new { Block = block, Height = header.Height, Transaction = tx, TxOut = txout }; foreach ( var line in lines) { writer.WriteLine( line.Block.Header.BlockTime.ToString(en) + " ," + line.Height + " ," + line.Transaction.GetHash() + " ," + line.TxOut.Value.ToString()); } writer.Flush();

What's next ?

Next of NBitcoin will be the creation of an IBlockProvider for SPV scenario.

It will use NodeServer , an IndexedBlockStore cache, and BloomFilter to download block that are needed by the scanner.

Such BlockProvider will be thread safe, so multiple Scanner can download blocks at the same time, while sharing the same cache.

Also, NBitcoin lack a coherent model of Wallet that reuse this concept of ScanState, this will come soon though.

Conclusion

There is not tons of code in this article, but I expect you to discover and use this lib thanks to the 150+ unit test suite I wrote. (Once again, I also ported tests from bitcoin core)

This lib took me a tremendous amount of time to code, so I’ll be more than happy you give me your feedback or improvement, or tip at 15sYbVpRh6dyWycZMwPdxJWD4xbfxReeHe, so I can keep being motivated, eat pizza and drink coffee !