BigchainDB: Where it Came From, Where We’re At, Where We’re Headed

BigchainDB Pre-History: ascribe

In summer 2013, we started working on a project that became ascribe: blockchain-based intellectual property (IP) attribution. When we described it to others, the response was usually “huh?”. Blockchains were not a widespread idea; in fact, not even Bitcoin was! Bitcoin hit a peak half a year later when it broke $1,000, but it took another year for blockchain to enter the world’s consciousness.

We addressed an elephant-in-the-room problem: how do you collect digital art? Framed differently, how can creators of digital art get compensated? In fact, how could any creator of IP (musician, storyteller, videographer, designer, etc.) get compensated? Blockchain technology could solve this via a public store of attribution and provenance.

By summer 2014, we’d made enough progress that I could step back from my full-time role at my previous startup, Solido. My ascribe co-founders Bruce and Masha jumped into ascribe full-time too. We raised money, hired a few early employees, and kept working on product until we were satisfied enough to release it in early 2015, built on the Bitcoin blockchain.

In early 2014, the Monegraph folks independently had a similar idea, and announced it as the result of a hackathon. Kudos to Kevin for identifying this problem too! Of course, it was frustrating to see someone snag the PR buzz of announcing first. Operator had this happen with Magic; and doubtless countless other startups have experienced this. So be it. Our lesson was learned: announce and release earlier, because you can always mature the product publicly.

“If you are not embarrassed by the first version of your product, you’ve launched too late.” -Reid Hoffman

Hints of Scale Challenges

In late 2014, one potential ascribe customer with 20 million users had more than 100,000 works a day going through their system. That’s roughly the same number of transactions that the entire Bitcoin network handled per day; and Bitcoin was already starting to show strain if pushed much beyond that. Transactions were also getting more expensive: a few cents back then, and these days more than ten cents, so 100,000 transactions means over $10,000 in transaction fees, per day. Moreover, registering the marketplace’s existing 100 million works would cost $10 million. Oops. We saw that this would be an issue for any marketplace at scale that might consider using the ascribe API. In the fall of 2014, I gave a talk on the issue, and the contrast to “big data” database performance.

Being a startup, we had to focus. We chose IP attribution. We hoped that someone else would solve scalability; we even actively encouraged friends to work on it.

Specifying the Problem to Solve (and to Not Solve)

By late summer 2015, more and more people were raising flags on the scalability of the Bitcoin blockchain. There were many smart people working on it. People were starting with an existing blockchain, and trying to scale it up (“big-datafying it”).

At the same time, Bitcoin’s scalability limits were biting us more severely. The product was basically in shape to serve larger-scale customers and corpora, with the glaring exception of the blockchain scalability. We found ourselves needing to turn down opportunities knowing that the Bitcoin blockchain wouldn’t be able to handle the throughput we needed to serve larger enterprises.

If you aim to scale up an existing blockchain, it’s a huge slog. Technologists have spent decades figuring out how to scale up distributed databases; a tremendous amount of that technology would need to be injected into a blockchain to take advantage of those learnings.

I’d spent almost 20 years designing machine learning algorithms at scale. One of my biggest lessons from that time was: if you want scale, you need to design for it from the start. You can’t just “scale up” some toy algorithm by a thousand or a million times. In my work on symbolic regression, I first tried the “scale up” approach and failed, but with a radically simpler approach, I got orders-of-magnitude improvement. I also did this for topology optimization, and for memory verification (with my Solido colleagues). Google researchers found similar results.

Back to blockchains and storage. What if you started with a storage mechanism that was already designed for scale? This is exactly the tech of distributed databases. We could leverage:

Distributed storage, at scale: Many existing databases were already distributed, with computing resources spread across multiple physical machines. This is how Google, Facebook, Netflix and others achieve planetary scale.

Consensus, at scale: If you have a distributed database, and different physical machines disagree on values, then that’s a problem. So how do you keep the data in sync? That’s the role of consensus algorithms, which existing distributed databases employ. Ordering transactions is the heart of consensus. There are “fault tolerant” and “Byzantine fault tolerant” varieties of consensus protocols like Paxos and PBFT, some of which go back to the 80s, and actually build on work from the 50s and 60s (think ARPAnet).

With this starting point, how do we “blockchain-ify” it? We drew on our experience in shipping blockchain products to define three specific characteristics:

Decentralized: No single entity owns or controls the database.

Immutable: More tamper-resistant than usual. This is a matter of shades of gray, since existing logging databases and snapshotting technologies already provide degrees of tamper resistance. One example way to add tamper resistance is for nodes to cryptographically sign their votes.

Assets: One can register & transfer assets, where the owner of the private key is the owner of the asset.
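The signed-votes idea from the “Immutable” point above can be sketched in a few lines of Python. This is purely illustrative: it uses a standard-library HMAC as a stand-in for a real public-key signature (BigchainDB itself uses asymmetric signatures), and the names `node_secret`, `sign_vote`, and `verify_vote` are mine, not an actual API.

```python
import hashlib
import hmac
import json

def sign_vote(node_secret: bytes, block_hash: str, valid: bool) -> dict:
    """Build a vote and attach a keyed signature over its contents."""
    vote = {"block_hash": block_hash, "valid": valid}
    payload = json.dumps(vote, sort_keys=True).encode()
    vote["signature"] = hmac.new(node_secret, payload, hashlib.sha256).hexdigest()
    return vote

def verify_vote(node_secret: bytes, vote: dict) -> bool:
    """Recompute the signature; any tampering with the vote invalidates it."""
    payload = json.dumps(
        {k: v for k, v in vote.items() if k != "signature"}, sort_keys=True
    ).encode()
    expected = hmac.new(node_secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(vote["signature"], expected)

secret = b"node-1-secret"
vote = sign_vote(secret, "abc123", valid=True)
assert verify_vote(secret, vote)
vote["valid"] = False                # tamper with the vote...
assert not verify_vote(secret, vote) # ...and verification fails
```

The point is simply that once a vote is signed, flipping even one bit of it is detectable, which is one concrete way a database becomes “more tamper-resistant than usual.”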

Towards BigchainDB’s First Release

In late summer 2015, with the definitions above as a starting point, we had to choose a distributed DB to start from. We benchmarked Cassandra, MongoDB, Elasticsearch, and more. Unsurprisingly, every single one had far better scalability than the blockchains we’d seen. (And I acknowledge they solve slightly different problems!) We settled on RethinkDB, an AGPL-licensed document store with JSON-style hierarchical keys. The major reason for RethinkDB was its excellent changefeed mechanism: every time any node makes a change to the data, all the other nodes are informed. This would turn out to be incredibly useful for building the base functionality, and for improving fault tolerance.

We did many iterations against RethinkDB to maximize throughput, at the same time as iterating on our algorithm. Our goal was one million writes per second, subject to the “blockchain-ify” part not getting in the way of raw performance. We found flaws in our algorithm itself, but were able to re-jigger it, typically by simplifying even more! We even found some bugs in RethinkDB that hampered performance and, working with the excellent folks at RethinkDB, fixed them.

One breakthrough was realizing that ordering/writing was the biggest bottleneck, and that it could be delegated to the underlying distributed database (RethinkDB). Like many distributed DBs, RethinkDB has the property that as you add server nodes, throughput goes up linearly (to a max of 64 nodes). If you think about it, that’s an amazing but totally non-intuitive result! How does RethinkDB increase throughput by adding nodes? The key is sharding: each node stores a subset of the data, and only that node is responsible for storing it. Each transaction is shunted to the node responsible for storing it. So, the workload of handling and writing incoming transactions gets distributed among many nodes.
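A toy sketch of the sharding idea in Python. Note this is a simplification: RethinkDB actually shards by ranges of primary keys, so the hash-based assignment and node names below are assumptions for illustration only.

```python
import hashlib

NODES = ["node-0", "node-1", "node-2", "node-3"]

def responsible_node(tx_id: str, nodes=NODES) -> str:
    """Map a transaction id to the single node responsible for storing it."""
    digest = hashlib.sha256(tx_id.encode()).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

# Incoming transactions are shunted to different nodes, so write
# load spreads roughly evenly, and adding nodes adds write capacity.
counts = {n: 0 for n in NODES}
for i in range(10_000):
    counts[responsible_node(f"tx-{i}")] += 1
# Each of the 4 nodes ends up with roughly 2,500 of the 10,000 writes.
```

With four nodes each handling about a quarter of the writes, it becomes intuitive why throughput scales roughly linearly with node count.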

Of course you don’t want data to get lost if that node goes down; so you have backups, and backups of backups. No backups is a “replication factor” of 1, one backup is replication factor of 2, and so on. Yes, you can have “full replication” where every node stores all the data, but of course this hurts scalability. Replication factor can be tuned according to fault tolerance needs.
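To make replication factor concrete, here is a minimal sketch of one common placement scheme (primary plus the next nodes in ring order). This is an illustration of the concept, not RethinkDB’s exact placement algorithm; `replica_nodes` is a hypothetical helper.

```python
def replica_nodes(shard_index: int, nodes: list, replication_factor: int) -> list:
    """Pick `replication_factor` distinct nodes to hold copies of a shard."""
    if replication_factor > len(nodes):
        raise ValueError("cannot place more replicas than there are nodes")
    # Primary is nodes[shard_index]; backups are the next nodes around the ring.
    return [nodes[(shard_index + i) % len(nodes)] for i in range(replication_factor)]

nodes = ["n0", "n1", "n2", "n3", "n4"]
assert replica_nodes(0, nodes, 1) == ["n0"]          # no backups
assert replica_nodes(3, nodes, 3) == ["n3", "n4", "n0"]  # two backups
assert replica_nodes(0, nodes, 5) == nodes           # full replication
```

Raising the replication factor buys fault tolerance (more node failures survivable) at the cost of extra storage and write work, which is exactly the tuning knob described above.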

So, throughput increases proportionally with the number of nodes, N. But there’s still communication overhead among the nodes. And since all nodes talk to all nodes, that overhead grows as the square of the number of nodes. A reasonable question is: how does this not kill scalability? The answer is that the square effect only has a big impact when there is a huge number of nodes, and that simply doesn’t happen with RethinkDB, which caps out at 64 nodes.
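The back-of-the-envelope arithmetic is worth seeing. This is a rough upper bound (real gossip protocols batch and sample rather than sending every pairwise message):

```python
def pairwise_messages(n: int) -> int:
    """Messages per round if every node sends to every other node."""
    return n * (n - 1)

# At RethinkDB's 64-node cap, all-to-all chatter is modest:
assert pairwise_messages(64) == 4_032
# But at 10,000 nodes the square effect would explode:
assert pairwise_messages(10_000) == 99_990_000
```

At 64 nodes the quadratic term is a few thousand messages per round; at blockchain-network scale it would be tens of millions, which is why the node cap makes the overhead a non-issue here.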

Once we truly got that “ordering is the key” for consensus, we realized that we could postpone voting until after ordering/write. This enabled further simplifications in architecture.

Blocks weren’t necessary, but were a great optimization. When we started the BigchainDB work, we chained at the transaction level, i.e. a transaction object would contain a hash of the previous transaction’s object. After all, who needs blocks if the database is good at ordering at the transaction level? This is perfectly reasonable. And simple! However, chaining at the transaction level means a digital signature is needed for each transaction, which gets expensive. We could optimize by grouping ordered sets of transactions into blocks, then chaining at the block level. We were willing to accept the complexity of blocks, for speed. But it’s fun to realize: you don’t actually need a chain of blocks to get “blockchain” characteristics; the blocks are just an optimization!
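Here’s a minimal sketch of chaining at the block level. The `make_block`/`block_hash` structure below is a simplified stand-in for BigchainDB’s real block format; the point is just that one hash per block links the whole group of transactions to its predecessor.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's full contents deterministically."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def make_block(transactions: list, prev_hash: str) -> dict:
    """Group ordered transactions into a block chained to its predecessor.

    One hash (and, in the real system, one signature) per block,
    instead of one per transaction.
    """
    return {"transactions": transactions, "prev_block_hash": prev_hash}

genesis = make_block(["tx-0"], prev_hash="")
block1 = make_block(["tx-1", "tx-2", "tx-3"], prev_hash=block_hash(genesis))

# Tampering with any transaction in the genesis block changes its hash,
# breaking the link stored in block1.
tampered = dict(genesis, transactions=["tx-0-evil"])
assert block_hash(tampered) != block1["prev_block_hash"]
```

Chaining per-transaction gives the same tamper-evidence, but costs a signature per transaction; amortizing that cost over a whole block is the optimization described above.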

One might ask: given that each node may have to vote on each block, why doesn’t voting become a bottleneck? The answer: each node runs dozens of cores / processes / threads to validate transactions in parallel.
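The parallel-validation idea can be sketched with Python’s standard thread pool. The `validate_transaction` check below is a hypothetical stand-in (a real node would verify signatures, inputs, and so on); the structure is what matters: validation fans out across workers instead of serializing on one thread.

```python
from concurrent.futures import ThreadPoolExecutor

def validate_transaction(tx: dict) -> bool:
    """Stand-in check: a real node would verify signatures, inputs, etc."""
    return bool(tx.get("id")) and tx.get("amount", 0) > 0

transactions = [{"id": f"tx-{i}", "amount": i % 5} for i in range(1_000)]

# Validation of a block's transactions is embarrassingly parallel,
# so per-block voting cost scales with available cores.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(validate_transaction, transactions))

valid_count = sum(results)
```

In CPython, CPU-bound validation would use processes rather than threads, but the shape of the solution is the same: the per-block work divides cleanly among workers.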

Release v0.1

From the seed of the idea in fall 2014, to intense efforts starting late summer 2015, we finally announced BigchainDB on February 10, 2016, at a blockchain conference in SF. It was kinda fun: everyone thought Bruce was going to talk about ascribe and IP. Instead, BigchainDB! We released the whitepaper and open-sourced the v0.1 code then too. We opened channels on Twitter, Google Groups, and Gitter.

We’d learned from the Reid Hoffmans of the world: release early, then iterate quickly to improve. So as a version “0.1” might imply, the software we released that day was alpha. It did the basics: creating assets, transferring assets, and a basic data payload. It helped that we sat on top of the large, relatively mature RethinkDB codebase. Our prototype asset code wasn’t yet in the open-source version. Documentation was ok but not great. Permissioning was pretty rough: it was only at the transaction level and, as mentioned, transactions were pretty basic. It was hard to deploy in a cluster. We had identified many fault / attack vectors (e.g. a node deletes a bunch of data), but had not built solutions yet. So, people could download the code and kick the tires, but it wasn’t production ready.

The whitepaper included performance numbers on RethinkDB, bottlenecked by writes. Those numbers showed the potential performance, provided the rest of the algorithm didn’t hurt performance. We’d designed the algorithm with a simple mantra: get out of the way of the raw database performance. We saw nothing preventing everything else from being optimized via parallel processing, etc.

We did not release benchmark numbers for end-to-end transactions yet. Why? We had to walk before we ran. We presented the numbers we had so far, knowing that with appropriate optimizations we could maintain high performance for full transactions end-to-end, and planned to release other numbers as we got them.

Releases v0.2 and v0.3

We continued work to make it easier to deploy clusters and to benchmark. Doing a good job on cluster deployment took a ton of effort, in fact more than we expected, and because of this we were slower than hoped in releasing more benchmarks.

On April 26, 2016, we released v0.2, which included better cluster deployment code, better documentation, bug fixes, and more.

In parallel, we made transactions richer to support the Interledger spec. This included multiple inputs and outputs, threshold conditions, multisig, and more, all built into a new cryptoconditions module. We released this as v0.3 right after the v0.2 release, on May 3, 2016. From richer transactions also flowed rich permissioning at the fine-grained transaction level.
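To give a feel for threshold conditions, here is a toy model of the idea in Python. The real crypto-conditions spec works over hashes and cryptographic signatures; the set-membership check and `threshold_fulfilled` helper below are simplifications of mine, not the cryptoconditions module’s API.

```python
def threshold_fulfilled(signatures: set, required_signers: list, threshold: int) -> bool:
    """A threshold condition is met when at least `threshold` of the
    required signers have provided a (here, simulated) signature."""
    met = sum(1 for signer in required_signers if signer in signatures)
    return met >= threshold

signers = ["alice", "bob", "carol"]
# 2-of-3 multisig: any two of the three signers can spend the output.
assert threshold_fulfilled({"alice", "bob"}, signers, threshold=2)
assert not threshold_fulfilled({"carol"}, signers, threshold=2)
```

Setting the threshold equal to the number of signers gives classic multisig; a threshold of 1 gives an “any of these keys” condition, and nesting such conditions yields the richer permissioning mentioned above.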

BigchainDB: Where We’re At Today

With stable cluster deployment and richer transactions, we’ve been able to spend more time on end-to-end benchmarks. We’ve been doing some optimization and are getting close to releasing benchmark results, likely in the second half of May. With time, the numbers will close in on the raw potential demonstrated by the already-released benchmarks.

The software is still alpha. It has richer asset infrastructure and therefore good permissioning. However, transactions don’t yet have proper escrow support. (BTW, we will never aim for Turing-complete transactions; in our view that’s a different piece of the puzzle, left to the good folks at Ethereum, Tendermint, etc.) You have to jump through hoops to do querying. But we can now be proud of our documentation, such as our new end-to-end examples. It’s more straightforward to deploy BigchainDB in a cluster. We still only account for some of the identified faults, so there’s a lot of work there. In short: more functionality, easier to use, but not yet production ready. We’re fine with this, since most blockchain projects are at the proof-of-concept (POC) stage; people can run their POCs on prototype BigchainDB, and BigchainDB will mature into production quality as the POCs turn into production projects.

A Mistake: Miscalibrating Expectations

We made a mistake, and in retrospect we’re kicking ourselves for letting it happen: we haven’t done a great job communicating where we’re at and where we’re headed. While we have communicated much of it on GitHub (roadmap, releases, version numbers < 1, etc.) and in conversations, we haven’t pulled it up to higher levels for easier consumption, such as the bigchaindb.com landing page. And even inside GitHub, it was somewhat cryptic. It didn’t give expected performance for various deployment scenarios, or say which fault vectors we do and don’t currently address.

This is something I’ve come to realize from recent dialogues with many smart folks. For example, they download the code and kick the tires, and are surprised to get weirdly low performance numbers. It’s our responsibility to make high performance easy!

In short, we didn’t properly calibrate expectations, especially at higher levels. To those of you who had expectations that differed from reality: we’re sorry. We aim to do better.

BigchainDB: Where We’re Headed

First, we’re taking near-term actions to address the communication issues. These include:

Better communication on what we’re working on now and in the near term, for performance, security, and so on; a GitHub roadmap that is easier to consume, yet maintainable with tickets. [Update: done, see here]

Good documentation on performance for various deployment scenarios, and how anyone can easily repeat our benchmarks. [Update: done, see here and here.]

Good documentation on what fault vectors we do and don’t currently address; and other more detailed FAQ-ish info on fault tolerance.

Better expectation-setting on the bigchaindb.com landing page and in other higher-level places.

Technical updates that are easy to consume, such as this blog post! We will release more blog posts with more details.

Besides that, we’ll continue working on taking BigchainDB to production, and beyond. This includes improvements in performance, security, and so on according to the release targets. We’re also working on a public version of BigchainDB; more on that in coming weeks.

If you’ve read this far: thank you for your interest in BigchainDB! We’re pretty excited about this technology, and how it connects the promise of blockchain technology with the scale of modern distributed database technology. Fun times ahead!