Musings on the oracle problem facing the Ethereum blockchain.

The two cardinal problems facing smart contract-enabled blockchains today, in order of importance, are 1) scaling and 2) oracles. The former problem is well-known and highly-advertised by almost everyone in the blockchain ecosystem, but the latter is seldom discussed. Scaling (i.e. increasing transaction throughput) makes blockchains usable while oracles make blockchains useful.

What is an Oracle?

One of the key requirements for blockchains is deterministic verifiability. In other words, a full node must be able to verify the state of the chain, from the genesis block all the way to the current head, using only on-chain data, at any time (both present and future). This constraint creates some interesting problems that need to be designed around, namely that 1) clever mechanisms are often needed to store off-chain data or perform off-chain computations in a provably correct manner and 2) data external to the blockchain (such as the weather, or the winner of the World Cup) cannot be deterministically verified through cryptographic proofs and must instead be made available for on-chain consumption through other means.

Not this one. Source: Wikipedia

An oracle is any such system that exposes external data for on-chain use. Without an oracle, a [decentralized] smart contract (e.g. running on Ethereum) will only be able to perform operations with on-chain data, which, as it turns out, is essentially limited to transfers of tokens. Oracles make smart contracts useful, and without them a blockchain lives in its own closed-off world.

In order to be of any practical use, oracles must offer certain guarantees on the data they make available. These could take the form of cryptographic attestations (TLSnotary, Town Crier), trusted information sources that put their reputation on the line (Oraclize), or, most desirably, quantitative economic guarantees on manipulation resistance, liveness, and safety. The focus of this post, and of my previous PhD research, is on the last category of oracles.

A Brief History of Athena

Towards the beginning of the summer of 2017, brainstorming began on a decentralized fake-news detection platform. With the increasing exposure to the problem of media-manipulated news (such as the later-revealed fiasco of Facebook and Cambridge Analytica), a solution that enabled and facilitated the judging of the veracity of news articles was sought after.

Using TrueBit’s “forced errors” to incentivize verification of results as an inspiration, my research group and I set out to develop such a platform, code-named Astraea. It rapidly became clear to us that trying to solve this problem had numerous fundamental issues, namely that 1) even if such a platform were developed, education of media consumers would be a requirement for it to be of any use (and education of consumers would already largely solve the problem of fake news in and of itself) and 2) judging the truth or falsity of a news article didn’t really make sense, as there was no “ground truth” against which to incentivize user behavior.

Towards the end of the summer, Ryan Berryhill and I decided to do that thing everyone dreads doing but knows must be done: pivot. The basic structure of a decentralized network of individuals deciding on the truth or falsity of something remained, but the fundamental premise changed. Rather than judging and forming opinions, users would be reporting on facts. In other words, we were developing an oracle. We decided on the working name Athena.

How Does Athena Work?

The peer-reviewed paper is available here (for free). Aside: due to academic inertia, it was published under the overall project’s original name, but the system described therein is properly named Athena. I strongly recommend reading the paper before proceeding in this post, as I will skim over many of the details that make it work in favor of the core essence of the system (i.e. familiarity with the source material is assumed).

The basic premise of Athena is simple: we want to decide whether a binary proposition is true or false. The proposition could be anything human-readable (e.g. a content-addressed blob of text). Given a list of such propositions as input, Athena will output a decision for each one. The two key attributes of the system, manipulation resistance and incentive compatibility, are described below.

Three [possibly overlapping] groups of users exist in Athena: submitters, voters, and certifiers. The aptly-named “submitters” submit propositions to the oracle, attaching a bounty (in ether, or tokens) to each proposition they submit. Submitters can be thought of as the entities that want the result of a proposition available on-chain and are willing to pay for it. The second group of users, voters, vote on randomly-assigned propositions in the proposition list. This random assignment, rather than allowing voters to select which proposition they want to vote on, is what gives the system arbitrarily strong resistance to manipulation: given a parametrizable proposition list size, the cost of manipulating any single proposition grows, in expectation, by a factor of this size. Voters are rewarded from the bounties attached to propositions if they agree with both the majority of voting stake for the propositions they voted on and the majority of a non-zero certifying stake.
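To make the "factor of this size" intuition concrete, here is a minimal sketch under simplifying assumptions that are mine, not the paper's: every vote costs the same fixed stake, assignment is uniform over the list, and an attacker needs a fixed number of votes to land on one target proposition. The function names are hypothetical.

```python
import random


def expected_votes_needed(target_votes: int, list_size: int) -> int:
    """Mean number of randomly assigned votes cast before `target_votes`
    of them land on one chosen proposition (uniform assignment assumed)."""
    # Each vote hits the target with probability 1 / list_size, so the total
    # follows a negative binomial distribution with mean target_votes * list_size.
    return target_votes * list_size


def simulate_votes_needed(target_votes: int, list_size: int,
                          trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    total_cast = 0
    for _ in range(trials):
        hits = 0
        while hits < target_votes:
            total_cast += 1
            # Proposition 0 stands in for the attacker's target.
            if rng.randrange(list_size) == 0:
                hits += 1
    return total_cast / trials
```

Under these assumptions, an attacker who needs 10 votes on a proposition in a list of 50 has to pay for about 500 votes in expectation, so growing the list raises the cost of targeting any one entry.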

The third group of users is needed to maintain incentive compatibility. Consider the following thought experiment: if voters were the only group in the system (along with submitters, of course), a degenerate equilibrium exists where voters could just always vote true, and always collect bounties. Would the oracle “work” in this case? That’s debatable, but a solution exists, so there’s no need for a debate! Certifiers choose a proposition to vote on, unlike voters. They are also rewarded if they agree with the majority of certifying stake (and voting stake), but are rewarded only from certification pools, one for each outcome. If the outcome of a proposition is true, certifiers that certified that proposition are rewarded from the true pool, and likewise for false. In other words, certification pools are consumed asymmetrically. If they are, however, funded symmetrically (an implementation detail, there are many ways to do this fairly trivially), it can be seen that this breaks the degenerate case of everyone voting in a single direction, as certifiers would collect no reward in that case.
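The bookkeeping behind "funded symmetrically, consumed asymmetrically" can be sketched as follows. This is only an illustration: the class name, the even split, and the payout fraction are assumptions of mine, and the paper defines the actual reward rule.

```python
from dataclasses import dataclass


@dataclass
class CertificationPools:
    """Hypothetical two-pool ledger, one pool per outcome."""
    true_pool: float = 0.0
    false_pool: float = 0.0

    def fund(self, amount: float) -> None:
        # Symmetric funding: both pools receive the same share.
        self.true_pool += amount / 2
        self.false_pool += amount / 2

    def pay_out(self, outcome: bool, fraction: float) -> float:
        # Asymmetric consumption: only the pool matching the outcome pays.
        if outcome:
            reward = self.true_pool * fraction
            self.true_pool -= reward
        else:
            reward = self.false_pool * fraction
            self.false_pool -= reward
        return reward


# Ten rounds in which every outcome is "true" (the degenerate case).
pools = CertificationPools()
for _ in range(10):
    pools.fund(2.0)
    pools.pay_out(True, fraction=1.0)
```

After these ten rounds the true pool is empty while the false pool holds 10.0 untouched: one-directional outcomes keep draining one pool while the other only accumulates.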

I’ll repeat what I wrote before, that familiarity with the original work is assumed, so if the above few paragraphs don’t make sense, please read the paper first!

Athena is a departure from previous oracle proposals with the following key properties:

It is highly parametrizable for a variety of different use cases.

Unlike Truthcoin-like prediction-market-based “oracles,” it does not require a user to perform work at unexpected times: a user can keep all their stake in the system in cold storage indefinitely without losing anything.

Athena can also be used as a data availability oracle for itself, solving a huge problem facing any system that relies on external data, including all other current oracle proposals, and many scaling proposals. Details for how this is done are outlined in the paper itself.

The cost of attacking the system (manipulating the outcome of the oracle) can be measured deterministically and quantitatively. No decentralized stake-based system is impervious to attack, and being able to quantify under what conditions its output can be trusted is essential.

Thanks to random assignment of propositions from a [potentially large] list, the system can be used in conjunction with applications whose exogenous rewards exceed the stake needed to determine the outcome of any individual proposition.

Shintaku

In the summer of 2018 I stumbled upon an interesting post. It turned out that, in the ultimate spirit of open source, a community member had taken the system my group and I devised and implemented it! The author named his project Shintaku, which Google Translate tells me is the Japanese word for “oracle” (direct link to design paper here). Dubious naming choices aside, this project irons out many of the details that Athena, as an academic research project, could dismiss as “implementation details.”

Again, familiarity with the original work is assumed, as I won’t be repeating everything here.

The creator of Shintaku elegantly re-frames the problem Athena was trying to solve as essentially the verifier’s dilemma. The “degenerate case” described in the previous section can be framed as users in the system not verifying the outcome of the oracle because they have nothing to incentivize them to do so — they can just always vote one way and get rewarded. While Athena resolves this using symmetrically-funded and asymmetrically-consumed certification pools, Shintaku proposes assigning voters two propositions randomly (and doing away with certifiers and certification pools entirely), and only rewarding them if the two votes are different (and, of course, in agreement with the majority voting stake, etc.).
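The paired-vote rule described above fits in a few lines. This is a minimal sketch of the rule as summarized in this post, not Shintaku's actual contract code; the function name and the boolean encoding of votes are assumptions.

```python
from typing import Tuple

VotePair = Tuple[bool, bool]


def paired_vote_is_rewarded(votes: VotePair, majority_outcomes: VotePair) -> bool:
    """Reward a voter only if their two votes differ from each other AND
    each vote matches the majority outcome of its proposition."""
    first, second = votes
    votes_differ = first != second
    agrees_with_majority = votes == majority_outcomes
    return votes_differ and agrees_with_majority
```

A voter who always answers true submits `(True, True)` and so can never pass the "votes differ" check, which is how the rule removes the payoff from the lazy strategy without needing certifiers.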

Beyond that, the author also discusses requirements for an end-to-end-decentralized system, a discussion I think was sorely lacking in this space. I’ve seen too many projects that claim they are “decentralized,” but include single-key backdoors and escape hatches that could compromise their systems at any time (and in some cases have), defeating the whole purpose of a decentralized application.

What Still Needs to be Implemented?

While the majority of the code is implemented, from my brief inspection of the code-base the following things pop out as needing to be hashed out:

Standardization. The API and interface haven’t gone through community feedback yet. I’m sure that the community at large has requirements for an API that a single developer or small team would never be able to think of.

Testing of the oracle code. The current test suite looks incomplete, but then again, given that standardization doesn’t exist, the whole interface could change in the near future depending on what the community wants.

Improved random number generation. In its current state, Shintaku uses a variant of RANDAO that can either function as a standard RANDAO and make no liveness guarantees, or ensure liveness at the cost of potential minor manipulability of the random number generation.
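The liveness-versus-manipulability trade-off in the random number generation point can be illustrated with a generic commit-reveal scheme in the RANDAO style. This is not Shintaku's implementation: the function names, the SHA-256 commitments, and the `require_all` switch are assumptions made for the sketch.

```python
import hashlib
from typing import Dict, Optional


def commit(secret: bytes) -> bytes:
    """Commitment published before anyone reveals their secret."""
    return hashlib.sha256(secret).digest()


def combine(commitments: Dict[str, bytes], reveals: Dict[str, bytes],
            require_all: bool) -> Optional[int]:
    """XOR together every correctly revealed secret.

    require_all=True  -> standard behavior: any missing or invalid reveal
                         yields no output, so liveness is not guaranteed.
    require_all=False -> always produces an output, but a participant can
                         change it by choosing to withhold their reveal.
    """
    result = 0
    for name, commitment in commitments.items():
        secret = reveals.get(name)
        if secret is None or commit(secret) != commitment:
            if require_all:
                return None
            continue
        result ^= int.from_bytes(secret, "big")
    return result
```

With three participants, a round where one reveal is withheld either stalls (`require_all=True`) or completes with a different number than full participation would have produced (`require_all=False`).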

I’m looking forward to seeing what the community will contribute to this project over time, as it is one of the few truly decentralized works in the blockchain space.

The Future of Oracles

TL;DR the oracle problem is solved. An end-to-end decentralized general-purpose oracle system basically exists for use today, in the form of Shintaku. While future research can surely improve components of the system (ask my old research group if they’re working on anything in this area if you’re excited about this stuff!), in large part the long-standing problem of getting external data for on-chain consumption is a solved one.

I expect improvements to things like on-chain general-purpose random number generation to be a hot topic in the coming years, which is something that, if done correctly, will be very useful for Athena/Shintaku-based oracles and a multitude of other decentralized applications.

Closing Thoughts

With the oracle problem out of the way, my primary focus going forward is on scalability tools. First- and second-layer scaling solutions and tools have been under active research for several years, and I think it’s a very exciting area with lots of open questions. Some “experts” believe that the future of blockchain is in permissioned chains, and that scaling can be solved by trusting large corporations. Others, who actually know what they’re talking about, believe that scaling must be done in a decentralized manner — which of course makes things much harder.

The absolute state of “experts” in blockchain space.

My future posts will cover the current state of, and my thoughts on, child-chain and state-channel techniques as scaling tools in the Ethereum space. Take a look at my colleague Kevin Zhang’s state of Plasma post for an introduction to one particular child-chain design that I think is promising, and that has been the subject of lots of discussion for several months now.