Abstract: This article describes the design of a new opcode for the Bitcoin Cash scripting language called “OP_CHECKDATASIG” [ 1 ]. Originally designed to allow Script to import and validate arbitrary messages from outside the blockchain, the evolved design also opens up exciting possibilities for cross-chain atomic contracts.

Background

When someone sends a Bitcoin Cash transaction, they sign it to prove to the network that the owner of the private key authorizes the transaction. The way the signature works is that it uses an opcode called “OP_CHECKSIG” that calculates a hash based on a portion of the data in the transaction, and checks the signature against that. The signature is equivalent to a contract that says the owner of the private key authorizes the transfer. OP_CHECKSIG has a few different ways it can hash the transaction (known as the “Sighash”), to allow different conditions on the transfer. In general, however, it is equivalent to a contract that only defines the transfer of money. But what if you could also sign other pieces of information in the transaction? This would allow other information to be included and signed as part of the transfer contract. This is the idea behind OP_CHECKDATASIG.

One way to think of OP_CHECKDATASIG is as an un-bundling of OP_CHECKSIG. The design of OP_CHECKSIG bundles two distinct concepts together: calculating the Sighash, and checking the signature of that hash. If we imagine re-implementing OP_CHECKSIG as two instructions, OP_CHECKDATASIG would be the second instruction. It just checks a signature against a supplied message and public key. Because of this unbundling, OP_CHECKDATASIG can be used more flexibly to verify signatures for any message from outside the blockchain.

The Story

The motivation behind OP_CHECKDATASIG is to continue the process of improving the Bitcoin Cash Script language. As such, the instruction is intended to be a generic and flexible building block that can find many uses.

OP_CHECKDATASIG is based on Andrew Stone’s OP_DATASIGVERIFY proposal [ 2 ]. The original motivation behind the opcode is to be able to validate signatures from an “oracle” on messages from outside the blockchain. This use case has been described in an article by Andrew Stone [ 3 ]. This use opens up a vast array of possibilities, since it allows scripts to operate on messages from the outside world.

After the original proposal was made, it went through a series of changes and design refinements based on review and discussion by stakeholders and subject-matter experts [ 4 ].

Overall, the theme of the changes was to make the design mirror existing opcodes more closely. This makes it a more conservative and minimal change than the original proposal. Though the design choices of the original proposal may have had certain advantages, in general the reviewers preferred an approach of avoiding implementation complications and sticking close to the design of what is already in the protocol. This helps lower risk by creating minimal change in the implementation. All the little quirks of OP_CHECKSIG are well understood, having been battle-tested on the blockchain for many years. Sticking to the same underlying primitives keeps the design in well-understood and safe territory. As a result, the implementation details of OP_CHECKDATASIG are very close to OP_CHECKSIG [ 5 ].

An interesting thing happened on this journey, however. Though the driving motivation of the design changes was conservatism and safety, by making the implementation mirror OP_CHECKSIG, some novel potential use-cases were discovered.

The original OP_DATASIGVERIFY proposal took a message of any size as input, then hashed it before checking the signature. Through the review process, following the idea of structuring OP_CHECKDATASIG as the second step in OP_CHECKSIG, it seemed like a nice design to make the opcode take a hash value as input. This also fit with the philosophy of keeping the opcode as a simple minimal building block.

Upon further review, it was realized that passing in a value to be signed without hashing it is potentially insecure. This security hole was identified by Andrew Stone on June 16th. It would not be a problem if the opcode were used properly, but it made it possible for careless Script authors to use the opcode insecurely. So the design was changed back to hashing the input with double-SHA256, which is the standard hash method in Bitcoin.

However, in the period where the design did not hash the input, people (awemany in particular) realized something interesting: the fact that the opcode did not hash the input meant that you could pass in a Sighash value from another transaction as the message, and the signatures would match. This means that it becomes possible for OP_CHECKDATASIG to test whether it has been supplied with the valid signature for a completely different transaction.

Changing the opcode to do a double-SHA256 hash on the input would still allow this sighash-as-message use, but now you would have to pass in the pre-image of the hash, which is the serialized transaction data. This data would typically be hundreds of bytes, all of which would need to be included in the transaction, making this use unwieldy. Luckily, the reviewers found a solution that threaded the needle between both options, yielding both safety and convenience: do a single-SHA256 hash on the input.

Doing a single-SHA256 means the input is hashed, so it is secure, but it also means that a partial sighash, one that has gone through only one round of SHA256, can be passed in (only 32 bytes on the stack). When the opcode hashes it with another round, the result is the full sighash.
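The hashing relationship described above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the preimage bytes are a made-up placeholder, not a real serialized transaction, and the helper names are chosen for illustration.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """One round of SHA256."""
    return hashlib.sha256(data).digest()


def double_sha256(data: bytes) -> bytes:
    """Two rounds of SHA256, the standard Bitcoin hash."""
    return sha256(sha256(data))


# Placeholder standing in for the serialized transaction data (the sighash
# pre-image). Real pre-images are typically hundreds of bytes.
preimage = b"hypothetical serialized transaction data " * 8

# The full sighash that a transaction signature commits to.
sighash = double_sha256(preimage)

# The "partial sighash": only one round applied. This 32-byte value is what
# would be pushed on the stack as the message.
partial_sighash = sha256(preimage)

# The opcode applies one more round to the supplied message before checking
# the signature, which lands exactly on the full sighash.
digest_checked_by_opcode = sha256(partial_sighash)

print(len(preimage), len(partial_sighash))
print(digest_checked_by_opcode == sighash)
```

The point of the design choice is visible in the sizes: the script only needs to carry the 32-byte partial sighash rather than the whole pre-image, yet the signature is still checked against a hashed value.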

It is also notable that PGP signatures use a single-SHA256. So moving to single-SHA256 makes the signature also potentially compatible with PGP, which also opens up potential uses.

These capabilities open up exciting possibilities. It means that OP_CHECKDATASIG can make spendability dependent on a completely separate, unrelated transaction being signed. This can work even for transactions on different blockchains, as long as the transaction signing algorithm is compatible with Bitcoin Cash. This group includes Bitcoin, Litecoin, Dash, and ZCash. I will leave it to people more creative than I to find novel ways to use this capability. There have already been several ideas floated, such as atomic digital goods purchase and double-spend prevention [ 6 , 7 ]. I look forward to many new uses I haven’t thought of.

Conclusion

The addition of OP_CHECKDATASIG will add a useful new capability to the Bitcoin Cash scripting language: the ability to import and validate messages from outside the blockchain. By reaching out and engaging subject matter experts and different stakeholders, the initial proposal was modified to address concerns. Through this process, not only did it retain the core utility, but the capabilities were expanded into exciting new areas. The modifications made the new version a more conservative change that sticks closer to the existing system, and also a more flexible tool with novel capabilities.

Acknowledgements

My (Antony Zegers aka Mengerian) role in this process was to reach out to reviewers and coordinate technical discussion and feedback. The information in this article, and the design of the opcode, are a synthesis of the ideas and input of all the reviewers.

I would like to thank Andrew Stone for making the original proposal, Amaury Sechet for coding the first complete implementation in Bitcoin ABC, and all the reviewers, including Clemens Ley, Chris Pacia, Amaury Sechet, Andrew Stone, Mark B. Lunderberg, awemany, and others for their contributions.

References

[3] Andrew Stone’s article on Scripting https://medium.com/@g.andrew.stone/bitcoin-scripting-applications-decision-based-spending-8e7b93d7bdb9

[4] Initial Peer Review of Proposal https://github.com/bitcoincashorg/bitcoincash.org/pull/10

[6] Use of OP_CHECKDATASIG for atomic digital good purchase https://www.reddit.com/r/btc/comments/96fxvy/op_checkdatasig_is_copying_blockstream_and_is/e4520xa/

[7] Use of OP_CHECKDATASIG for double-spend fraud assurance https://bitco.in/forum/threads/gold-collapsing-bitcoin-up.16/page-1213#post-75916

Appendix A: Design Details

OP_CHECKDATASIG makes some different design choices than Andrew Stone’s OP_DATASIGVERIFY proposal, resulting from the peer review and feedback. The changes largely consisted of making the opcode closer to the existing system, and minimizing potential risks. The design is intended to best balance requirements of functionality, minimizing the attack surface, and risk avoidance. The goal of this process of review and refinement was to converge on an implementation that has the best set of properties for the long-term future of Bitcoin Cash.

This is a list of differences between OP_DATASIGVERIFY and OP_CHECKDATASIG:

Signature format same as OP_CHECKSIG, rather than pubkey-recoverable signature.

Takes pubkey as input, rather than the pubkeyhash.

No “type” field with the signature. All signatures are treated as strict DER encoded ECDSA.

Hashes the input with single-SHA256 rather than double-SHA256.

Includes non-verify version, which does not immediately mark transaction as invalid if it fails, simply returns “False”. Returns “True” if successful. For verify behavior, use OP_CHECKDATASIGVERIFY.

Does not leave input message on the stack, all three input values are removed.

Order of inputs is different.

The following sections will expand on some of the reasoning for the changes.

Signature format

The original design of OP_DATASIGVERIFY used a pubkey-recoverable signature format similar to what is used in the signmessage/verifymessage RPC. All of the reviewers suggested making the signature format mirror the existing OP_CHECKSIG implementation.

The reason for sticking close to the OP_CHECKSIG format is largely to lower risk, and keep the implementation manageable. Since OP_CHECKSIG has been part of consensus for a long time, its characteristics are well understood. This means that the specifics of the encoding can be treated exactly the same as what is already there. It is possible that Andrew Stone’s suggested signature format had advantages, but the reviewers felt it also introduced potential unknowns. Issues such as potential malleability and sighash accounting would have taken significant amounts of work and study to resolve, and even then would have carried some risk simply because the format is different from what is already there. Mirroring OP_CHECKSIG closely allowed all the quirks such as low-s, nullfail, and sighash counting to be handled in exactly the same way, thus not introducing any unknowns.

This choice also means that the opcode has to take the public key as an input, rather than the public key hash.

No “type byte” field with the signature.

All signatures are treated as strict DER encoded ECDSA. At first glance, it may seem that to keep the signature format similar to OP_CHECKSIG, we may want to add a “type byte” at the end, in place of the sighash byte. The sighash byte, however, has nothing to do with the signature, it specifies how to process the transaction data to generate the hash that is to be checked against the signature. Since the message checked by OP_CHECKDATASIG comes from externally supplied data, it is unnecessary to have a flag specifying how the data is to be generated.

Any potential future migration to a new signature type such as Schnorr would have to accommodate the OP_CHECKSIG family of opcodes. Since they have no explicit provision for signature versioning, some method would have to be used that does not rely on signature version byte. This implies that there is little benefit to including a type byte for OP_CHECKDATASIG.

Message hashed with single-SHA256

The original OP_DATASIGVERIFY proposal took a message of any size as input, then hashed it with double-SHA256 before checking the signature. Changing this to single-SHA256 is just as secure, and makes the opcode far more flexible by being compatible with Sighashes from other transactions, and other signature systems such as PGP.

Stack Handling

The reasoning for changing the order of inputs on the stack is that the message could be supplied by either the scriptSig or the scriptPubKey, depending on the use-case. For example, in the future maybe it could be generated by an opcode in the scriptPubKey. Changing the order to [<signature>, <messageHash>, <pubKey>] makes it easy to accommodate both cases.

For similar reasons, it was decided to remove all inputs from the stack after execution, like all other opcodes do. It is easy to construct a script that leaves the message on the stack, as OP_DATASIGVERIFY did, using OP_OVER.
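The stack behavior described in this appendix can be modeled with a toy Python stack machine. This is a minimal sketch under loud assumptions: toy_sign and toy_verify are an insecure stand-in invented for illustration and are NOT ECDSA, the result is modeled as a Python bool rather than a Script byte vector, and rules such as nullfail and signature encoding checks are left out. It is not the Bitcoin ABC implementation.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


# ASSUMPTION: insecure stand-in for signing, NOT ECDSA. It exists only so the
# stack handling can be exercised without a cryptography library.
def toy_sign(pubkey: bytes, digest: bytes) -> bytes:
    return sha256(pubkey + digest)


def toy_verify(sig: bytes, digest: bytes, pubkey: bytes) -> bool:
    return sig == toy_sign(pubkey, digest)


def op_checkdatasig(stack: list) -> None:
    """Pop pubkey, message, signature (top first); push True or False."""
    if len(stack) < 3:
        raise ValueError("stack underflow")
    pubkey = stack.pop()
    message = stack.pop()
    sig = stack.pop()
    # The message is hashed with a single SHA256 before the signature check.
    stack.append(toy_verify(sig, sha256(message), pubkey))


def op_checkdatasigverify(stack: list) -> None:
    """Same check, but fail the script instead of pushing False."""
    op_checkdatasig(stack)
    if not stack.pop():
        raise ValueError("script failed: signature check did not pass")


def op_over(stack: list) -> None:
    """Copy the second-from-top item to the top of the stack."""
    if len(stack) < 2:
        raise ValueError("stack underflow")
    stack.append(stack[-2])


pubkey = b"\x02" + b"\x11" * 32          # placeholder bytes, not a real key
message = b"oracle says: price=42"       # hypothetical oracle message
sig = toy_sign(pubkey, sha256(message))

# Plain use: all three inputs are consumed and only the result remains.
stack = [sig, message, pubkey]
op_checkdatasig(stack)
plain_result = list(stack)

# Keeping the message: supply <message> <signature>, then run
# OP_OVER <pubkey> OP_CHECKDATASIGVERIFY.
stack = [message, sig]
op_over(stack)            # message, sig, message
stack.append(pubkey)      # message, sig, message, pubkey
op_checkdatasigverify(stack)
kept_result = list(stack)
```

In the second run the copy made by OP_OVER is consumed by the check, while the original message stays at the bottom of the stack for later opcodes to use. This is one possible arrangement; other stack layouts can achieve the same effect.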

Appendix B: Bitcoin ABC Implementation

Implementation of this feature in Bitcoin ABC consisted of 32 sets of changes, catalogued as follows:

Prepare for activation: D1563 , D1564

Refactors and code improvements: D1565 , D1569 , D1573 , D1574 , D1576 , D1575 , D1578 , D1589 , D1595

Test additions and fixes: D1566 , D1567 , D1568 , D1570 , D1571 , D1580 , D1596 , D1599 , D1619 , D1620

Separate signature and sighash-type-byte handling: D1572 , D1577 , D1579

Sigops counting: D1597 , D1601 , D1605

Appendix C: Implementation Notes

Disabled vs. Reserved Op Code numbers: Other op codes that have been re-enabled on Bitcoin Cash were formerly disabled. When these op codes were disabled, their use was disallowed from all transaction Scripts.

OP_CHECKDATASIG and OP_CHECKDATASIGVERIFY, on the other hand, use op code numbers that were never previously defined and were considered “reserved”. These reserved op codes were treated differently than disabled op codes, and could appear in transaction scripts if they were in unexecuted IF branches. This has a few consequences:

The opcode numbers for OP_CHECKDATASIG and OP_CHECKDATASIGVERIFY appear many times in the blockchain in unexecuted IF branches.

Because of this, activation has to be handled differently than for the “re-enabled” opcodes (see https://reviews.bitcoinabc.org/D1563 ).
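The difference between disabled and reserved op codes before activation can be illustrated with a toy interpreter. This is a simplified sketch, not the Bitcoin ABC code: it handles only a handful of opcodes, treats 0xba as a still-undefined number, and uses OP_MUL as its example of a disabled opcode. The function name run_pre_activation is made up for illustration.

```python
OP_0 = 0x00
OP_1 = 0x51
OP_IF = 0x63
OP_ELSE = 0x67
OP_ENDIF = 0x68
OP_MUL = 0x95            # example of a disabled opcode
RESERVED_0XBA = 0xBA     # number later assigned to OP_CHECKDATASIG

DISABLED = {OP_MUL}


def run_pre_activation(script: list) -> bool:
    """Toy model of the rules before the new opcodes were activated."""
    stack = []
    branches = []  # one bool per open IF: is that branch being executed?
    for op in script:
        # Disabled opcodes fail the script wherever they appear,
        # even inside a branch that is never executed.
        if op in DISABLED:
            return False
        executing = all(branches)
        if op == OP_IF:
            condition = False
            if executing:
                if not stack:
                    return False
                condition = bool(stack.pop())
            branches.append(condition)
        elif op == OP_ELSE:
            if not branches:
                return False
            branches[-1] = not branches[-1]
        elif op == OP_ENDIF:
            if not branches:
                return False
            branches.pop()
        elif executing:
            if op == OP_0:
                stack.append(0)
            elif op == OP_1:
                stack.append(1)
            else:
                # Undefined (reserved) opcodes fail only when executed.
                return False
    return not branches and bool(stack) and bool(stack[-1])


# Reserved number inside an unexecuted branch: the script is still valid.
reserved_skipped = run_pre_activation(
    [OP_0, OP_IF, RESERVED_0XBA, OP_ENDIF, OP_1])

# Disabled opcode in the same position: the script fails.
disabled_skipped = run_pre_activation(
    [OP_0, OP_IF, OP_MUL, OP_ENDIF, OP_1])

# Reserved number in an executed branch: the script fails.
reserved_executed = run_pre_activation(
    [OP_1, OP_IF, RESERVED_0XBA, OP_ENDIF, OP_1])
```

Because scripts of the first kind were valid under the old rules, giving the reserved numbers a new meaning has to account for their existing appearances in the blockchain, which is why activation differs from the re-enabled opcodes.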



