An example of an unintended consequence

There was a discussion happening on /r/bitcoin yesterday relating to Jihan's mining pools (or at least one of them) no longer mining SegWit transactions. Whatever the rationale behind the decision by the mining pool, it was a decision made possible, even probable, by having two different classes of transactions, which are charged at different rates for the resources that they use. The full thread is here:

It seems fairly clear that, at the most fundamental level, this is a case of additional complexity resulting in an unintended (although in this case foreseeable) consequence.

The second that SegWit was introduced, it created different economics for two different types of transactions, all on the same block-chain ledger. This results in a situation where two ideas that each seem reasonable on their own merits become incompatible. The first idea is that you should be free to run whichever node implementation you wish (i.e., SegWit is optional). The second is that the system needs to retain uniform economic incentives to remain fungible. In other words, the ideal situation is that everyone uses SegWit (or no one does); if SegWit remains optional, it creates a fungibility problem.
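To make the "different economics" concrete, here is a rough sketch of how fees come out under the BIP 141 weight formula, where witness bytes count for a quarter of the weight of other bytes. The byte counts and the fee rate below are made-up illustrative numbers, not measurements of real transactions, and the function names are my own.

```python
import math

def weight_units(base_bytes: int, witness_bytes: int) -> int:
    """BIP 141 weight: base size * 3 + total size (base + witness)."""
    total_bytes = base_bytes + witness_bytes
    return base_bytes * 3 + total_bytes

def virtual_size(base_bytes: int, witness_bytes: int) -> int:
    """Virtual size in vbytes: weight / 4, rounded up."""
    return math.ceil(weight_units(base_bytes, witness_bytes) / 4)

def fee_sats(base_bytes: int, witness_bytes: int, sats_per_vbyte: int) -> int:
    """Fee paid at a given rate per virtual byte."""
    return virtual_size(base_bytes, witness_bytes) * sats_per_vbyte

# Two hypothetical transactions that both occupy 250 bytes in total.
RATE = 20  # sats per vbyte, an assumed fee rate

legacy_fee = fee_sats(base_bytes=250, witness_bytes=0, sats_per_vbyte=RATE)
segwit_fee = fee_sats(base_bytes=140, witness_bytes=110, sats_per_vbyte=RATE)

print(legacy_fee, segwit_fee)  # 5000 3360
```

Both transactions take up the same number of bytes, yet at the same quoted rate one pays 5000 satoshis and the other 3360. That gap is the two-class pricing the paragraph above is talking about.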

Oh, is that all?

The previous example might seem minor if fungibility is not your thing (ask a Monero user if it's important!), but the problem for SegWit doesn't end there. I'm by no means a technical expert on the ins and outs of the SegWit software implementation itself, but I do understand software and I do understand how unwarranted complexity can cause code rot in software projects. I have worked in the software industry for more than two decades and I can say without reservation that, given two different software designs that achieve the same outcomes (ability to make and receive payments), you simply never choose the more complex option unless there is an astoundingly compelling reason to do so. That reason also has to become exponentially more compelling as the complexity increases. This is because complexity brings additional cost and additional maintenance, but most importantly it brings additional risk, both within the software project itself and for downstream systems that build upon the base system.

This Hacker Noon article does a good job of illustrating the types of challenges that developers face when trying to work with complex code. It just so happens to be a recent example of a bad bug in BTC, but that's just a coincidence I can assure you! (If you read the linked article you'll immediately see why avoiding unnecessary complexity is a desirable thing.)

Given that I have little technical knowledge about the SegWit implementation (other than understanding that it took an already working implementation and made it more complicated), the only real way I can ascertain whether the complexity of the project is having any adverse effect is to look for real world examples of that happening. Where the complexity has made the project more brittle and difficult for its own developers to work with, that would manifest as bugs in the implementation (like the kind of thing described in the Hacker Noon article linked above). Where the complexity has made the project more difficult to work with for downstream system developers and 3rd parties, that would manifest as incidents that were due to (or blamed upon) the additional complexity.

The case that I introduced at the beginning of this article is arguably a result of the additional complexity created by SegWit, and it has been given recent exposure in the Bitcoin community. There is another case, however, that happened just recently on another SegWit-enabled coin and that may not have had the same level of exposure. A description of that event follows.

Before I describe what happened, it should be noted that this particular case resulted in monetary loss for many people invested in the associated cryptocurrency. I personally have a soft spot for the cryptocurrency that I am about to mention since I am a fan of some of the core values of the project, so what I am about to write is purely from the perspective of providing a real world example of the types of things that can go wrong when additional (and often unnecessary) complexity is introduced into a system. This is not being written to attack that particular project in any way (I have no reason to do that), but I feel that this needs to be discussed so that people can see how complexity can come back to cause real world problems.

Loss of funds caused by the complexity of Replace-By-Fee

One of the features that SegWit-enabled coins support is called "Replace-By-Fee" (strictly speaking it is specified separately from SegWit, in BIP 125). My understanding of this feature (and I have used it with BTC) is that it allows an unconfirmed transaction to be replaced by a follow-up transaction that spends the same inputs but pays a higher fee. This feature was created, in my opinion, as an unnecessarily complex workaround for transaction congestion, where a low-fee transaction might not get processed quickly enough or, at peak times, at all. This complex workaround was implemented instead of what was a trivial and risk free block size limit increase. I say "risk free" in terms of the code changes that needed to be made; I am deliberately avoiding getting into another block size debate in this article. Certainly there is no debate about which solution was easier to implement and which posed less risk in the short term.
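A toy model may help show what a replacement looks like from the point of view of a node's pool of unconfirmed transactions. This is a deliberately simplified sketch: the `Tx` and `Mempool` classes are hypothetical, and the single "must pay a higher fee" check stands in for the fuller set of conditions in BIP 125. It is not how any real node is implemented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tx:
    txid: str
    inputs: frozenset          # outpoints being spent, e.g. {"funding:0"}
    fee_sats: int
    replaceable: bool = True   # whether the sender opted in to replacement

class Mempool:
    """Hypothetical pool of unconfirmed transactions, keyed by txid."""

    def __init__(self):
        self.by_txid = {}

    def accept(self, tx: Tx) -> bool:
        # Anything already in the pool that spends one of the same inputs conflicts.
        conflicts = [t for t in self.by_txid.values() if t.inputs & tx.inputs]
        for old in conflicts:
            # Simplified rule: the old tx must allow replacement and
            # the newcomer must pay a strictly higher fee.
            if not old.replaceable or tx.fee_sats <= old.fee_sats:
                return False
        for old in conflicts:
            del self.by_txid[old.txid]   # the original simply disappears
        self.by_txid[tx.txid] = tx
        return True

pool = Mempool()
coins = frozenset({"funding:0"})
pool.accept(Tx("aaaa", coins, fee_sats=500))
pool.accept(Tx("bbbb", coins, fee_sats=2000))   # fee bump: replaces "aaaa"

print(sorted(pool.by_txid))  # ['bbbb']
```

The detail that matters for what follows is that the replacement is a new transaction with a new txid. Any downstream system that remembered the payment as "aaaa" is now tracking something that will never confirm.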

Just recently the Vertcoin (VTC) base pair exchange called Vertpig was severely impacted by the complexity that SegWit introduced. The bottom line is that the exchange lost its own funds and also a large percentage of its users' funds. As a user of the Vertpig exchange and someone who provided bug reports and feedback (and often received quick responses and patch notifications from the Vertpig team), I can testify to the fact that the exchange owners were both competent and trustworthy. This was by no means an exit scam (they would have chosen a cryptocurrency with more liquidity if that's what they were planning).

What was the cause of the loss of funds?

In a nutshell it was Replace-By-Fee.

Actually it was a two-fold issue. The exchange had noticed a bug caused by its handling (or incorrect handling) of Replace-By-Fee and fixed that bug. The fix for that Replace-By-Fee related issue, however, created another, more severe issue that dishonest actors were able to exploit, and that resulted in the significant losses described earlier.
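To give a feel for the kind of bookkeeping a downstream system has to get right, here is a minimal sketch of a deposit tracker that tolerates a replaced deposit. Vertpig has not published its code as far as I know, so this is not a reconstruction of their system or of either bug; the `SwapDeposits` class, its method names and the order IDs are all hypothetical. The assumptions are that a replacement is recognised by the inputs it spends rather than by its txid, and that funds are only credited once, on confirmation.

```python
class SwapDeposits:
    """Hypothetical tracker: at most one pending deposit per swap order."""

    def __init__(self):
        self.pending = {}    # order_id -> (txid, inputs, amount)
        self.credited = {}   # order_id -> amount

    def seen_unconfirmed(self, order_id, txid, inputs, amount):
        current = self.pending.get(order_id)
        # First sighting, or a replacement spending the same coins:
        # overwrite the pending entry instead of adding a second one.
        if current is None or current[1] & inputs:
            self.pending[order_id] = (txid, inputs, amount)

    def confirmed(self, order_id, txid):
        current = self.pending.get(order_id)
        # Only the transaction currently tracked may complete the order,
        # and an order can never be credited twice.
        if current is None or current[0] != txid or order_id in self.credited:
            return False
        self.credited[order_id] = current[2]
        del self.pending[order_id]
        return True

swaps = SwapDeposits()
coins = frozenset({"funding:0"})
swaps.seen_unconfirmed("order-1", "aaaa", coins, 1000)
swaps.seen_unconfirmed("order-1", "bbbb", coins, 1000)   # fee-bumped replacement

stale = swaps.confirmed("order-1", "aaaa")    # the replaced tx never counts
done = swaps.confirmed("order-1", "bbbb")     # the replacement completes the swap
again = swaps.confirmed("order-1", "bbbb")    # a repeat notification is ignored

print(stale, done, again, swaps.credited)  # False True False {'order-1': 1000}
```

Even this toy version has three separate ways to go wrong (never completing, completing on the wrong transaction, completing twice), and every one of them exists only because a deposit can change identity before it confirms.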

Here is an excerpt of the post-mortem written by the Vertpig exchange itself.

At the beginning of October we applied a patch to our Swap system that resolved a bug where if a deposit transaction into a Swap order was replaced using Replace-By-Fee, it would cause an issue where the Swap was never completed.

Unfortunately, applying this patch opened a vulnerability and major bug in the system that processed Swap orders completely wrong if they had their deposit transaction replaced using Replace-By-Fee.

This has resulted in a loss of not only our own funds but more importantly a large percentage loss of our users funds. The way this vulnerability affected the system, it triggered none of the many security checks we continually ran on our platform.

The full post from Vertpig can be found here:

Not a bug in SegWit, or is it?

I assume that some people will be quick to point out that this was not a bug in SegWit that caused the problem, and technically speaking it wasn't. When it comes to good software design, though, the point becomes debatable. Is software that is poorly designed and needlessly complex, but that works exactly as designed, buggy? I would argue that it's a gray area at a minimum. If the design of a system is such that it makes writing downstream code to interact with it more difficult, and any bugs in that downstream 3rd party code can result in people losing money, then it certainly seems like a significant problem. Any successful block-chain is bound to have multitudes of downstream solutions (exchanges, wallets, payment processors, merchant PoS systems, etc.) developed, and all of these systems need to navigate the extra complexity in the base system. This is a risk that gets multiplied out across the extended system.

Conclusion

This article has described two examples of unintended downstream consequences of additional software complexity. SegWit was the target of the discussion since the examples were both very recent. The impact of one of these was debatable, but the consequences for Vertpig were indisputably catastrophic for the exchange operators, investors and users alike.

From working with software developers (and being one myself) for many years, I have seen that there is a mismatch, perhaps, between software developer ambition and what's good for a software project. Often the best code is the simplest, cleanest, most boring code. Developers, however, are competitive and want to be on the cutting edge, use new approaches, solve the same problem in different ways, and find out what's possible. Often in that process those developers also find out what won't work and what's not possible, or what didn't work out nearly as well as they had imagined. This is great for situations such as brand new projects where no one is relying on the code and major overhauls (or even complete re-writes) can be made. A block-chain project, however, is a live distributed project that is sensitive to code changes that can introduce bugs and security flaws and, as we have seen, to the downstream effects of extra complexity. Every downstream implementation of a wallet, an exchange, a PoS system, and so on, has to deal with the extra complexity, and that introduces risk.

For a software project where the risk is to user funds, and thus to the credibility of the entire project, surely the only sane approach is to favor a software design and implementation methodology that embraces the time-tested mantra of KISS (Keep it simple, stupid).