In 2001, Philip, one of our developers, was at a customer site to install his company’s software when the Oracle database crashed. A few cancelled flights later, he had spent an extra two days working as the DBA to recover their production database for them. He had been on a single support call spanning India, England, and California for 14 hours, non-stop. Only after that could he actually start the installation. And once he was done, he would only learn about usage or any issues users hit if his customer called to let him know.

Almost a decade later, in 2010, I was ready to release some new features to the site I was working on. I turned on a site banner to indicate we were deploying new changes and, 5 minutes later, SSHed into a server and ran scripts/deploy.sh. Thirty seconds after that, the new features were up and running and our users were giving us their thoughts. They also caught a bug… An hour later, I did the same dance, and it was fixed.

Acceleration

Software engineering as a discipline has changed drastically over the last 20 years. The most recent era of development is marked by continuous deployment, most famously used by companies like Facebook and GitHub to deploy changes to their sites dozens of times a day. These processes make deployment a snap, and allow pushing subsets of functionality to different user groups. The end result: bugs that arise are typically limited to just a few people, and tend to be relatively small because development can happen very incrementally. They’re easy to fix, few people see them, and they get caught early.

One frequent side-effect of these easy bug fixes is what you might call a sloppier development process. As cloud tools have become more commonplace and continuous integration and deployment more accessible, rigid QA testing regimens have become significantly less common, particularly on smaller, earlier-stage teams. Rather than considering a feature ready to deploy only once it has been fully vetted, these small teams have shifted to accepting bugs because fixes can be deployed quickly. This has allowed uninterrupted productivity and, when paired with proper prioritization of new bugs, has been a great way to keep development pace up while still shipping a product that works well.

Much of continuous development exists because of tight, automated control over an organization’s full stack of tools and infrastructure: a chat tool can talk to a CI server that can trigger a build and then deploy to a few servers within the organization’s infrastructure. Incoming traffic goes through configured rules that point certain users to certain servers. Deployed code reports back telemetry that can be used to quickly spot a bad deploy and roll it back, automatically or manually. All of these things are straightforward because these organizations run their own systems, control the code and configuration that exists on each system, and receive full telemetry from all functioning machines or VMs in their infrastructure.
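To make the shape of that pipeline concrete, here is a minimal Python sketch of the telemetry-driven canary-and-rollback step. The deploy_to, error_rate, and rollback helpers, along with the server names and thresholds, are hypothetical placeholders for whatever your CI and monitoring stack actually exposes; this is an illustration of the idea, not a real deployment tool.

```python
# A minimal sketch of a telemetry-driven canary deploy with automatic rollback.
# deploy_to, error_rate, and rollback are hypothetical stand-ins for real CI/CD
# and monitoring integrations.
import time

CANARY_SERVERS = ["app-01", "app-02"]   # the "few servers" a new build hits first
ERROR_RATE_THRESHOLD = 0.02             # roll back if more than 2% of requests fail
OBSERVATION_WINDOW_SECONDS = 300        # watch the canary for five minutes


def deploy_to(servers, build_id):
    """Placeholder: push a build to a set of servers via your CI/CD tooling."""
    print(f"deploying {build_id} to {servers}")


def error_rate(servers):
    """Placeholder: query your telemetry backend for the recent error rate."""
    return 0.0


def rollback(servers, build_id):
    """Placeholder: redeploy the previous known-good build."""
    print(f"rolling back {servers} to {build_id}")


def canary_deploy(new_build, last_good_build):
    deploy_to(CANARY_SERVERS, new_build)
    deadline = time.time() + OBSERVATION_WINDOW_SECONDS
    while time.time() < deadline:
        if error_rate(CANARY_SERVERS) > ERROR_RATE_THRESHOLD:
            # Telemetry spotted a bad deploy: undo it automatically.
            rollback(CANARY_SERVERS, last_good_build)
            return False
        time.sleep(10)
    return True  # canary looks healthy; safe to roll out more widely
```

The whole loop only works because the organization controls every piece of it: the servers it deploys to, the routing that sends traffic there, and the telemetry that flows back.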

Decentralization

Enter decentralized networks. They are characterized by the opposite of that control: a development organization that deploys a properly decentralized network controls at most a fraction of the nodes in the overall network. The power returns to the people, but that means the infrastructure is also in the hands of the people. When all the people running a network’s many nodes are independent actors, it’s much harder to coordinate a deployment. A properly designed decentralized network respects the privacy of those running the nodes in the network, so telemetry is opt-in, meaning much of the network may not be feeding telemetry to the developers at all.

Suddenly, noticing a bug that affects a subset of users is much harder. Even if a bug does get noticed, getting a fix pushed out to everyone is complicated. In a sense, we’ve regressed to the first story in this post: installing code on someone else’s system, then leaving it without any remote update capability. We know what the results of these limitations are: bugs that allow a bad actor to steal millions of dollars in ETH; bugs that leave millions more inaccessible; attempts to circumvent the requirements of decentralization by centralizing core services. In rare cases, a large community can rally to undo the damage, but for both coordination and ideological reasons that can be quite difficult to pull off.

Past, Present, Meet Future

So how should decentralized developers deal with this adjusted reality? Do we discard all the advantages gained by the introduction of continuous deployment, or is there a middle ground? We believe many effective practices from the past can be brought to bear on the difficulties of decentralized development. Combined with specific lessons from continuous delivery, those practices can help us realize many of our goals as developers, even in a decentralized world, while maintaining the security and safety that users of decentralized systems need.

Internal test networks can allow a limited form of experimentation on new iterations. These test networks can simulate node failure, common transactions, and many other aspects of real-world situations to provide a high-level safety net for network performance and resilience. Unfortunately, it is impossible to simulate the real world completely, so test networks can only carry us so far—eventually you have to ship.
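As an illustration of the kind of failure drill such a test network might run, here is a toy Python sketch. The Node and TestNetwork classes are simplified in-memory stand-ins invented for this example, not actual network code, and the thresholds are arbitrary.

```python
# A toy sketch of fault injection on an internal test network:
# kill a fraction of nodes, replay common transactions, check resilience.
import random


class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.alive = True
        self.ledger = []

    def apply(self, tx):
        if self.alive:
            self.ledger.append(tx)


class TestNetwork:
    def __init__(self, size):
        self.nodes = [Node(i) for i in range(size)]

    def kill_random_nodes(self, count):
        # Simulate node failure: a handful of operators drop off the network.
        for node in random.sample(self.nodes, count):
            node.alive = False

    def broadcast(self, tx):
        # Simulate a common transaction reaching every node that is still up.
        for node in self.nodes:
            node.apply(tx)

    def healthy_fraction(self):
        return sum(node.alive for node in self.nodes) / len(self.nodes)


if __name__ == "__main__":
    net = TestNetwork(size=50)
    net.kill_random_nodes(10)            # 20% of operators go offline
    for i in range(100):
        net.broadcast({"tx": i})
    # A crude resilience check: does enough of the network survive to be useful?
    assert net.healthy_fraction() >= 2 / 3, "network lost too many nodes"
    print(f"{net.healthy_fraction():.0%} of nodes still alive after the failure drill")
```

Real test networks are far richer than this, of course, but even simple drills like these catch a surprising number of problems before a release ever reaches independent node operators.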

Projects that rely on continuous delivery can ship an imperfect and incomplete minimum viable product and iterate on it rapidly after publishing; a decentralized project, on the other hand, has to take more care that its initial version is polished, especially from a security perspective. That does not preclude a minimum feature set: in fact, the MVP approach of launching with only a few features is arguably more important here. When upfront costs are higher, launching with less, so that core ideas can be tested in the wild sooner, becomes even more crucial.

Early on, an enthusiastic community is also a great way to get many of the benefits of a centralized infrastructure: excited node operators will be eager to try new versions and to actively provide feedback. As the community grows, greater engagement continues to translate into easier deployment. Community management thus becomes even more important: a community that trusts the developers will be more willing to work with them to deploy new changes, shrinking the gap between self-operated and community-operated infrastructure. Strong development practices and high-quality code also breed trust from the community, which in turn reduces risk and increases willingness to deploy new changes.

There are many more ways to borrow and tweak existing lessons, recent and not-so-recent, in a decentralized world. We hope to continue sharing our thoughts on these in the coming weeks as we discuss how we are approaching the first releases of the Keep Network.

Thanks to James Prestwich and Brayton Williams for reviewing early drafts of this story.