My name is Marc Brooker. I've been writing code, reading code, and living vicariously through computers for as long as I can remember. I like to build things that work. I also dabble in brewing, cooking and skiing. I'm currently an engineer at Amazon Web Services (AWS) in Seattle, where I lead engineering on AWS Lambda and our other serverless products. Before that, I worked on EC2 and EBS. All opinions are my own.

The Essential Barbara Liskov

Barbara Liskov is one of the greats of computer science. Over a research career nearing 45 years, she's had a resounding impact on multiple different fields, and received an impressive list of honors and awards, including the 2009 Turing Award. In the same spirit as The Essential Leslie Lamport and The Essential Nancy Lynch, I thought I'd write about some of my favorite Liskov papers. These are just papers I like or found particularly interesting, and I'm likely to have missed some you like.

What does it mean for one type to be a subtype of another? We argue that this is a semantic question having to do with the behavior of the objects of the two types: the objects of the subtype ought to behave the same as those of the supertype as far as anyone or any program using the supertype objects can tell.

Data abstraction and hierarchy, A behavioral notion of subtyping, and Behavioral subtyping using invariants and constraints are why most working programmers would recognize Liskov's name. The Liskov Substitution Principle, widely known as the L in SOLID, is a widely-followed rule about the relationship between the behavior of subtypes and that of their supertypes.

I'll readily admit that types are an area of computer science that I'm not very familiar with, but I found these papers easy to follow and very applicable. For the working programmer, there's not much material here that isn't covered on the Wikipedia page, but it's worth reading to see how Liskov lays out the arguments for the principle. If you're interested in the history and thinking behind these rules, these are great papers to read.
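The classic way to see the principle in action is the rectangle/square example (my sketch, not one from the papers): Square is a Rectangle in the mathematical sense, but making it a behavioral subtype breaks callers written against Rectangle.

```python
class Rectangle:
    def __init__(self, width: int, height: int):
        self.width, self.height = width, height

    def set_width(self, width: int) -> None:
        self.width = width

    def area(self) -> int:
        return self.width * self.height


class Square(Rectangle):
    # Violates behavioral subtyping: Square strengthens the invariant
    # (width == height), so set_width silently changes the height too.
    def set_width(self, width: int) -> None:
        self.width = self.height = width


def stretch(r: Rectangle) -> int:
    # Written against Rectangle's contract: setting the width
    # should not change the height.
    r.set_width(10)
    return r.area()


stretch(Rectangle(2, 5))  # 50, as the caller expects
stretch(Square(2, 2))     # 100: the substituted Square surprises the caller
```

The point of the papers is that whether Square may inherit from Rectangle is not a question of syntax or mathematics, but of whether code using the supertype can tell the difference.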

Availability is achieved through replication. Transaction processing depends on forcing information to backups so that a majority of cohorts know about particular events.

Viewstamped Replication: A New Primary Copy Method to Support Highly Available Distributed Systems deserves to be recognized as one of the most influential papers in distributed systems. Viewstamped replication predates Lamport's Paxos, but solves the same problem in a very similar (though distinct) way. The viewstamped replication paper remains both readable and relevant, although some of the descriptions and formalisms used show the paper's age. If you only have time to read one paper on Viewstamped Replication, the recent Viewstamped Replication Revisited is probably a better bet, because it provides a clearer description of the protocol and the design decisions it makes.

I've written before about viewstamped replication, and why I think it should be more widely recognized for the contributions it made to consensus, and the idea of state machine replication. It would be great to see more knowledge about VR among distributed systems practitioners.

Unlike other multistamp (or vector clock) schemes, our scheme is based on time rather than on logical clocks: each entry in a multistamp contains a timestamp representing the clock time at some server in the system.

Version vectors (or vector clocks, or multistamps) are a very widely used scheme for versioning data in distributed systems, but keeping them short in the face of scaling and reconfiguration is a real challenge. Lazy consistency using loosely synchronized clocks presents one approach, using loosely synchronized physical clocks. An interesting aspect of the paper is that it breaks from the orthodoxy that using clocks for ordering is bad (which it is, unless you are careful with your safety properties). If you find it worthwhile, I'd also recommend Dynamic Version Vector Maintenance and Flexible update propagation for weakly consistent replication.
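To make the size benefit concrete, here's a toy sketch (my own, assuming hypothetical helper names) of why physical timestamps help: because each entry is a wall-clock time rather than a logical counter, old entries can be dropped and summarized by a single cutoff time.

```python
# Hypothetical sketch: a multistamp maps server id -> wall-clock timestamp
# of the latest event seen from that server.

def merge(a: dict, b: dict) -> dict:
    """Combine two multistamps by taking the pointwise max, as with
    ordinary version vectors."""
    return {s: max(a.get(s, 0), b.get(s, 0)) for s in a.keys() | b.keys()}


def truncate(ms: dict, cutoff: float) -> tuple[dict, float]:
    """Drop entries older than cutoff. Anything dropped is conservatively
    summarized as 'known up to cutoff' -- this is only meaningful because
    entries are (loosely synchronized) physical times, not logical clocks."""
    kept = {s: t for s, t in ms.items() if t >= cutoff}
    return kept, cutoff


merge({"a": 1, "b": 5}, {"a": 3})   # {"a": 3, "b": 5}
truncate({"a": 1, "b": 5}, 2)       # ({"b": 5}, 2)
```

A logical vector clock offers no comparable truncation: entries for departed or renamed servers must be carried (or garbage-collected with extra protocol machinery) forever.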

One reason why Byzantine-fault-tolerant algorithms will be important in the future is that they can allow systems to continue to work correctly even when there are software errors. Not all errors are survivable; our approach cannot mask a software error that occurs at all replicas. However, it can mask errors that occur independently at different replicas, including non-deterministic software errors, which are the most problematic and persistent errors since they are the hardest to detect.

Byzantine faults happen all the time in the real world. Handling them, and making safe progress in distributed systems despite them, is a huge challenge. Just getting the theory right is tricky. Getting to a practical implementation of Byzantine fault tolerance is even harder. That's what makes Practical Byzantine fault tolerance so important. Castro and Liskov describe a practically implementable Byzantine fault tolerant system that works in a realistic system model. The key contribution here, over precursors like Rampart, is safety in real-world system models, especially removing assumptions about synchrony.
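The quorum arithmetic underlying PBFT-style protocols is worth internalizing, and fits in a few lines (a sketch of the standard sizing argument, not code from the paper):

```python
# With n = 3f + 1 replicas, quorums of size 2f + 1 guarantee that any
# two quorums overlap in at least f + 1 replicas -- more than the f
# Byzantine replicas, so every pair of quorums shares at least one
# correct replica.
def quorum_sizes(f: int) -> tuple[int, int]:
    n = 3 * f + 1          # total replicas needed to tolerate f faults
    q = 2 * f + 1          # quorum size
    overlap = 2 * q - n    # minimum intersection of any two quorums
    assert overlap == f + 1
    return n, q


quorum_sizes(1)  # (4, 3): tolerate one Byzantine fault with four replicas
```

That one correct replica in every quorum intersection is what carries decisions safely across view changes, even when up to f replicas are lying.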

On a similar topic, Byzantine clients rendered harmless is also worth reading. Many approaches to Byzantine-fault-tolerant replication make both safety and liveness assumptions about the behavior of clients. Strengthening the protocol against client behavior is very important to practical systems.