Statelessness and the Swiss engineer

The computing community was taken by surprise a couple of years ago when an engineer at CERN reminisced fondly about pets and cattle in a scientific presentation. It took a while for folks to realise that said engineer was not exercised by fluffy pets or an oversupply of wholesome dairy animals in Switzerland, but by servers and virtual machines. The idea is that pets are servers that cannot be easily replaced and need to be nurtured, while cattle represent virtual machines (VMs) that are easily replaced. This has little meaning for those of us fortunate enough not to grapple with statefulness, statelessness or scale-out architectures, but we are going to force it down your throat anyway. Please note we are at 10,000 feet here and will make broad generalisations, simply because we are covering a large area of computing; please inform the debate rather than condemn it. Let's keep it happy and pleasant. Let's start with a key problem: state and scaling. Normal computer users do not usually have to deal with this, but it becomes an issue in 'scale-out' scenarios.

Scale-up and scale-out

Let us introduce ourselves to the concepts of 'scale-up' and 'scale-out'. Say you have a machine with a 2GHz processor and 2GB of RAM running a mail server (not the best example, but let's go with it). Your mail server is overloaded, users are unhappy and you need to upgrade. You can do this in two ways. The first is to buy a faster processor and more RAM, say a 4GHz processor and 4GB of RAM. This is called scaling up, and you can easily see the problem: next time you have to scale, you will need an 8GHz processor and even more RAM, which may not be available or practical. You also have a single point of failure: any issue with your mail server and mail is down for all users.

The alternative is to scale out: instead of upgrading your processor and RAM, you get a new server to share the load with the old one. You now have two machines instead of one, and if load increases you can add more. And if one mail server fails, you still have one up. Easy, right? No! How are you going to distribute the load and keep the inboxes in sync? With scaling out you have to deal with the problem of state, and of keeping your data layer in sync. Two or more mail servers have to agree with each other, or the user will see an inconsistent inbox. Say a user logs in in the morning, is directed to Server 1 and writes a few mails. She logs in an hour later, is directed to Server 2, and finds her mails missing. You are toast! In the case of the mail server, one solution would be to replicate the inboxes, so both servers have a single view and copy of the user data and downtime on one does not affect the other. Another is shared storage.
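The sync problem above can be sketched in a few lines of Python. The `MailServer` class and the round-robin routing below are hypothetical illustrations, not any real mail software: two servers that each keep inboxes in local memory drift apart, while pointing both at one shared store keeps every request consistent.

```python
import itertools

class MailServer:
    """A toy mail server that keeps each user's inbox in memory."""
    def __init__(self, name):
        self.name = name
        self.inboxes = {}          # per-server state by default

    def deliver(self, user, message):
        self.inboxes.setdefault(user, []).append(message)

    def read(self, user):
        return self.inboxes.get(user, [])

# Two servers behind a naive round-robin "load balancer".
servers = [MailServer("server1"), MailServer("server2")]
route = itertools.cycle(servers)

next(route).deliver("alice", "Morning mail")   # request lands on server1
inbox_later = next(route).read("alice")        # later request hits server2
print(inbox_later)                             # [] -- the mail "disappeared"

# Fix: both servers share one storage backend, so any server sees all mail.
shared_store = {}
s1 = MailServer("server1"); s1.inboxes = shared_store
s2 = MailServer("server2"); s2.inboxes = shared_store
s1.deliver("alice", "Morning mail")
print(s2.read("alice"))                        # ['Morning mail']
```

Real deployments reach the same end with replication or shared storage (a common database or network filesystem) rather than a shared in-process dict, but the failure mode is identical: state that lives on one box is invisible to the others.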
Typically one scales by decoupling, or breaking down, your application (typically composed of the app, database, app server, web server and caching layer) and building a loosely coupled architecture, so the individual pieces can each run on different instances (i.e. servers, virtual machines, cloud instances or containers) with a data management layer and session and state management; you then begin to need things like load balancers to scale. We are not going to go into too much detail here, but this excellent presentation on scaling by Chris Munns at Amazon should help end users understand the underlying architectures. This is a typical problem of state and scale that companies like Facebook, Twitter, Gmail and YouTube, and enterprises, face every day, and it exercises developers and ops teams. DevOps would ideally like to simply automate the provisioning of new instances as demand grows and not think too much, but that is impeded by the reality of managing state and data layers as one scales.
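One common way load balancers cope with state, short of making the application fully stateless, is 'sticky' routing: hash something stable about the user so the same client always lands on the same instance. A minimal sketch, with hypothetical server names:

```python
import hashlib

SERVERS = ["app1", "app2", "app3"]   # hypothetical app instances

def route(user_id):
    """Sticky routing: hash the user id so a given user always lands
    on the same instance, sidestepping per-request state drift."""
    h = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return SERVERS[h % len(SERVERS)]

# The same user is routed to the same instance on every request.
print(route("alice") == route("alice"))   # True
```

Sticky routing papers over the state problem rather than solving it (an instance dying still loses whatever sessions it held), which is why the more robust pattern is stateless app instances with session and user data externalised to a shared data layer.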

What does this have to do with containers?