Update (2011-03-14): UNIXy has released the first & only cPanel Varnish Plugin. The plugin installs, configures, and makes it very easy to manage Varnish via WHM. The plugin is fully integrated with cPanel EasyApache, cPanel URLs, and everything else that makes a functional cPanel server. Read more about it here: http://www.unixy.net/varnish

While we are in the business of providing managed server and cluster services, some folks still hold it at heart to manage their own dedicated server or VPS nodes. Incontestably, and however much you cherish your web servers, family always takes priority. In this post, we are going to cover a particular server configuration that can adapt to abnormal traffic without the server dropping to its knees or you getting involved… at 3 AM on New Year's Eve.

The goal is to configure such a server to run autonomously for weeks or months at a time without requiring your involvement. Based on operational experience, we decided to develop a three-state throttle mechanism where each state is triggered by server vitals such as the load average (run queue depth).

Let’s lay down the requirements and goals. We want our shiny adaptable server to be able to:

gracefully handle C10K challenges

handle a Slashdot / Digg effect without dropping off the net

change operating state and adapt to the traffic load

Background

Before we delve into the specifics, we'll need to cover some important aspects of high performance web servers. All web servers, at least the ones worthy of the name, must implement the standards defined in RFC 2616. What differentiates these web servers, however, is their feature set and their ability to manage large sets of concurrent connections. More specifically, they need to be able to handle the C10K problem gracefully. At least that's what matters the most to us in this article.

So what is this C10K problem? C10K is a condensed term coined to mean 10,000 concurrent clients. A Web server that solves the C10K problem should be able to handle 10,000 or more concurrent connections transferring small files on a 1 Gbps internet connection using 1 GB of memory, without causing any noticeable slowdown.

At first glance, the C10K challenge appears easily achievable. The truth of the matter is that the implementation requires not only efficient programming but also a mastery of low-level system APIs and system calls. We're not going to cover the fine details here. But implementors that focus too much on looking good in benchmarks end up building software that performs poorly in the real world. As a matter of fact, such systems have little practicality in this new hip Web world.

In this new hip online world, serving static files represents a very tiny fraction of the work put forth by a Web server. Server-side processing accounts for the majority of the effort. So now we not only expect a Web server to deliver relatively simple static objects at sub-millisecond speeds but also hold it accountable for the time it takes to complete server-side requests. Server-side requests could be anything from PHP and Perl to Python and Ruby. It doesn't matter.

Implementation

How are we going to achieve this bulletproof Web server? The following requirements were imposed on us for the Web server stack:

Linux 2.6.x

Apache 2.2.x

MySQL 5.x

PHP 5.2.x

While the requirement is reasonable, we still need a secret-sauce engine that will make decisions for us so we know how to throttle back and forth between the three states. We have very little flexibility in implementing this engine "inside" the LAMP stack, as that would mean fiddling with the client's environment. We needed a seamless, dynamic way of throttling. That is, no Web server restarts are permitted, and the whole throttling action should take less than 100ms.

We decided to leverage Varnish Cache's ability to load and compile a new set of configuration instructions on the fly. This essential function is not the only feature that sold us; the main advantage is that Varnish is a very capable caching engine and satisfies all three of our initial requirements (adaptability, state machine, C10K). We will load three Varnish configurations, as follows:

Light configuration:

This configuration of Varnish caches static files only. We're still letting Apache do a lot of the legwork, and that is fine considering we're getting low traffic in this state, so the impact on perceivable performance is low.
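A "light" configuration along these lines might look like the sketch below. The file extensions and backend address are illustrative, and the syntax approximates the Varnish 2.x VCL dialect current at the time; treat it as a starting point, not a drop-in config.

```vcl
# Hypothetical "light" VCL: cache obvious static assets, pass everything else.
backend default {
    .host = "127.0.0.1";
    .port = "8080";    # Apache listening locally
}

sub vcl_recv {
    # Static files: strip cookies so they become cacheable.
    if (req.url ~ "\.(png|gif|jpe?g|ico|css|js|txt)$") {
        remove req.http.Cookie;
        return (lookup);
    }
    # Everything else goes straight through to Apache.
    return (pass);
}
```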

Moderate configuration:

In this state, we enable caching but we respect cache control headers coming out of Apache and PHP. In other words, we cache only what we are told to cache. Sessions do not get cached in this state. Having too many “logged in” users could overpower the server. Hence, the need for a third state.
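A sketch of what the "moderate" state adds, again in approximate Varnish 2.x VCL (the session cookie name and header patterns are illustrative assumptions):

```vcl
# Hypothetical "moderate" VCL additions: cache only what we are told to cache.
sub vcl_recv {
    # Never cache logged-in sessions in this state.
    if (req.http.Cookie ~ "PHPSESSID") {
        return (pass);
    }
    return (lookup);
}

sub vcl_fetch {
    # Respect cache-control headers coming out of Apache and PHP:
    # don't cache what the backend marks as uncacheable.
    if (beresp.http.Cache-Control ~ "(no-cache|no-store|private)" ||
        beresp.ttl <= 0s) {
        return (pass);
    }
    return (deliver);
}
```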

Heavy configuration:

We enable caching while still respecting cache headers. Additionally, we enable per-user session caching (cookie-based caching). There needs to be plenty of memory to handle as many users as possible, since Varnish will create a cache entry for each registered user and accessed object.
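The per-user caching in the "heavy" state hinges on making the cookie part of the cache key. A sketch of the idea in approximate Varnish 2.x VCL, mirroring the built-in hash logic and then folding in the cookie:

```vcl
# Hypothetical "heavy" VCL addition: give each session its own cache entries.
sub vcl_hash {
    set req.hash += req.url;
    set req.hash += req.http.host;
    # Include the session cookie so each logged-in user is cached
    # separately instead of being served another user's pages.
    if (req.http.Cookie) {
        set req.hash += req.http.Cookie;
    }
    return (hash);
}
```

This is also why memory matters so much in this state: the cache now holds one copy of each object per user rather than one copy total.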

The above diagram illustrates the “gear switching” logic of our engine. The pieces that we need to implement this logic are the following:

The decision engine based on overall system vitals

The Varnish configuration loader

A daemon to tie it all together

We picked Python as the tool to accomplish the above. We decided to leverage the python-daemon package as well as the python-varnish manager. The former provides the daemon functionality, as we intend to run the engine full time in the background. The latter is a simple wrapper around telnetlib and an interface for interacting with Varnish's admin port.

One of the tricky questions we needed answered revolves around the mechanism used to trigger a state transition. We had already decided on the load average as the metric we poll and base our decision upon. What remains to be determined is the threshold value that triggers the transition. A high load average does not always mean a stressed server, nor does a low load average always mean an optimal server configuration.

As a rule of thumb, however, and based on experience, the load average is almost always reflective of resource capacity. We are going to have to trust the kernel on this one. While we have covered the when, the how still needs some demonstration. As mentioned above, Varnish comes with a neat feature that allows it to load and use a new configuration file on the fly. It takes the new configuration, compiles it, and finally loads it up for execution. All of that without dropping a packet. Think of it as importing a Python module at run time. Here is a quick demo:
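The exchange with Varnish's management port can be sketched with nothing but the standard library. The class and helper functions below are our own illustration (not the python-varnish API, and the port and file paths are assumptions); the `vcl.load` and `vcl.use` commands are Varnish's management CLI for compiling and activating a configuration on the fly.

```python
import socket


def vcl_load_cmd(name, path):
    # Management-CLI command that compiles a VCL file and registers
    # it under a symbolic name, without touching live traffic.
    return "vcl.load %s %s" % (name, path)


def vcl_use_cmd(name):
    # Command that switches live traffic over to a loaded VCL.
    return "vcl.use %s" % name


class VarnishAdmin(object):
    """Illustrative client for Varnish's management port (a sketch,
    not the python-varnish wrapper mentioned above)."""

    def __init__(self, host="localhost", port=6082):
        self.addr = (host, port)

    def send(self, command):
        # One short-lived connection per command keeps the sketch simple.
        sock = socket.create_connection(self.addr)
        try:
            sock.sendall((command + "\n").encode("ascii"))
            return sock.recv(4096).decode("ascii")
        finally:
            sock.close()
```

Loading and activating a configuration then boils down to something like `admin.send(vcl_load_cmd("heavy", "/etc/varnish/heavy.vcl"))` followed by `admin.send(vcl_use_cmd("heavy"))`.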

This is where the on-the-fly loading of our config happens. Of course, with our Python daemon program, the state transitions are triggered depending on the load:
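The core of that daemon logic can be sketched as follows. The thresholds (2.0 and 6.0) and the `admin` object's `send` method are illustrative assumptions; in production the loop would run under python-daemon and the thresholds would be tuned to the server's capacity.

```python
import os
import time

# Illustrative load-average thresholds for each "gear"; tune per server.
STATES = (
    ("light", 0.0),     # idle to modest traffic: cache static files only
    ("moderate", 2.0),  # sustained load: obey backend cache headers
    ("heavy", 6.0),     # stressed: cache per-user sessions as well
)


def choose_state(load_avg):
    # Map a 1-minute load average to the configuration we should run:
    # pick the highest state whose threshold has been reached.
    chosen = STATES[0][0]
    for name, threshold in STATES:
        if load_avg >= threshold:
            chosen = name
    return chosen


def run(admin, poll_interval=10):
    """Poll the load average and switch VCLs only on state transitions.

    `admin` is any object with a send(command) method, such as a
    wrapper around Varnish's management port.
    """
    current = None
    while True:
        load_avg = os.getloadavg()[0]
        state = choose_state(load_avg)
        if state != current:
            admin.send("vcl.use %s" % state)
            current = state
        time.sleep(poll_interval)
```

Because the configurations were preloaded with `vcl.load`, the transition itself is a single `vcl.use` command, which keeps the switch well under our 100ms budget.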

To recap, we wrote a program, a Python daemon, that monitors the vitals of a dedicated server and determines the next step to take. In this case, we preloaded our caching engine with three configurations: light, moderate, and heavy. These configurations anticipate and handle different levels of traffic and load. Our Python program makes use of any of the three configurations by loading them into Varnish.

That’s all folks! We hope you enjoyed this article.