Evan Weaver, Lead Engineer in the Services Team at Twitter, who’s primarily job is optimization and scalability, talked about Twitter’s architecture and especially the optimizations performed over the last year to improve the web site during QCon London 2009.

Most of the tools used by Twitter are open source. The stack is made up of Rails for the front side, C, Scala and Java for the middle business layer, and MySQL for storing data. Everything is kept in RAM and the database is just a backup. The Rails front end handles rendering, cache composition, DB querying and synchronous inserts. This front end mostly glues together several client services, many written in C: MySQL client, Memcached client, a JSON one, and others.

The middleware uses Memcached, Varnish for page caching, Kestrel, a MQ written in Scala, and a Comet server is in the works, also written in Scala and used for clients that want to track a large number of tweets.

Twitter started as a “content management platform not a messaging platform” so many optimizations were needed to change the initial model based on aggregated reads to the current messaging model where all users need to be updated with the latest tweets. The changes were done in three areas: cache, MQ and Memcached client.

Cache

Each tweet is tracked in average by 126 users, so there is clearly a need for caching. In the original configuration, only the API had a page cache that was invalidated each time a tweet was coming from an user, the rest of the application being cacheless:

The first architectural change was to create a write-through Vector Cache containing an array of tweet IDs which are serialized 64 bit integers. This cache has a 99% hit rate.

The second change was adding another write-through Row Cache containing database records: users and tweets. This one has a 95% hit rate and it is using Nick Kallen’s Rails plug-in called Cache Money. Nick is a Systems Architect at Twitter.

The third change was introducing a read-through Fragment Cache containing serialized versions of the tweets accessed through API clients which could be packaged in JSON, XML or Atom, with the same 95% hit rate. The fragment cache “consumes the vectors directly, and if a serialized fragment is currently cached it doesn’t load the actual row for the tweet you are trying to see so it short-circuits the database the vast majority of times”, said Evan.

Yet another change was creating a separate cache pool for the page cache. According to Evan, the page cache pool uses a generational key scheme rather than direct invalidation because clients can

send HTTPs if-modified-since and put any time stamp they want in the request path and we need to slice the array and present them with only the tweets they want to see but we don’t want to track all the possible keys that the clients have used. There was a big problem with this generational scheme because it didn’t delete all the invalid keys. Each page that was added which corresponding to the number of tweets people were receiving would push out valid data in the cache and it turned out that our cache only had a 5 hour effective life time because of all these page caches flowing through.

When the page cache was moved into its own pool, the cache misses dropped about 50%.

This is the current cache scheme employed by Twitter:

Since 80% of the Twitter traffic comes through the API, there are 2 additional levels of cache, each servicing up to 95% of the requests coming from the preceding layer. The overall cache changes, in total between 20 and 30 optimizations, brought a

10x capacity improvement, and it would have been more but we hit another bottleneck at that point … Our strategy was to add the read-through cache first, make sure it invalidates OK, and then move to a write-through cache and repair it online rather than destroying it every time a new tweet ID comes in.

Message Queue

Since, on average, each user has 126 followers, it means there are 126 messages placed in the queue for each tweet. Beside that, there are times when the traffic peaks, as it was during Obama’s inauguration when it reached several hundreds of tweets/second or tens of thousands messages into the queue, 3 times the normal traffic at that time. The MQ is meant to take the peak and disperse it over time so they would not have to add lots of extra hardware. Twitter’s MQ is simple: based on Memcached protocol, no ordering of jobs, no shared state between servers, all is kept in RAM and it is transactional.

The first implementation of the MQ was using Starling, written in Ruby, and did not scale well especially because Ruby’s GC which is not generational. That lead to MQ crashes because at some point the entire queue processing stopped for the GC to finish its job. A decision was made to port the MQ to Scala which is using the more mature JVM GC. The current MQ is only 1,200 lines and it runs on 3 servers.

Memcached Client

The Memcached client optimization was intended to optimize cluster load. The current client used is libmemcached, Twitter being its most important user and contributor to the code base. Based on it, the Fragment Cache optimization over one year led to a 50x increase in page requests served per second.

Because of poor request locality, the fastest way to deal with requests is to precompute data and store it on network RAM, rather than recompute it on each server when necessary. This approach is used by the majority of Web 2.0 sites running almost completely directly from memory. The next step is “scaling writes, after scaling reads for one year. Then comes the multi co-location issue” according to Evan.

The slides of the QCon presentation have been published on Evan’s site.