Soon after I started at Tumblr all the way back in May, I was tasked with reworking our aging notifications system. A notification is the item that shows up in your dashboard, interleaved with posts, that tells you that another Tumblr blog reblogged or liked one of your posts or started following your blog.

Until recently, notifications had been served up by a fairly standard MySQL layer with extensive memcache caching in front of it. Because it is tied to the core social interactions on Tumblr, this data had an extremely high insertion volume. On the servers dealing with notifications, we had long since exceeded the InnoDB global transaction max (1024), and we had to rely on a feature of the Percona version of MySQL just to be able to create so many notifications at once.

Taking a step back to devise a more scalable system, we identified the core properties of a notification:

Ordered by time

Unique (no duplicate notifications)

Medium read/write ratio (60%/30%), mostly thanks to heavy caching

Fixed number of notifications per user

Keyed by user, and read only by that user

Redis has been described as “a collection of data structures exposed over the network.” With impressive performance characteristics, it was a technology we were excited to evaluate. Moreover, its sorted sets fit the characteristics of notifications perfectly, without the I/O and concurrency pitfalls of implementing a similar structure in MySQL. Sorted sets in Redis are ordered by a score (a Unix timestamp in our case), contain unique elements (sets are non-repeating collections of strings in Redis-speak), can be trimmed or appended to cheaply, and are keyed off, well, a key (the user in our case).
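To make that fit concrete, here is a minimal sketch in plain Python that models the sorted-set behavior described above for a single user. It is a toy in-memory stand-in, not Redis and not Staircar code: the class name `NotificationSet`, the member strings, and the cap of 3 are assumptions made up for illustration. The comments name the Redis commands (ZADD, ZREMRANGEBYRANK, ZREVRANGE) that provide the equivalent behavior.

```python
import bisect


class NotificationSet:
    """Toy in-memory model of one user's sorted set (illustration only)."""

    def __init__(self, max_size=100):
        self.max_size = max_size  # fixed number of notifications per user
        self._score = {}          # member -> score (Unix timestamp)
        self._ordered = []        # (score, member) pairs, oldest first

    def add(self, member, timestamp):
        # Like ZADD: re-adding an existing member updates its score
        # instead of creating a duplicate entry.
        old = self._score.get(member)
        if old is not None:
            self._ordered.remove((old, member))
        self._score[member] = timestamp
        bisect.insort(self._ordered, (timestamp, member))
        # Like ZREMRANGEBYRANK key 0 -(max_size + 1): drop the oldest
        # entries so that only the newest max_size remain.
        while len(self._ordered) > self.max_size:
            _, evicted = self._ordered.pop(0)
            del self._score[evicted]

    def latest(self, count):
        # Like ZREVRANGE key 0 (count - 1): newest first.
        return [member for _, member in self._ordered[::-1][:count]]


inbox = NotificationSet(max_size=3)
inbox.add("like:alice:post1", 1000)
inbox.add("reblog:bob:post1", 1001)
inbox.add("like:alice:post1", 1002)  # same member again: score is updated
inbox.add("follow:carol", 1003)
inbox.add("like:dave:post2", 1004)   # exceeds the cap: oldest entry is trimmed
print(inbox.latest(10))
# ['like:dave:post2', 'follow:carol', 'like:alice:post1']
```

The four properties show up directly: entries come back ordered by time, the repeated like does not produce a duplicate, the set never grows past its cap, and everything lives under one per-user object.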

Based on notification request volume (over 7,500/s), data set size (23MM blogs, 100 notifications per blog, 160 bytes per message), desired response times (<5ms), and the need for fault tolerance and substantial growth, we wanted to preshard our data. While this added complexity allowed for better performance and simpler fault tolerance and growth, it was not complexity we wanted to see make its way into the web application, so a layer of abstraction in front of Redis seemed wise.
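As a rough back-of-the-envelope check on those numbers, the sketch below multiplies out the raw payload size and shows one simple way a key could be mapped onto a fixed set of shards. The shard count of 256 and the CRC32-modulo mapping are assumptions chosen for illustration, not a description of how Staircar actually routes keys, and the size figure ignores Redis's own per-key and per-element overhead.

```python
import zlib

BLOGS = 23_000_000
NOTIFICATIONS_PER_BLOG = 100
BYTES_PER_NOTIFICATION = 160

# Raw payload only; real memory use in Redis would be higher.
raw_bytes = BLOGS * NOTIFICATIONS_PER_BLOG * BYTES_PER_NOTIFICATION
print(raw_bytes / 1e9)  # 368.0 (GB, decimal)

NUM_SHARDS = 256  # hypothetical; fixed up front so keys never need rehashing


def shard_for(blog_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a blog key to a shard index with a stable hash (CRC32)."""
    return zlib.crc32(blog_key.encode("utf-8")) % num_shards


print(raw_bytes / NUM_SHARDS / 1e9)  # 1.4375 (GB of raw payload per shard)
print(shard_for("example-blog") == shard_for("example-blog"))  # True
```

The appeal of choosing the shard count ahead of time is that the key-to-shard mapping stays constant: growth is handled by moving whole shards onto more machines rather than by redistributing individual keys.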

Enter Staircar: a lightweight HTTP service to make interacting with hundreds of Redis instances look easy. We considered some nascent open source projects for the job, but found they had a lot of functionality we didn’t need, and/or struggled with response times once the number of elements in the sorted set started to grow. Thus, we ended up with our own libevent-based interface in C, which presents itself as a RESTful, JSON-speaking service.

Staircar’s performance was beyond our expectations and very consistent, with average response times per request at or under 5ms, even during peak traffic. Extensive benchmarking showed Staircar was able to handle roughly 30,000 requests a second per server.

You can see that the highest density is around the 3-4ms range, with a second band formed around the 5ms range. 99.99% of requests are handled in under 10ms, and 98.2% are handled in 5ms or less. (Note that this data is from the client's perspective, and shows the performance that a client typically sees.)

Notifications have been served solely by Staircar for more than a month now and both it and Redis have continued to be fast and extremely reliable. In future posts we’ll dive into more of the operational details around Redis, the tweaks we had to make to get the best possible performance out of Staircar, as well as our performance analysis methods – including how we use Scribe, Hadoop, and R to arrive at graphs like the one above.

(The name “Staircar” is in our tradition of internal codenames. While the code is still a bit too messy to open source, we plan to release it to the community in the near future.)