
Latency Improvement with In-Memory Caching using Caffeine

How we improved latency by 30% while ensuring stability

At Hotels.com, we used in-memory caching to improve our application’s p95 response time from 50ms to 35ms. We have also reduced the database’s occasional breakdowns to zero over the past four months.

This post will take you through our experience with in-memory caching, how we chose Caffeine, and the results we achieved.

Background

Our team, the Core-Services Team, supports the property catalogue service for one of the world’s largest hotel booking websites — Hotels.com. With close to 20K hits/second and an agreed-upon SLA of 50ms, the service requires an uptime of close to 100%.

The catalogue service sometimes breached its service-level agreement (SLA) at high loads. On analysis, we found that the high latencies occurred when our service hit Cassandra at more than 600K requests per second.

Technical Details

Cache selection

Caching seems like a simple idea: put actively used data in memory where the system can access it quickly and reliably. However, choosing the right cache for our needs was crucial. Because the Catalogue Service is a critical service within our product, the stability and performance of the cache were key parameters in selecting a cache product.

Our requirements for our caching solution were:

The service is developed in Scala and makes use of Scala futures for database and downstream calls. To cache the result of a database or downstream call, the cache should be able to load data using futures without blocking the system threads (see the sketch after this list).

The service makes around 600K database calls per second across three different Cassandra rings. To cache database calls, the cache should sustain at least 100K operations per second without failure.

The service has to serialize and deserialize large content without a performance slowdown or an excessive increase in JVM threads.
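
To make the first requirement concrete, here is a minimal sketch of an asynchronously loading cache built with Scaffeine. The key type, value type, and loader below are hypothetical stand-ins rather than our production code:

import com.github.blemale.scaffeine.{AsyncLoadingCache, Scaffeine}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future

object AsyncCacheSketch {
  // Hypothetical stand-in for a Cassandra or downstream call that returns a Future.
  def fetchProperty(id: Long): Future[String] =
    Future.successful(s"property-$id")

  // Misses are loaded through the Future-returning loader, so no request
  // thread blocks while waiting for the database.
  val propertyCache: AsyncLoadingCache[Long, String] =
    Scaffeine()
      .maximumSize(100000)
      .buildAsyncFuture(fetchProperty)

  // Callers always get a Future back; already-cached keys complete immediately.
  def get(id: Long): Future[String] = propertyCache.get(id)
}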

Comparison of commonly used caching solutions

EhCache

EhCache was not considered because it lacks async loading and native Scala support.

Google Guava

We ruled out Google Guava. Though it can perform async loading with different execution contexts, it uses Java futures, and the ScalaCache wrapper for Guava doesn’t expose that feature.

ScalaCache also limits Guava caches to string keys.

Caffeine

We chose Caffeine. Scaffeine, a Scala wrapper on Caffeine, satisfied all of our requirements.

Cache Implementation

Because Scala is a functional programming language, an obvious choice was a method cache built with higher-order functions: a common approach is to wrap the function that maps keys to values so that its results are memoized.
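
Here is a minimal sketch of that pattern, with illustrative names and values rather than our production code: a higher-order function that takes any Future-returning lookup and returns a memoized version backed by a Scaffeine cache.

import com.github.blemale.scaffeine.Scaffeine
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
import scala.concurrent.duration._

object MethodCache {
  // Wrap a key-to-value function so its results are memoized in an async cache.
  def memoizeAsync[K, V](maxSize: Long, ttl: FiniteDuration)(f: K => Future[V]): K => Future[V] = {
    val cache = Scaffeine()
      .maximumSize(maxSize)
      .expireAfterWrite(ttl)
      .buildAsyncFuture(f)
    key => cache.get(key)
  }

  // Usage: the underlying database call is unchanged; callers use the wrapped version.
  def loadSupplementaryData(propertyId: Long): Future[String] =
    Future.successful(s"data-for-$propertyId") // stand-in for a Cassandra read

  val cachedLoad: Long => Future[String] =
    memoizeAsync(maxSize = 100000, ttl = 1.hour)(loadSupplementaryData)
}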

Cache Configuration

Caffeine exposes several configuration parameters, but we were mainly concerned with cache size and eviction policy. Nimbus cache exposes the following configuration parameters (see the example after the list):

Initial capacity

Max size

Eviction after write

Eviction after access

Refresh after write
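
For reference, here is how those parameters map onto the Scaffeine builder; the values below are purely illustrative, not the ones we run in production.

import com.github.blemale.scaffeine.Scaffeine
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
import scala.concurrent.duration._

object CacheConfigSketch {
  // Each builder call below corresponds to one parameter from the list above.
  val configuredCache =
    Scaffeine()
      .initialCapacity(10000)          // initial capacity
      .maximumSize(500000)             // max size
      .expireAfterWrite(2.hours)       // eviction after write
      .expireAfterAccess(1.hour)       // eviction after access
      .refreshAfterWrite(30.minutes)   // refresh after write
      .buildAsyncFuture[Long, String](id => Future.successful(s"value-$id"))
}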

Cache metrics for cache hits, cache misses, cache evictions, and the loading of objects into the cache are published to Graphite. Upon service deployment an empty cache is created, and it is populated on each cache miss. The eviction policy is Caffeine’s default, which is based on least-recently-used (LRU) with slight improvements (Window TinyLFU) available out of the box.
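
As a rough illustration, assuming stats recording is enabled on the builder and using hypothetical names, the counters we forward to Graphite can be read from Caffeine’s stats snapshot:

import com.github.blemale.scaffeine.Scaffeine
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future

object CacheMetricsSketch {
  // recordStats() makes Caffeine track hits, misses, loads, and evictions.
  private val cache =
    Scaffeine()
      .recordStats()
      .maximumSize(100000)
      .buildAsyncFuture[String, String](key => Future.successful(key.toUpperCase))

  // Snapshot of the counters; our service reports these to Graphite on a schedule.
  def report(): Unit = {
    val stats = cache.synchronous().stats()
    println(s"hits=${stats.hitCount()} misses=${stats.missCount()} " +
      s"evictions=${stats.evictionCount()} loads=${stats.loadCount()}")
  }
}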

Results

Database calls reduced by ~280K per second (~50%) across Cassandra rings.

API latencies improved by at least ~30%.

The combined memory footprint of all three caches is ~460MB of heap.

JVM thread count is now 400, an increase of 50 threads per node.

Catalogue Service is more resilient against bot attacks.

One of the Cassandra rings was scaled down from 24 nodes to 18.

Heap size was increased from 4GB to 6GB to accommodate the caches and reduce garbage-collection pauses.

Cache hits

[Chart: PIMMS Column Family cache hits/misses]

[Chart: Supplementary Data Column Family cache hits/misses]

[Chart: LDS Column Family cache hits/misses]

Latency

[Chart: Property Catalogue Service latency]

Future Plans

Optimize the cache size, eviction time, and initial capacity with the least performance hit

Experiment with a centralized cache, Amazon ElastiCache, for the Catalogue Service and compare results

Introduce smaller caches across the system with small memory footprints to improve API latency further

It has been four months since we implemented this solution, and it has helped us resolve the issues we had with the earlier Cassandra breakdowns. We put quality time and effort into improving our customer experience.

Our experience with Caffeine has been great so far: its code is easy to understand, it has great community support, it was easy to implement, and we have a working solution that solves our needs. The key takeaway is to select your in-memory caching solution based on your own needs. If your needs are similar to ours, Caffeine might be a good choice.

Do share with us if you liked this article and comment if you have questions for us. And follow us on the Expedia Group Technology blog for more technology-related goodness.