We had been using memcached in our application for a long time, and it helped a lot to reduce DB server load on some huge queries. But there was a problem (sometimes called the “dog-pile effect”): when a cached value expired under heavy traffic, too many threads in our application would try to calculate the new value at the same time in order to cache it.

For example, suppose you have a simple but really expensive query like

```sql
SELECT COUNT(*) FROM some_table WHERE some_flag = X
```

which can be really slow on huge tables. When your cache expires, ALL clients requesting a page with this counter end up waiting for the counter to be recalculated. Sometimes tens or even hundreds of such queries could be running on your DB at once, killing your server and breaking the entire application (the number of application instances is constant, but more and more of them are locked waiting for the counter).
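To make the failure mode concrete, here is a small simulation of the dog-pile effect (not from the original post; the key name and the result value are made up). The cache is an empty hash standing in for memcached with an expired key, so every thread sees a miss and "runs" the expensive query; the write-back is omitted to keep the outcome deterministic:

```ruby
cache = {}          # empty hash stands in for memcached with an expired key
recompute_count = 0
mutex = Mutex.new   # protects the counter, not the cache

threads = 10.times.map do
  Thread.new do
    value = cache['slow_counter']
    unless value
      # every thread sees a miss, so every thread re-runs the slow query
      mutex.synchronize { recompute_count += 1 }
      value = 42  # stand-in result of the slow SELECT COUNT(*)
    end
  end
end
threads.each(&:join)

puts recompute_count  # all 10 threads recomputed the value
```

With one expired key and ten concurrent requests, the expensive computation runs ten times instead of once.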

So, how could we avoid this problem? The first thing that came to my mind was: “What if we marked the old counter as ‘expired’ and then only one thread re-calculated it while all other clients used the old value?” The idea looked great, but when we cache something in memcached, it is hard to tell when a value was saved and when it is going to expire. After a little research I found a more elegant solution: we could create two keys in memcached: a MAIN key with an expiration time a bit higher than normal, plus a STALE key which expires earlier. When we read a value from memcached, we read the STALE key too. If the STALE key has expired, it is time to start re-calculation (and to set the STALE key again with a short TTL, so that only one client does the work while the others keep serving the old value).
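Before the full patch, here is a minimal, self-contained sketch of the two-key idea. The class and method names are my own, and explicit timestamps replace real clock time and memcached TTLs, but the scheme is the same: the main key lives for `ttl + generation_time * 2`, the stale key for `ttl`, and whoever finds the stale key expired takes the "lock" and recomputes:

```ruby
# Illustrative in-memory version of the two-key scheme (hypothetical names).
class TwoKeyCache
  def initialize
    @store = {}  # key => [value, expires_at]
  end

  def put(key, value, ttl, now)
    @store[key] = [value, now + ttl]
  end

  def get(key, now)
    value, expires_at = @store[key]
    return nil if value.nil? || now >= expires_at
    value
  end

  # Returns [value, refreshed?]: only the caller that finds the stale key
  # expired runs the block; everyone else is served the cached value.
  def smart_get(key, ttl, generation_time, now)
    stale_key = "#{key}.stale"
    value = get(key, now)
    if value && get(stale_key, now)
      [value, false]
    else
      put(stale_key, 1, generation_time, now)   # short-TTL "lock"
      new_value = yield
      put(key, new_value, ttl + generation_time * 2, now)
      put(stale_key, 1, ttl, now)               # unlock with normal TTL
      [new_value, true]
    end
  end
end

cache = TwoKeyCache.new
calls = 0
fetch = ->(t) { cache.smart_get('count', 100, 30, t) { calls += 1 } }

first,  = fetch.call(0)             # cold cache: computes (calls == 1)
second, = fetch.call(50)            # both keys fresh: served from cache
third, refreshed = fetch.call(120)  # stale key gone, main key alive: one refresh
```

At t=120 the stale key (TTL 100) is gone but the main key (TTL 160) is still alive, so exactly one caller refreshes while the rest would keep reading the old value.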

The final solution we ended up using is the following (a monkey patch for the ActiveRecord::Cache class from RobotCoop’s memcache-client library):

```ruby
# Anti-dog-pile effect caching extension
module ActiveRecord
  class << Cache
    STALE_REFRESH = 1
    STALE_CREATED = 2

    # Caches data received from a block
    #
    # The difference between this method and the usual Cache.get
    # is the following: this method caches data and allows the user
    # to re-generate data when it is expired w/o running the
    # data generation code more than once, so the dog-pile effect
    # won't bring our servers down
    #
    def smart_get(key, ttl = nil, generation_time = 30.seconds)
      # Fall back to the default caching approach if no ttl given
      return get(key) { yield } unless ttl

      # Create a window for data refresh
      real_ttl = ttl + generation_time * 2
      stale_key = "#{key}.stale"

      # Try to get data from memcache
      value = get(key)
      stale = get(stale_key)

      # If the stale key has expired, it is time to re-generate our data
      unless stale
        put(stale_key, STALE_REFRESH, generation_time) # lock
        value = nil # force data re-generation
      end

      # If no data was retrieved or re-generation was forced,
      # re-generate the data and reset the stale key
      unless value
        value = yield
        put(key, value, real_ttl)
        put(stale_key, STALE_CREATED, ttl) # unlock
      end

      return value
    end
  end
end
```

Since it is a monkey patch, you can place this piece of code wherever you want, but it must be loaded AFTER memcache-client (for example, you can put it into your config/initializers/ directory or just paste it into your environment.rb). Example usage of this patch:



```ruby
# This falls back to the generic get() method because no TTL was provided
Cache.smart_get('test') { some_huge_calc }

# This caches your calculation results for 160 seconds and
# re-generates the cache after 100 seconds
Cache.smart_get('test', 100) { some_huge_calc }

# This caches your calculation results for 120 seconds and
# re-generates the cache after 100 seconds
Cache.smart_get('test', 100, 10) { some_huge_calc }
```

So, this is it: with a simple change we’ve fixed a really annoying problem and made our application much more stable.