tl;dr - I added some caching to an app I’m writing in Haskell using Servant. The implementation is pretty naive, but I’ve seen some speedups (as one would expect from skipping a database access), and am glad I was able to build such a simple solution in a language as expressive as haskell. Skip to the end TLDR section to see all the code laid out!

FAIR WARNING - this will is NOT an interesting article about caching algorithms or a quirk in GHC or optimization strategies. It’s just some notes on how I did some pre-mature optimization implementing a very naive and simple caching mechanism for some endpoints on a Servant-powered web application I’m writing. It also served as a good chance for me to start using Haskell’s STM and concurrency features.

Thanks to working in Haskell on this program for so long, I’ve almost forgotten that the large majority of my code is completely pure and stateless. It was a good feeling to be somewhat confused and how I would maintain a piece of often considered global state like a cache in haskell, and how I would deal with actually changing a value that had already been set. Thanks to immutability-by-default, I’ve been writing trivially parallelizable code this whole time.

Before starting, I thought that this was something I could actually get through fairly quickly, possibly within a day if I could get into a good “flow”. One of the best indicators of good “flow” when I’m writing Haskell is when the compiler doesn’t have to yell at me much (if at all). Rather than being the task master, the compiler becomes your buddy that pipes up every once in a while, and you never forget which parts of the system get affected by changes and generally move the whole system forward at one time, instead of just moving one bit and dealing with the fallout. Enough about that, let’s get into what I did.

Step 0: Some thinking about the problem

There’s a famous saying in Computer Science (attributed to Phil Karlton) that the only two hard things in Computer Science – cache invalidation and naming things. Now that I’m tackling one of them (albeit very naively), My initial assessment that this was going to be pretty simple worries me quite a bit. The concept is simple:

Do expensive database computation

Save the result

Use it later when the same exact query comes in

If the data affected by the query has changed, flush all or part of the cache

Astute readers will note that the if the data affected by the query has changed is the hard bit.

Since the project is a job board, I have an endpoint (Servant handler) jobsFTS (job Full Text Search) which takes a JobQuery type that completely describes the query being done on the database side. Since the state I care about for the hard bit is the database, and it can change at any time, it makes sense to consider the database state as a kind of explicit input to the jobsFTS function. So I have one explicit input (the JobQuery ), and one implicit input (the database state). The function returns a PaginatedList JobWithCompany (yay nice and readable types), so that’s what my cache is going to have to store.

So what about the hard part? The cache invalidation part? I think I can avoid over thinking it: every time a job posting is added/removed/activated/inactivated, I just invalidate ALL the JobQuery caches. This is a VERY inefficient cache, but I’m fine with it, due to the current frequency of jobs actually being added/removed/activated/inactivated. All I need to do now is store the results ( PaginatedList JobWithCompany values) in a concurrent/thread-safe way.

This is the good ol’ RTFM part. I spend a good bit of time reading up on Haskell’s STM wiki page along with the actual docs for the stm package. For some reason there were actually a bunch of dead links on the wiki page. At this point I’m very used to reading haskell generated documentation so I was very comfortable reading the docs of the stm package, but I could see how this would be very offputting for someone new to the language.

Step 0.1: Have the application structure in place already

Before we get into what I did, I want to point out that there’s a very real amount of tech-debt/cruft/code (however you view it) already existing – I’m starting from a app that I architected according to how I like to do things, so you’ll have to work a little harder than I do to understand the structure.

I think the structure of my application is really simple though, I’ve found and continue to find that the component based approached really strikes the right balance for me. Very simply put, if you organize your application in terms of big concerns like “sending emails” or “interacting with the database” or “caching”, you have a very organic separation and organization of code. Well organized code makes a huge difference in a large codebase. These large concerns often have some shared functionality, things like setting themselves up, or tearing themselves down, or getting status, and a language that gives you reasonably ergonomic approach to interfaces (I really like Golang’s approach for this, and of course Haskell’s typeclasses are stellar). Up until a while ago I’d associated this approach with bloated software, but it was important for me to realize personally that just because you wanted to add <Something>Component s all over your code didn’t mean that your code would become the horror that is Java’s AbstractBeanComponentFactory -style frameworks like Spring. Of course, I didn’t invent this pattern, one of the best places I’ve seen it implemented is Stuart Sierra’s Component for clojure.

In my app I already have the structure in place, so today I’m only going to be adding a CacheBackend component (the name is a little unfortunate but I’m too lazy to change it, looks like I just hit the second hard thing in computer science). I was able to copy-pasta some code from my SearchBackend component ([I’ve written about it here][search-backend-post]), and do a quick s/Search/Cache/g and get on the way much quicker. If you don’t have this kind of structure in your own app, I’d highly recommend looking into whether it makes sense for you.

Step 1: Start building the CacheBackend

Just a few minutes of coding and I’m feeling crazy productive and like I’m doing things right, here’s what the early code (that doesn’t compile) looked like:

{-# LANGUAGE OverloadedStrings #-} module Cache.CacheBackend ( makeConnectedCacheBackend , CacheBackend(..) , CacheKey(..) ) where import Config (LoggerConfig(..), CacheConfig(..)) import Data.Maybe (Maybe) data LocalMemory = LocalMemory { searchCfg :: CacheConfig , searchLogger :: Maybe Logger } data CacheKey = ActiveJobFTS \| JobFTS deriving Eq | data CacheBackendError = UnexpectedFailure deriving (Eq) instance Show CacheBackendError where show UnexpectedFailure = "An unexpected failure occurred" makeConnectedCacheBackend :: CacheConfig -> Maybe Logger -> IO (Either SomeException CacheBackend) makeConnectedCacheBackend c maybeL = connectCache SQLiteFTS { cacheCfg=c , cacheLogger=maybeL } logErrAndReturn :: CacheBackend -> String -> SomeException -> IO CacheBackend logErrAndReturn c msg err = logMsg c ERROR (msg <> ": " <> show err) >> return c class CacheBackend c where getCacheLogger :: c -> Maybe Logger -- ^ Connect to the cache backend connect :: c -> IO (Either SomeException s) -- ^ Look up value from the cache lookup :: CacheKey -> c -> r -- ^ Check whether a key has a value hasValue :: CacheKey -> c -> Bool -- ^ Invalidate a value already in the cache invalidate :: CacheKey -> c -> r instance HasLogger CacheBackend where getComponentLogger (CacheBackend _ l _ _) = l

Of course, I forgot one thing – the type for the container that was going to be stored inside LocalMemoryCache it should be something like TMVar (HashMap CacheKey (PaginatedList JobWithCompany)) , in the simplest case. make sure to read up on TMVar s if you haven’t already. Unfortunately, since I’m going to be storing various results (not just PaginatedList JobWithCompany values) in there, I can’t use a regular HashMap CacheKey ??? (the ??? values would be heterogenous, I could wrap them to smoothe them out but for this first implementation I’ll hold off). It turns out there’s an option to use Data-HMap for a heterogeneous map, but I don’t want the extra complexity – I’ll do it the dumb simple way for now:

data LocalMemoryCache = LocalMemoryCache { searchCfg :: CacheConfig , searchLogger :: Maybe Logger , scJobFTS :: TMVar (HashMap CacheKey (PaginatedList JobWithCompany)) , scActiveJobOnlyFTS :: TMVar (HashMap CacheKey (PaginatedList JobWithCompany)) }

Step 2: Actually start writing compilable code

Now that the types look like they could/should work, it’s time to try and sneak one past the compiler – I use the usual undefined trick to mock out implelentations of methods and get to work:

instance HasLogger CacheBackend where getComponentLogger = cacheLoger -- realized that my approach to this was pretty bad, could just use the getter, didn't have to do the pattern matching instance CacheBackend LocalMemoryCache where getCacheLogger = cacheLogger connect = undefined lookup = undefined hasValue = undefined invalidate = undefined

With that stuff mocked out, I tried a compile, and… I forgot a few things:

The CacheConfig type was unspecified (I needed to make a bunch of changes in Config.hs

type was unspecified (I needed to make a bunch of changes in Tons of little errors, typos, previous code from the copy pasta, etc

Forgot to properly name and implement the constructor for a CacheBackend

Import the appropriate STM package and types (modifying .cabal and restarting ghci )

Here’s what the code that compiled looks like:

{-# LANGUAGE OverloadedStrings #-} {-# LANGUAGE DataKinds #-} module Cache.CacheBackend ( makeConnectedCacheBackend , CacheBackend(..) , CacheKey(..) ) where import Data.Monoid ((<>)) import Config (LoggerConfig(..), CacheConfig(..)) import Control.Concurrent.STM (STM) import Control.Concurrent.STM.TMVar (TMVar, newEmptyTMVar) import Control.Monad.State (liftIO) import Control.Exception (SomeException) import Data.HashMap.Strict as HMS (HashMap) import Data.Maybe (Maybe) import System.Log.Logger (Logger, Priority(..)) import Types (HasLogger(..), JobWithCompany, PaginatedList) data CacheBackend = LocalMemoryCache { cacheCfg :: CacheConfig , cacheLogger :: Maybe Logger , scJobFTS :: STM (TMVar (HashMap CacheKey (PaginatedList JobWithCompany))) , scActiveJobOnlyFTS :: STM (TMVar (HashMap CacheKey (PaginatedList JobWithCompany))) } -- \| NewImplementation might go here data CacheKey = JobFTS \| ActiveJobOnlyFTS deriving Eq data CacheBackendError = UnexpectedFailure deriving (Eq) instance Show CacheBackendError where show UnexpectedFailure = "An unexpected failure occurred" makeConnectedCacheBackend :: CacheConfig -> Maybe Logger -> IO (Either SomeException CacheBackend) makeConnectedCacheBackend c maybeL = connect (LocalMemoryCache c maybeL newEmptyTMVar newEmptyTMVar) logErrAndReturn :: CacheBackend -> String -> SomeException -> IO CacheBackend logErrAndReturn c msg err = logMsg c ERROR (msg <> ": " <> show err) >> return c class Cache c where getCacheLogger :: c -> Maybe Logger -- ^ Connect to the cache backend connect :: c -> IO (Either SomeException c) -- ^ Look up value from the cache lookup :: CacheKey -> c -> r -- ^ Check whether a key has a value hasValue :: CacheKey -> c -> Bool -- ^ Invalidate a value already in the cache invalidate :: CacheKey -> c -> r instance HasLogger CacheBackend where getComponentLogger = cacheLogger instance Cache CacheBackend where getCacheLogger = cacheLogger connect = undefined lookup = undefined hasValue = undefined invalidate = undefined

Step 3: Add implementations

I’m a little fuzzy on how I should be using the STM monad, and whether I wrote things correctly but since just about everything else is what I THINK I want (at least the types are what they should be), I went ahead with writing the implementations. At this point one more worry I had was the fact that I would now have to juggle a few monads – IO , my Servant custom handler monad ( WithApplicationGlobals Handler <something> ), and now STM as well. Mixing monads often gets me confused (especially when >>= produces a value in a monad I didn’t expect, but I think I’ll be able to liftIO or use the - IO versions of any functions to avoid issues.

Turns out afer a little bit more looking I found out I was using the wrong STM primitive, TVar is what I wanted. As I started writing, things started getting wild (more and more language extensions being required, which worries me because I often don’t understand them fully). In particular, DeriveGeneric , FlexibleContexts and UndecidableInstances kept seeming like solutions to problems I was facing, all to make CacheKey a hashable value. Here’s the code I was puzzled with:

data CacheBackend = LocalMemoryCache { cacheCfg :: CacheConfig , cacheLogger :: Maybe Logger , scJobFTS :: STM (TVar (HMS.HashMap CacheKey (PaginatedList JobWithCompany))) , scActiveJobOnlyFTS :: STM (TVar (HMS.HashMap CacheKey (PaginatedList JobWithCompany))) } -- \| NewImplementation might go here data CacheKey = JobFTS JobQuery \| ActiveJobOnlyFTS JobQuery deriving (Generic, Eq) getCacheEntry :: CacheKey -> STM (TVar (HMS.HashMap CacheKey a)) -> IO (Either CacheBackendError a) getCacheEntry k stm = atomically stm >>= atomically . readTVar >>= pure . maybe (Left UnexpectedFailure) Right . HMS.lookup k -- ... a ways down ... instance Cache CacheBackend where getCacheLogger = cacheLogger connect = return . Right -- This will have to chance once I have a non-local-memory type of CacheBackend lookup key = getCacheEntry key . scJobFTS hasValue = undefined invalidate = undefined

Note the change to TVar and the start of using STM and the implementations I’ve just started writing. After letting those issues slide for now, I got down to one error that took me quite a while to fix:

/<redacted>/src/Cache/CacheBackend.hs:77:18: error: • Couldn't match type ‘r’ with ‘PaginatedList JobWithCompany’ ‘r’ is a rigid type variable bound by the type signature for: lookup :: forall r. CacheKey -> CacheBackend -> IO (Either CacheBackendError r) at /<redacted>/src/Cache/CacheBackend.hs:77:5 Expected type: CacheBackend -> IO (Either CacheBackendError r) Actual type: CacheBackend -> IO (Either CacheBackendError (PaginatedList JobWithCompany)) • In the expression: getCacheEntry key . scJobFTS In an equation for ‘lookup’: lookup key = getCacheEntry key . scJobFTS In the instance declaration for ‘Cache CacheBackend’ • Relevant bindings include lookup :: CacheKey -> CacheBackend -> IO (Either CacheBackendError r) (bound at /<redacted>/src/Cache/CacheBackend.hs:77:5)

I was trying to use a class to just return any old thing (in checkCacheEntry / lookup :: CacheKey -> c -> IO (Either CacheBackendError r) ), but the problem is that they need to be the same kind of thing all the time for every cache entry if I want to do that – they can’t be heterogenous. I was pretty stumped at this point at why the compiler was giving me those errors, so I did something I do pretty often – ignored the problem completely and went to work on something else!.

Here’s the code for removing entries:

removeCacheEntry :: CacheKey -> STM (TVar (HMS.HashMap CacheKey a)) -> IO () removeCacheEntry k stm = atomically stm >>= \v -> atomically (readTVar v) >>= pure . HMS.delete k >>= atomically . writeTVar v class Cache c where -- ... other declarations ... -- ^ Invalidate a value already in the cache invalidate :: CacheKey -> c -> IO () instance Cache CacheBackend where -- ... other declarations ... invalidate k@(JobFTS _) = removeCacheEntry k . scJobFTS invalidate k@(ActiveJobOnlyFTS _) = removeCacheEntry k . scActiveJobOnlyFTS

Writing this code reminded me that I needed to do some unwrapping, scJobFTS (which is the getter for the job full text search mapping) couldn’t handle every type of caching I wanted to do. I couldn’t ignore the problem I ran into for very long, so I started looking into what a Rigid Type Variable was, and basically just found the answer that it’s a “user-specified type”. What was causing r , the “user-specified type”, to make the compiler unhappy? I was still stumped at this point, but I did know that regular HashMap s do this as well – the dumbest but working way I could do this would be just to devolve to using very explicit lookup methods:

class Cache c where -- ... other declarations ... -- ^ Look up value from the cache lookupJobFTS :: CacheKey -> c -> IO (Either CacheBackendError (PaginatedList JobWithCompany)) lookupActiveJobOnlyFTS :: CacheKey -> c -> IO (Either CacheBackendError (PaginatedList JobWithCompany)) instance Cache CacheBackend where -- ... other declarations ... lookupJobFTS k@(JobFTS _) = getCacheEntry k . scJobFTS lookupActiveJobOnlyFTS k@(ActiveJobOnlyFTS _) = getCacheEntry k . scActiveJobOnlyFTS -- ... other declarations ...

Of course update functionality is also pretty important:

insertCacheEntry :: CacheKey -> a -> STM (TVar (HMS.HashMap CacheKey a)) -> IO () insertCacheEntry k v stm = atomically stm >>= \tv -> atomically (readTVar tv) >>= pure . HMS.insert k v >>= atomically . writeTVar tv class Cache c where -- ... other declarations ... -- ^ Insert value(s) into the cache (will replace existing entries) insertJobFTS :: CacheKey -> PaginatedList JobWithCompany -> c -> IO () insertActiveJobOnlyFTS :: CacheKey -> PaginatedList JobWithCompany -> c -> IO () -- ... other declarations ... instance Cache CacheBackend where -- ... other declarations ... insertJobFTS k@(JobFTS _) v = insertCacheEntry k v . scJobFTS insertActiveJobOnlyFTS k@(ActiveJobOnlyFTS _) v = insertCacheEntry k v . scActiveJobOnlyFTS -- ... other declarations ...

With retrospect, the answer to actually solving the issue I was having turns out to be wrapping the values in some sort of union type that would cover all their possibilities – just like the CackeKey does for all the different kinds of keys. I didn’t actually end up doing this (I ran with the just-write-a-bunch-of-lookup-functions), but just wanted to note that here.

Step 4: Integrating the cache backend into the app

Now that the CacheBackend theoretically works (it compiles at least, and the types do what they’re supposed to), now it’s time to make sure my CacheBackend gets started with the rest of the things the app needs:

-- ... other imports ... import Cache.CacheBackend (CacheBackend(..), makeConnectedCacheBackend) -- ... lots of code ... startApp :: AppConfig -> IO () startApp c = do -- ... lots of other code... -- Set up the cache backend cacheLogger <- buildLogger "App.Cache" (cacheLogLevel cacheCfg) cacheBackendOrError <- makeConnectedCacheBackend cacheCfg (Just cacheLogger) throwErrorIf (isLeft cacheBackendOrError) ("Failed to initialize Cache backend:

" ++ showLeft cacheBackendOrError) let (Right cacheBackend) = cacheBackendOrError -- ... lots of other code ... data ApplicationGlobals = ApplicationGlobals { globalConfig :: AppConfig , globalBackend :: SqliteBackend , globalMailer :: MailerBackend , globalUserContentStore :: UserContentStoreBackend , globalSearchBackend :: SQLiteFTS , globalCacheBackend :: CacheBackend -- NEW! , globalLogger :: Logger , globalCookieKey :: WCS.Key }

EZ PZ compile, though the code for handling the potential failure is very messy. Now theoretcially my cache backend is working, so it’s time to actually use it in some endpoints. Before I do that though, some additions to make it more ergonomic to use and help debugging:

Add logging to the cache backend to log when it’s used @ DEBUG level

Add get-or-set type utility function that either uses a cached value or if there isn’t one runs the computation and saves it in cache

Need to use the utility function on some endpoint, let’s say jobFTS

It took a long time to get these three relatively simple seeming tasks done, but here’s the code for a handler and the lookupOrCompute function (the helper I mentioned):

-- ^ Job search (only checks active, becuase that's all that's indexed), using the available search backend jobFTS :: Maybe String -> [JobIndustry] -> [CompanyID] -> Maybe Limit -> Maybe Offset -> [TagName] -> WithApplicationGlobals Handler (EnvelopedResponse (PaginatedList JobWithCompany)) jobFTS term is cs limit offset tags = getBackendWithSearchAndCache >>= \(db, cache, search) -> liftIO (try (lookupOrCompute (getJobListing cacheKey cache) (doSearch search db))) >>= ifLeftThrowServantError -- If the lookup/compute failed then throw the error as is >>= ifNothingThrowError (Err.enveloped Err.jobSearchFailed) -- if the lookup compute worked, but returned nothing >>= pure . EnvelopedResponse "success" "Successfully completed search" where massagedJQ = massage $ JobQuery (trimSearchTerm term) is cs limit offset tags cacheKey = JobFTS massagedJQ -- Do the search with the backend doSearch searchBackend dbBackend = searchJobs massagedJQ searchBackend >>= either (error "Job FTS failed") pure >>= hydrateSearchResultJobIDs dbBackend --- ... a ways down ... -- ^ Lookup or compute a value from cache lookupOrCompute :: IO (Maybe a) -> IO (Maybe a) -> IO (Maybe a) lookupOrCompute lookup compute = lookup >>= maybe compute (pure . Just)

Not too shabby! Here’s a pretty bad bug I noticed though – I actually wasn’t using the MassagedJQ , when I should have been (“massaging” is just what I called a light round of user input validation and enhancement of the search terms). After getting that small bug out of the way, I let ‘er rip!

Turns out there is a huge problem, both actions are hapening (computes are happening after the results should have been saved) which means the values aren’t being saved, and mostly because I’ve been using TVar all wrong – what I needed to store wasn’t the STM TVar (an action in the STM monad that produces a TVar when it’s run), I needed to store the actual TVar itself!

Here’s how I figured this out (this is straighto ut of my notes):

Saving just wasn’t working despite doing exactly what it should be doing (right methods getting called, right log messages getting printed, except every cache check was a miss)

Something is super broken, saving is not working properly

Look back at how I’m supposed to be using TVars

Wonder if I can use TVars outside the STM monad – mabye I should add an STM monad transformer to the servant app…? do they have to share context?

monad – mabye I should add an STM monad transformer to the servant app…? do they have to share context? Nah that seems kinda sucky, because STM contexts get generated when you read though…

Here’s where I hit the realization that maybe I was doing it wrong, I should just use the TVar by itself

Now that it’s working, I get a SWEET speedup. Man it feels good – and I’m not even doing a robust job in any sense of the word, but the speedup is amazing (16ms -> 3ms)

Step 5: Fixes, adding tests

Here’s the bit where I go around and fix up stuff I’ve been ignoring and adding tests to make sure this stuff works in perpetuity. First order of business was going around and changing the SearchBackend to use the Massaged JobQuery rather than just the job query in various places since I stumbled upon that bug. Next was adding some tests – I’ll save you the story on how I started getting around to writing the tests, here’s some code:

-- List of JWCs jwcList :: PaginatedList JobWithCompany jwcList = PaginatedList jwcs 2 where jwcs = [ JobWithCompany (ModelWithID 1 (makeJob True 1 1)) (ModelWithID 1 testCompany) , JobWithCompany (ModelWithID 2 (makeJob True 1 2)) (ModelWithID 1 testCompany) ] -- Query that is used when someone visits the main page basicQuery :: Massaged JobQuery basicQuery = Massaged $ JobQuery "" [] [] (Just 10) (Just 0) [] activeOnlyJobCacheKey :: CacheKey activeOnlyJobCacheKey = ActiveJobOnlyFTS basicQuery allJobCacheKey :: CacheKey allJobCacheKey = JobFTS basicQuery main :: IO () main = hspec spec spec :: Spec spec = around withCacheBackend $ do describe "setup" $ it "works without a logger" $ \c -> isNothing (cacheLogger c) `shouldBe` True describe "value insertion" $ do it "inserts job FTS results" $ \c -> insertJobListing allJobCacheKey c jwcList `shouldReturn` () it "inserts active job only FTS results" $ \c -> insertJobListing activeOnlyJobCacheKey c jwcList `shouldReturn` () describe "value retrieval" $ do it "retrieves inserted job FTS results" $ \c -> insertJobListing allJobCacheKey c jwcList >> getJobListing allJobCacheKey c >>= shouldBeSomething >>= (`shouldBe`jwcList) it "retrieves inserted active job only FTS results" $ \c -> insertJobListing activeOnlyJobCacheKey c jwcList >> getJobListing activeOnlyJobCacheKey c >>= shouldBeSomething >>= (`shouldBe`jwcList) describe "hasValue" $ do it "hasValue doesn't find not-inserted job FTS results" $ \c -> hasValue allJobCacheKey c `shouldReturn` False it "hasValue doesn't find not-inserted active job only FTS results" $ \c -> hasValue activeOnlyJobCacheKey c `shouldReturn` False it "hasValue finds inserted job FTS results" $ \c -> insertJobListing allJobCacheKey c jwcList >> hasValue allJobCacheKey c >>= (`shouldBe`True) it "hasValue finds inserted active job only FTS results" $ \c -> insertJobListing activeOnlyJobCacheKey c jwcList >> hasValue activeOnlyJobCacheKey c >>= (`shouldBe`True) describe "invalidation" $ do it "works for job FTS results" $ \c -> insertJobListing allJobCacheKey c jwcList >> getJobListing allJobCacheKey c >>= shouldBeSomething >> invalidate allJobCacheKey c >> getJobListing allJobCacheKey c >>= (`shouldBe`Nothing) it "works for active job only FTS results" $ \c -> insertJobListing activeOnlyJobCacheKey c jwcList >> getJobListing activeOnlyJobCacheKey c >>= shouldBeSomething >> invalidate allJobCacheKey c >> getJobListing allJobCacheKey c >>= (`shouldBe`Nothing)

BONUS: Hastily implemented timed cache invalidation

Yay for super easy async life in Haskell! Some things are better cached by time and not explicit cache invalidation so for a super simple solution I just spin off a green thread (a very helpful SO post if you’re new to the subject) that will handle the deletion in the future.

timedCacheInvalidation :: (HasLogger c, Cache c) => CacheKey -> c -> Int -> IO () timedCacheInvalidation k c ms = void $ forkIO (delayedInvalidation k c ms) where delayedInvalidation k c ms = threadDelay ms >> try (invalidate k c) >>= logTimedCacheInvalidation k c logTimedCacheInvalidation :: HasLogger c => CacheKey -> c -> (Either SomeException ()) -> IO () logTimedCacheInvalidation k c (Left _) = logMsg c DEBUG ("Timed cache invalidation FAILED for key: " <> show k) logTimedCacheInvalidation k c (Right _) = logMsg c DEBUG ("Timed cache invalidation SUCCESS for key: "<> show k) insertCompanyStats k@(Stats _) c v = insertCacheEntry k v (scCompanyStats c) >>= \res -> timedCacheInvalidation k c invalidationTimeMs >> logCacheUpdate c res insertCompanyStats _ _ _ = throw InvalidCacheKey where cfg = getCacheConfig c invalidationTimeMs = defaultCacheInvalidation cfg

Pretty simple code there, with a single call to forkIO . fork being in the name might scare those of us with more low level backgrounds, as it can be tricky to use, but rest assured, the docs say that it just sparks off “one lightweight, unbound thread”. Almost suspicious of how easy it was, I wrote a test to ensure it worked

describe "timed invalidation" $ it "works for company stats" $ \c -> insertCompanyStats companyStatsCacheKey c testCompanyStats >> getCompanyStats companyStatsCacheKey c >>= shouldBeSomething >> threadDelay (oneSecondMs `div` 2) -- defaultThread delay is one second, wait 2 just in case >> getCompanyStats companyStatsCacheKey c >>= \beforeDeletion -> threadDelay oneSecondMs -- defaultThread delay is one second, wait 2 just in case >> getCompanyStats companyStatsCacheKey c >>= \after -> isJust beforeDeletion && isNothing after `shouldBe` True

Where could I improve/why is this so naive?

There are tons of reasons, but here are the ones I can see at least

More cache keys – there are other things to cache! (going to suck a little bit because of the way this is built, going to have to add accessors and concretely typed lookup / update methods) – I did this after Step 5

/ methods) – I did this after Step 5 Would be nice to have more dynamic lookup and accessor functions , it’s silly to have so many lookup functions in the typeclass

and accessor functions , it’s silly to have so many lookup functions in the typeclass No memory/size bounding checks currently

Relatively dumb/simple cache invalidation semantics, I could always do better/be smarter about how to invalidate

Just about zero thought to performance, no stress testing, just back of the napkin one-off measurements

I’m going to leave all these worries for another day :).

TLDR

Here’s all the code completed!

The jobFTS handler in my API code:

-- ^ Job search (only checks active, becuase that's all that's indexed), using the available search backend jobFTS :: Maybe String -> [JobIndustry] -> [CompanyID] -> Maybe Limit -> Maybe Offset -> [TagName] -> WithApplicationGlobals Handler (EnvelopedResponse (PaginatedList JobWithCompany)) jobFTS term is cs limit offset tags = getBackendWithCacheAndSearch -- get various backends >>= \(db, cache, search) -> liftIO (try (lookupOrComputeAndSave (getJobListing cacheKey cache) (doSearch search db) (insertJobListing cacheKey cache))) >>= ifLeftThrowServantError -- If the lookup/compute failed then throw the error as is >>= ifNothingThrowIOError (Err.enveloped Err.jobSearchFailed) -- if the lookup compute worked, but returned nothing >>= pure . EnvelopedResponse "success" "Successfully completed search" where massagedJQ = massage $ JobQuery (trimSearchTerm term) is cs limit offset tags cacheKey = ActiveJobOnlyFTS massagedJQ -- Do the search with the backend doSearch searchBackend dbBackend = searchJobs massagedJQ searchBackend >>= either (error "Job FTS failed") pure >>= hydrateSearchResultJobIDs dbBackend --- ... near the bottom with the other helper code ... -- ^ Lookup or compute a value from cache lookupOrComputeAndSave :: IO (Maybe a) -> IO (Maybe a) -> (a -> IO ()) -> IO (Maybe a) lookupOrComputeAndSave lookup compute save = lookup >>= maybe computeAndSave (pure . Just) where computeAndSave = compute >>= maybe (pure Nothing) (\res -> save res >> pure (Just res))

Most of the CacheBackend code:

{-# LANGUAGE OverloadedStrings #-} {-# LANGUAGE DeriveGeneric #-} {-# LANGUAGE DataKinds #-} {-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE UndecidableInstances #-} module Cache.CacheBackend ( makeConnectedCacheBackend , CacheBackend(..) , Cache(..) , CacheKey(..) ) where import Config (LoggerConfig(..), CacheConfig(..)) import Control.Concurrent (forkIO, threadDelay) import Control.Concurrent.STM (STM, atomically) import Control.Concurrent.STM.TVar (TVar, newTVarIO, readTVar, writeTVar) import Control.Exception (Exception, SomeException, throw, try) import Control.Monad.State (liftIO, void) import Data.Hashable (Hashable) import Data.Maybe (Maybe) import Data.Monoid ((<>)) import GHC.Generics (Generic) import System.Log.Logger (Logger, Priority(..)) import Types (HasLogger(..), JobQuery, JobWithCompany, Massaged(..), PaginatedList, UserID, Company, CompanyStats, CompanyID, Limit, Offset, ModelWithID, Tag, TagID) import qualified Data.HashMap.Strict as HMS data CacheBackend = LocalMemoryCache { cacheCfg :: CacheConfig , cacheLogger :: Maybe Logger , scJobFTS :: TVar (HMS.HashMap CacheKey (PaginatedList JobWithCompany)) , scActiveJobOnlyFTS :: TVar (HMS.HashMap CacheKey (PaginatedList JobWithCompany)) , scCompanyStats :: TVar (HMS.HashMap CacheKey CompanyStats) -- ... more stores in the cache ... } -- \| NewImplementation might go here data CacheKey = JobFTS (Massaged JobQuery) | ActiveJobOnlyFTS (Massaged JobQuery) | Stats CompanyID deriving (Generic, Show, Eq) -- ... some cache keys have been omitted ... instance Hashable CacheKey => Hashable CacheKey data CacheBackendError = InvalidCacheKey | UnexpectedFailure deriving (Eq) instance Exception CacheBackendError instance Show CacheBackendError where show UnexpectedFailure = "An unexpected failure occurred" -- these could be more descriptive for sure show InvalidCacheKey = "Invalid cache key" makeConnectedCacheBackend :: CacheConfig -> Maybe Logger -> IO (Either SomeException CacheBackend) makeConnectedCacheBackend cfg maybeL = sequence [newTVarIO HMS.empty, newTVarIO HMS.empty] >>= \[jobFTS, activeJobOnlyFTS] -> newTVarIO HMS.empty >>= \companyStats -> connectCacheBackend LocalMemoryCache { cacheCfg=cfg , cacheLogger=maybeL , scJobFTS=jobFTS , scActiveJobOnlyFTS=activeJobOnlyFTS , scCompanyStats=companyStats -- ... more setters ... } logErrAndReturn :: CacheBackend -> String -> SomeException -> IO CacheBackend logErrAndReturn c msg err = logMsg c ERROR (msg <> ": " <> show err) >> return c getCacheEntry :: CacheKey -> TVar (HMS.HashMap CacheKey a) -> IO (Maybe a) getCacheEntry k tv = atomically (readTVar tv) >>= pure . HMS.lookup k -- ^ Insert an entry into a given shared concurrent hash map, returning the updated map insertCacheEntry :: CacheKey -> a -> TVar (HMS.HashMap CacheKey a) -> IO (HMS.HashMap CacheKey a) insertCacheEntry k newValue tv = atomically (readTVar tv) >>= pure . HMS.insert k newValue >>= \updatedMap -> atomically (writeTVar tv updatedMap) >> pure updatedMap checkCacheEntry :: CacheKey -> TVar (HMS.HashMap CacheKey a) -> IO Bool checkCacheEntry k tv = atomically (readTVar tv) >>= pure . HMS.member k removeCacheEntry :: CacheKey -> TVar (HMS.HashMap CacheKey a) -> IO () removeCacheEntry k tv = atomically (readTVar tv) >>= pure . HMS.delete k >>= atomically . writeTVar tv emptyCache :: TVar (HMS.HashMap CacheKey a) -> IO () emptyCache tv = atomically (writeTVar tv HMS.empty) timedCacheInvalidation :: (HasLogger c, Cache c) => CacheKey -> c -> Int -> IO () timedCacheInvalidation k c ms = void $ forkIO (delayedInvalidation k c ms) where delayedInvalidation k c ms = threadDelay ms >> try (invalidate k c) >>= logTimedCacheInvalidation k c logCacheHitOrMiss :: (HasLogger c, Cache c) => c -> CacheKey -> Maybe a -> IO (Maybe a) logCacheHitOrMiss c k Nothing = logMsg c DEBUG ("Cache miss for key "<>show k) >> return Nothing logCacheHitOrMiss c k res@(Just a) = logMsg c DEBUG ("Cache hit for key "<> show k) >> return res logCacheUpdate :: (HasLogger c, Show a) => c -> a -> IO () logCacheUpdate c = logMsg c DEBUG . ("Cache Updated: "<>) . show logTimedCacheInvalidation :: HasLogger c => CacheKey -> c -> Either SomeException () -> IO () logTimedCacheInvalidation k c (Left _) = logMsg c DEBUG ("Timed cache invalidation FAILED for key: " <> show k) logTimedCacheInvalidation k c (Right _) = logMsg c DEBUG ("Timed cache invalidation SUCCESS for key: "<> show k) class Cache c where getCacheLogger :: c -> Maybe Logger getCacheConfig :: c -> CacheConfig -- ^ Connect to the cache backend connectCacheBackend :: c -> IO (Either SomeException c) -- ^ Look up value(s) from the cache getJobListing :: CacheKey -> c -> IO (Maybe (PaginatedList JobWithCompany)) getCompanyStats :: CacheKey -> c -> IO (Maybe CompanyStats) -- ... more getters :( ... -- ^ Insert value(s) into the cache (will replace existing entries) insertJobListing :: CacheKey -> c -> PaginatedList JobWithCompany -> IO () insertCompanyStats :: CacheKey -> c -> CompanyStats -> IO () -- ... more inserters :( ... -- ^ Check whether a key has a value hasValue :: CacheKey -> c -> IO Bool -- ^ Invalidate a value already in the cache invalidate :: CacheKey -> c -> IO () -- ^ Invalidate caches invalidateAllJobListings :: c -> IO () invalidateCompanyListing :: c -> IO () -- ... slightly less but still more invalidators :( ... instance HasLogger CacheBackend where getComponentLogger = cacheLogger instance Cache CacheBackend where getCacheLogger = cacheLogger getCacheConfig = cacheCfg connectCacheBackend = return . Right -- This will have to chance once I have a non-local-memory type of CacheBackend getJobListing k@(JobFTS _) c = getCacheEntry k (scJobFTS c) >>= logCacheHitOrMiss c k getJobListing k@(ActiveJobOnlyFTS _) c = getCacheEntry k (scActiveJobOnlyFTS c) >>= logCacheHitOrMiss c k getCompanyStats k@(Stats _) c = getCacheEntry k (scCompanyStats c) >>= logCacheHitOrMiss c k getCompanyStats _ _ = throw InvalidCacheKey -- ... more pairs of getters :( ... insertJobListing k@(JobFTS _) c v = insertCacheEntry k v (scJobFTS c) >>= logCacheUpdate c insertJobListing k@(ActiveJobOnlyFTS _) c v = insertCacheEntry k v (scActiveJobOnlyFTS c) >>= logCacheUpdate c insertJobListing _ _ _ = throw InvalidCacheKey insertCompanyStats k@(Stats _) c v = insertCacheEntry k v (scCompanyStats c) >>= \res -> timedCacheInvalidation k c invalidationTimeMs >> logCacheUpdate c res where invalidationTimeMs = (cacheDefaultTimedInvalidationMs . getCacheConfig) c insertCompanyStats _ _ _ = throw InvalidCacheKey -- ... more pairs of inserters :( ... hasValue k@(JobFTS _) = checkCacheEntry k . scJobFTS hasValue k@(ActiveJobOnlyFTS _) = checkCacheEntry k . scActiveJobOnlyFTS hasValue k@(Stats _) = checkCacheEntry k . scCompanyStats -- ... more hasValue pattern completions ... invalidate k@(JobFTS _) = removeCacheEntry k . scJobFTS invalidate k@(ActiveJobOnlyFTS _) = removeCacheEntry k . scActiveJobOnlyFTS invalidate k@(Stats _) = removeCacheEntry k . scCompanyStats -- ... more invalidate pattern completions ... invalidateAllJobListings c = emptyCache (scJobFTS c) >> emptyCache (scActiveJobOnlyFTS c) invalidateCompanyListing c = emptyCache (scCompanyListing c)

Wrapping up

So it was pretty fun to implement all this, and I hope you learned from some of my mistakes at least and got a taste for what novice (hopefully amateur?) Haskell looks like! Feel free to drop me a line if you see a just absolutely horrid mistake/bad approach, I’d love to learn more and get to know what I’m doing wrong/right.