Shipping as a heartbeat

The ability to ship a new version of your application fast has many advantages. Building a new feature is more enjoyable: you can split it into smaller pieces and make sure each one works and scales properly. When you’re refactoring, you can refactor a really small piece and test it out. Oops, you made a mistake?! No problem: you can apply a fix and ship it fast. Putting your code into production frequently changes how you work. Every mistake has a shorter (and smaller) impact, so the fear of mistakes becomes smaller. The easier it is to fix a mistake, the more eager developers are to change the app. It’s very simple, like drawing with a pen versus a pencil and an eraser! Which one is more suitable for learning how to draw?

Shipping the Intercom frontend app in ~40 minutes

The process we’ve established at Intercom looks like PR -> quick review -> test -> merge into master -> test -> deploy! Since it’s cheaper to write tests than to click around the app after every single change, our app is getting more and more tests over time. More tests, longer test time. Longer test time, longer time from my computer to customers. In fact, the test phase executes twice, once on the feature branch and once on the master branch, to make sure master is healthy after the merge. That means that every new test increases the time to production by 2 x test_duration. At some point our tests were running for 15 minutes. When we double that, we get 30 minutes of waiting time just for testing. On top of that, we have to allow a few more minutes to create a build, compress assets, minify JavaScript and whatnot, and it’s now taking 40 minutes for our change to hit production! That’s really slow! If I make a mistake, I have to roll back instead of rolling forward, and be really sure I get the fix right.
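As a quick back-of-the-envelope check of the numbers above, here is a small sketch. The 10 minutes of build overhead is an assumption, inferred from the gap between 30 minutes of testing and the 40-minute total; the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Rough time-to-production estimate, in minutes.
test_duration=15     # one full test run (from the text)
build_overhead=10    # ASSUMED: build, asset compression, minification, deploy

# Tests run twice: once on the feature branch, once on master.
time_to_production=$(( 2 * test_duration + build_overhead ))
echo "Time to production: ${time_to_production} minutes"
```

Every minute shaved off test_duration comes back twice, which is why the test phase is the first place to look.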

Test phase breakdown

I work on our front-end app quite rarely, so I know that codebase far less well than the backend service, which means I make more mistakes there. The cost of every mistake I make there is far higher (time-wise), and I was really annoyed by that. I took the ignorant attitude of “I am going to make that faster” and started digging into our test setup. It’s a very common one:

Prepare software dependencies (yarn, npm, bower, …)

Check out the code

Build the app

Run tests

I started timing the different phases and figured out that most of them are heavily cached and take less than a minute, but building the app took 4 minutes! 4 minutes?! When I change a file locally, I can see the change in a matter of a few seconds. Why can’t I have the same build time in the test environment?
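Timing the phases does not need anything fancy. Below is a minimal sketch of one way to do it; time_phase is a hypothetical helper, and the commands passed to it are placeholders rather than our real pipeline steps.

```shell
#!/usr/bin/env bash
# Run one CI phase and report how long it took, in whole seconds.
# Usage: time_phase <label> <command> [args...]
time_phase() {
  local label="$1"
  shift
  local start end status=0
  start=$(date +%s)
  "$@" || status=$?            # keep the phase's exit code
  end=$(date +%s)
  echo "${label}: $(( end - start ))s" >&2
  return "$status"
}

# Placeholder commands; substitute the real phases of your pipeline.
time_phase "prepare dependencies" true
time_phase "check out code"       true
time_phase "build app"            sleep 1
time_phase "run tests"            true
```

The timings go to stderr so they don't mix with whatever the phase itself prints, and the helper returns the phase's exit code so a failing step still fails the build.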

Broccoli cache to the rescue!

Since I had no idea how Ember builds work, I started digging into it (1). It turned out the component behind Ember’s incremental builds is Broccoli.js. It does some magic (it’s not actually magic, it’s a little bit of CS and software engineering) to rebuild just the changed files and the files depending on them, which is usually a file or two and takes far less than four minutes. It also turned out there is an environment variable, BROCCOLI_PERSISTENT_FILTER_CACHE_ROOT, defining where that cache lives. Tada! Make sure that directory is shared between your builds so they can be incremental, just like in the development environment. Many CIs support some sort of caching between builds.

For example, our CircleCI config looks like:

machine:
  ...
  environment:
    BROCCOLI_PERSISTENT_FILTER_CACHE_ROOT: "/home/ubuntu/embercom/persisted-cache"
...
dependencies:
  cache_directories:
    - "/home/ubuntu/embercom/persisted-cache"

Fortunately, CircleCI caches per branch, so the only time the cache can’t be used is when dependencies are upgraded. In all other cases, it cuts three and a half minutes off each test run, and since the tests run twice (on the feature branch and on master), that’s seven minutes in total! Definitely not a bad result for a new guy investing one day of his time!
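To make the idea of a cache that survives between builds more concrete, here is a toy sketch. It is not how Broccoli is implemented; CACHE_ROOT, build_file and the upper-casing "transform" are all made up for illustration. The only point is that output keyed on input content can be reused by the next build as long as the cache directory is still there.

```shell
#!/usr/bin/env bash
# Toy content-keyed build cache (illustration only, NOT Broccoli's code).
CACHE_ROOT="${CACHE_ROOT:-$(mktemp -d)}"   # hypothetical shared cache dir
REBUILT=0                                  # files actually transformed so far

# build_file <input> <output>
build_file() {
  local input="$1" output="$2" key
  # Key the entry on the file's content, not its name or timestamp.
  # cksum is a weak checksum; a real cache would use a stronger hash.
  key=$(cksum < "$input" | tr ' ' '-')
  if [ ! -f "${CACHE_ROOT}/${key}" ]; then
    # Cache miss: do the "expensive" work and remember the result.
    tr '[:lower:]' '[:upper:]' < "$input" > "${CACHE_ROOT}/${key}"
    REBUILT=$(( REBUILT + 1 ))
  fi
  cp "${CACHE_ROOT}/${key}" "$output"
}

src=$(mktemp -d)
printf 'hello\n' > "${src}/a.txt"
build_file "${src}/a.txt" "${src}/a.out"   # miss: file is transformed
build_file "${src}/a.txt" "${src}/a.out"   # hit: result copied from cache
echo "files rebuilt: ${REBUILT}"
```

If CACHE_ROOT is thrown away after every CI run, every file is a miss every time, which is roughly what our four-minute builds were paying for.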

Further improvements

Our infrastructure team started playing with BuildKite (we’re loving it so far!) and has built Docker images in a way that requires rebuilding only a few layers for a new commit. Very often that just means copying the new files, which saves 2 more minutes on average. I believe they will write a blog post about that soon, so stay tuned (I will keep you updated ;)).

Since BuildKite does not offer an out-of-the-box cache like CircleCI, we have to implement it ourselves. We use a very simple implementation on top of S3. You can check out buildite-s3-cache (disclaimer: this is not what Intercom uses in production). Example usage:

export BUCKET_PATH="<bucket_name>/<path>"
source set_unique_cache_dir.sh
./download_cache.sh
./run_my_tests.sh ${CACHE_DIR}
./upload_cache.sh

and get all the benefits of cached builds!
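If you would rather roll your own, the download and upload steps can be sketched roughly as below. This is a guess at the general shape, not the contents of the scripts mentioned above: it assumes the AWS CLI is installed, that BUCKET_PATH has the "<bucket_name>/<path>" form, and the function names, AWS_CMD and the default CACHE_DIR are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the download/upload cache steps.
AWS_CMD="${AWS_CMD:-aws}"                    # override (e.g. AWS_CMD=echo) for a dry run
CACHE_DIR="${CACHE_DIR:-/tmp/build-cache}"   # assumed default location

download_cache() {
  mkdir -p "$CACHE_DIR"
  # A missing remote cache (e.g. the very first build) should not fail the build.
  "$AWS_CMD" s3 sync "s3://${BUCKET_PATH}" "$CACHE_DIR" || true
}

upload_cache() {
  # Push whatever the build left in the cache directory back to S3.
  "$AWS_CMD" s3 sync "$CACHE_DIR" "s3://${BUCKET_PATH}"
}
```

With these sourced, a build would call download_cache, run the tests with BROCCOLI_PERSISTENT_FILTER_CACHE_ROOT pointing at CACHE_DIR, and then call upload_cache.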

Have other suggestions for improving Ember test runtime?! Tell me!

(1) Actually, the “digging” started with Gavin Joyce and me drinking in our company bar.