When developers talk about “test coverage” they are typically talking about how many lines of code are executed by their test suite. This is a simple calculation: what percentage of our code was run by our tests? We don’t want to accidentally break our code later, so having strong test coverage is important.

Mutation testing is an alternative to line coverage. While line coverage asks “what percentage of our code is run by our tests?”, mutation testing asks “what code can I change without breaking my tests?” Mutation testing tools answer this question by applying small modifications to your application and running your tests against each one.
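To make the idea concrete, here is a toy sketch of the process done by hand. It is not how mutest is implemented: a real tool rewrites your source automatically, while here the "mutants" are hand-written lambdas and every name is hypothetical.

```ruby
# Toy illustration only: the original behavior and three hand-made mutants.
original = ->(names) { names.first(3).map { |name| "@#{name}" } }

mutants = {
  'return nil for each item' => ->(names) { names.first(3).map { |_name| nil } },
  'drop the "@" prefix'      => ->(names) { names.first(3).map { |name| name.to_s } },
  'take 2 instead of 3'      => ->(names) { names.first(2).map { |name| "@#{name}" } }
}

# A "test" is a lambda that returns true when the implementation passes.
weak_test   = ->(impl) { !impl.call(%w[a b c d]).nil? }
strong_test = ->(impl) { impl.call(%w[a b c d]) == %w[@a @b @c] }

# A mutant is "killed" when the test fails against it.
def killed_mutants(test, mutants)
  mutants.reject { |_description, mutant| test.call(mutant) }.keys
end

weak_kills   = killed_mutants(weak_test, mutants)
strong_kills = killed_mutants(strong_test, mutants)

puts "weak test killed #{weak_kills.size} of #{mutants.size} mutants"
puts "strong test killed #{strong_kills.size} of #{mutants.size} mutants"
```

Both tests pass against the original and both execute every line of it, yet only the strong test notices when the behavior changes.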

This post explores how asking “what changes don’t break my tests?” can benefit more than just test coverage. Using a Ruby mutation testing tool called mutest, I’ll introduce and reflect on two separate code examples to demonstrate how mutation testing helps you improve both your tests and your code itself.

Mutest keeps your tests honest

Consider this script for looking up users who tweeted ‘“I really enjoy #pizza”’:

    require 'twitter'

    class Tweeters
      def recent
        query.first(3).map do |tweet|
          "@#{tweet.user.screen_name}"
        end
      end

      private

      def query
        api_client.search('"I really enjoy #pizza"')
      end

      def api_client
        Twitter::REST::Client.new do |config|
          config.consumer_key        = ENV['TWITTER_CONSUMER_KEY']
          config.consumer_secret     = ENV['TWITTER_CONSUMER_SECRET']
          config.access_token        = ENV['TWITTER_ACCESS_TOKEN']
          config.access_token_secret = ENV['TWITTER_ACCESS_TOKEN_SECRET']
        end
      end
    end

    puts Tweeters.new.recent if __FILE__ == $0

To illustrate the difference between “line coverage” and “mutation coverage” consider this intentionally bad test:

    require 'simplecov'
    SimpleCov.start

    require 'tweeters'
    require 'rspec'

    RSpec.describe Tweeters do
      it 'returns results' do
        expect(Tweeters.new.recent).to_not be(nil)
      end
    end

Now if I run this test:

    $ rspec -I. -rpizza_spec.rb
    .

    Finished in 0.94429 seconds (files took 1.38 seconds to load)
    1 example, 0 failures

    Coverage report generated for RSpec to /dev/coverage. 15 / 15 LOC (100.0%) covered.

My test passed with 100% coverage.

If I run this test again with mutest and instruct it to only mutate the recent instance method then I see the following summary:

    Mutations: 36
    Kills:     19
    Coverage:  52.78%

This tells me that my recent method actually has 52.78% mutation coverage! This means that mutest found 36 ways it could change my method and only 19 of those changes resulted in my test failing.
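The percentage is simply kills divided by total mutations. A quick check of the arithmetic, using the numbers from the summary above (the variable names are mine, not mutest's):

```ruby
mutations = 36
kills     = 19

alive    = mutations - kills
coverage = (kills.to_f / mutations * 100).round(2)

puts "Alive: #{alive}"
puts "Coverage: #{coverage}%"
```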

Mutest shows me what my tests missed. For example, here are three of the seventeen mutations my tests did not catch:

     def recent
    -  query.first(3).map do |tweet|
    -    "@#{tweet.user.screen_name}"
    -  end
    +  self
     end

     def recent
       query.first(3).map do |tweet|
    -    "@#{tweet.user.screen_name}"
    +    nil
       end
     end

     def recent
       query.first(3).map do |tweet|
    -    "@#{tweet.user.screen_name}"
    +    "@#{tweet.user}"
       end
     end

Again, the test for this script was intentionally bad, but the difference in results is important. All my test did was assert that my recent method did not return nil. That assertion technically exercised 100% of the code, though, so the line coverage tool reports 100% coverage. Mutest quickly showed me that it could make my method return self, [nil, nil, nil], and ['@#<Twitter::User:0x1>', '@#<Twitter::User:0x2>', '@#<Twitter::User:0x3>'] without breaking my tests.
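For contrast, here is a rough sketch of an assertion that would catch all three of those mutations. To keep it runnable without network access or API credentials, it uses hypothetical stand-ins (FakeUser, FakeTweet, OfflineTweeters) instead of the twitter gem, and a plain raise instead of RSpec. In a real spec you would stub the search results and use an exact `eq` matcher.

```ruby
# Hypothetical stand-ins for the objects returned by the twitter gem.
FakeUser  = Struct.new(:screen_name)
FakeTweet = Struct.new(:user)

# Same logic as Tweeters#recent, but the search results are injected.
class OfflineTweeters
  def initialize(tweets)
    @tweets = tweets
  end

  def recent
    @tweets.first(3).map do |tweet|
      "@#{tweet.user.screen_name}"
    end
  end
end

tweets = %w[alice bob carol dave].map { |name| FakeTweet.new(FakeUser.new(name)) }
result = OfflineTweeters.new(tweets).recent

# Asserting on the exact value rejects `self`, `[nil, nil, nil]`,
# and strings built from `tweet.user` instead of `screen_name`.
expected = %w[@alice @bob @carol]
raise "unexpected result: #{result.inspect}" unless result == expected
puts 'recent returns exactly the first three handles'
```

Passing four tweets and expecting three handles also pins down the `first(3)` call, which a "not nil" check never looks at.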

The takeaway here is not that line coverage is bad. You can write good tests without mutest. Instead, think of mutation testing as an x-ray for your tests. Running mutest on new code can help you double check that your tests are covering everything you care about. Mutest can also be a powerful tool when conducting a code review. It is easy to see roughly which methods are tested, but it can be hard to spot what the original author might have overlooked.

Mutest helps you write more robust code

Imagine you are tasked with creating an endpoint in your company’s internal API which does two tasks:

Looking up users by their unique id

Returning a list of users who signed up after a certain date

A few hours later you write the following code:

    class UsersController < ApplicationController
      # Looks up GET param `user_id` and returns user
      #
      # @return [User]
      #
      # @api public
      def show
        render json: UserFinder.call(params[:user_id].to_i)
      rescue UserFinder::RecordNotFound => error
        render json: { error: error.to_s }
      end

      # Finds users created after date specified in GET param `after`
      #
      # @return [Array<User>] list of users
      #
      # @api public
      def created_after
        after = Date.parse(params[:after])
        render json: UserFinder::Recent.call(after)
      end
    end

Along with this code you write some unit tests for the different edge cases you expect your controller to handle:

    $ rspec --format documentation users_controller_spec.rb

    UsersController#show
      returns a user when given a valid id
      renders JSON error when given an invalid id

    UsersController#created_after
      returns multiple users given an early date
      excludes users created before date and includes users after
      renders empty array when date is in the future

    Finished in 0.00433 seconds (files took 0.23881 seconds to load)
    5 examples, 0 failures

You deploy your new features and move on to your next task. Later, you find out that the front end team reported a bug in your API. Apparently every request they make returns:

    { "error": "Could not find User with 'id'=0" }

That same day you find out that the marketing team thinks your “new users” endpoint doesn’t work either. Apparently they sometimes get empty results when they shouldn’t. You end up spending the day debugging for your co-workers and eventually figure out what they were doing wrong.

To the front end developer you explain:

The API expects the parameter user_id but you specified id. My code ends up getting nil when it tries to read the user_id parameter, and nil is coerced to 0, which explains why you always got that error.

Moving on to the marketing team, you explain:

You need to write your dates in the format "YYYY-MM-DD". The problem came from searches like “last December”, which Ruby parses as December of this year.

What if we ran mutest on this code before shipping it? Running mutest on UsersController we see the following alive mutations:

     def created_after
    -  after = Date.parse(params[:after])
    +  after = Date.iso8601(params[:after])
       render(json: UserFinder::Recent.call(after))
     end

     def created_after
    -  after = Date.parse(params[:after])
    +  after = Date.parse(params.fetch(:after))
       render(json: UserFinder::Recent.call(after))
     end

     def show
    -  render(json: UserFinder.call(params[:user_id].to_i))
    +  render(json: UserFinder.call(Integer(params[:user_id])))
     rescue UserFinder::RecordNotFound => error
       render(json: { error: error.to_s })
     end

     def show
    -  render(json: UserFinder.call(params[:user_id].to_i))
    +  render(json: UserFinder.call(params.fetch(:user_id).to_i))
     rescue UserFinder::RecordNotFound => error
       render(json: { error: error.to_s })
     end

Mutest is helping me reduce the side effects that my application will permit. These four mutations eliminate subtle bugs which produce misleading errors and incorrect output.

1. Requiring parameters with Hash#fetch

    Date.parse(params[:after]) → Date.parse(params.fetch(:after))

Both actions originally used Hash#[], which implicitly returns nil if the specified key is not present. Hash#fetch, on the other hand, raises an error if the key is missing. As a result, mutest makes me think about the case where a consumer of the API does not provide an expected parameter.
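A small illustration of the difference, using a plain Hash as a stand-in for the request parameters (the input below is hypothetical, mirroring the client that sent id instead of user_id):

```ruby
# The client sent `id`, but the code expects `user_id`.
params = { id: '42' }

lenient = params[:user_id]
puts lenient.inspect # nil, with no warning
puts lenient.to_i    # 0, which later looks like a real id

# Hash#fetch refuses to continue without the key.
fetch_error =
  begin
    params.fetch(:user_id)
    nil
  rescue KeyError => error
    error
  end

puts "#{fetch_error.class}: #{fetch_error.message}"
```

The KeyError names the missing key, so the mistake surfaces at the point where the parameter is read rather than later as a confusing "id 0" lookup failure.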

2. Better type coercion with Kernel#Integer

    params[:user_id].to_i → Integer(params[:user_id])

In UsersController#show we called #to_i on our user_id parameter. This ended up coercing nil into 0 which made our final error message more confusing. #to_i will do its best to coerce any input, but this is often not what we want:

    nil.to_i     # => 0
    'hello'.to_i # => 0

Mutest replaces this with Kernel#Integer which is more strict:

    Integer(nil)     # => TypeError: can't convert nil into Integer
    Integer('hello') # => ArgumentError: invalid value for Integer(): "hello"

3. Stricter date parsing with Date.iso8601

    Date.parse(params[:after]) → Date.iso8601(params[:after])

In UsersController#created_after we called Date.parse, which tries to parse any string it thinks could be a date. This sounds handy, but in practice it can be a subtle source of bugs, since all it really needs to see are two adjacent digits or three letters that could be a month abbreviation:

    # Seems useful!
    Date.parse('May 1st 2015') # => #<Date: 2015-05-01>
    Date.parse('2015-05-01')   # => #<Date: 2015-05-01>

    # Never mind
    Date.parse('Maybe not a date')  # => #<Date: 2015-05-01>
    Date.parse('I am 10 years old') # => #<Date: 2015-10-10>

Ruby has many more specific date parsing methods. In this case mutest found that iso8601 still works with the test cases we specified:

    # Actually useful!
    Date.iso8601('2015-05-01')        # => #<Date: 2015-05-01>
    Date.iso8601('May 1st 2015')      # => invalid date (ArgumentError)
    Date.iso8601('Maybe not a date')  # => invalid date (ArgumentError)
    Date.iso8601('I am 10 years old') # => invalid date (ArgumentError)

Each mutation was a better fit for the use case in question. The replacement methods were more likely to raise errors when given unexpected input. Knowing this during development pushes me to handle these edge cases, since I don’t want an exception to go uncaught and produce an application error. Even if I do forget to cover one of these cases, the alternative is still preferable: an exception is raised in production instead of weird behavior silently degrading my app’s quality for months. I know about the error the first time a user triggers it instead of the first time a user complains.
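Handling those stricter failures explicitly might look something like the sketch below. The helper parse_created_after and its error messages are hypothetical and not part of the controller above. It only shows that the exceptions raised by Hash#fetch and Date.iso8601 are easy to turn into clear feedback.

```ruby
require 'date'

# Hypothetical helper: validates raw request parameters and returns
# either the parsed date or a descriptive error message.
def parse_created_after(params)
  after = Date.iso8601(params.fetch(:after))
  { ok: true, after: after }
rescue KeyError
  # Raised by Hash#fetch when the parameter is absent.
  { ok: false, error: 'missing required parameter: after' }
rescue ArgumentError, TypeError
  # Raised by Date.iso8601 for malformed strings or non-string input.
  { ok: false, error: 'after must be an ISO 8601 date such as 2015-05-01' }
end

p parse_created_after({ after: '2015-05-01' })
p parse_created_after({})
p parse_created_after({ after: 'last December' })
p parse_created_after({ after: nil })
```

With this shape, a request for “last December” gets an immediate, specific error instead of a plausible-looking but wrong result.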

Add mutest to your workflow

Mutest is a powerful tool for improving your code. At Cognito we try to always check the mutation coverage before shipping code to production. You don’t have to aim for 100% coverage to benefit from tools like mutest. Simply running mutest against your codebase and seeing what it can change should help you better understand what tests you are missing and what code could be improved.