Hypothesis Driven Debugging

Debugging with the Scientific Method

With this post I will explain a method of debugging that will help you get to the bottom of your bugs, one that I'm calling Hypothesis Driven Debugging. I'd like to say I'm showing you something brand new, but really, I'm just showing you how to apply the scientific method to the art of debugging…or is it a science?

This process is for after a bug has been reproduced. Strategies for quickly reproducing a reported bug deserve an article of their own, but, in short, make sure you understand all of the data sources available to you (server logs, database backups, analytics, user interviews, and breadcrumbs!) and how they can help you reproduce the issue. It took our team a long time to really understand how all of these sources can be used together to quickly reproduce nearly any bug reported to us.

So you can somewhat consistently reproduce the bug…now what?

You probably already cycle through the steps of the scientific method as you debug: Observe, Question, Hypothesize, Predict, Test, Analyze Results. But the individual steps may not be well defined in your head, and short circuits in this flow can send you down a dead-end path, or even lead you to the wrong conclusion. I've found that being aware of and documenting each step as I go through the debugging process helps keep me on track and minimizes logical flaws. That conscious awareness and documentation of each step is what distinguishes Hypothesis Driven Debugging from regular debugging.

I highly recommend documenting your progress as you debug. You don't know how long the process is going to take, so it's very helpful to see your steps written down when you need to get back on track after emerging from a rabbit hole. Documenting your progress serves other purposes as well: 1) it saves you from repeating a test when you've forgotten its results, and 2) it serves as documentation and proof when you present your findings to others or hand off your investigation. I usually create a Google Doc and start typing away with what I already know about the bug. It doesn't need to be pretty; in fact, here is a raw, unadulterated document that I used during an early iteration of Hypothesis Driven Debugging of the bug I elaborate on below.

Step 1 — Observation

This is what you’ve observed of the bug’s behavior so far. I’ll give a recent issue as an example to follow along with during the rest of the article.

Our bug: We have a Node.js server that generates PDFs for our users, using background images we store in the cloud and data that the user provides to us. This service is a core part of our application and has been operating without much trouble for a couple of years. Recently, a user reported that her PDF looked corrupted: most of the background image was there, but part of it was missing. We were able to reproduce the issue by generating the exact set of PDFs she was trying to generate, but not every time; about 4 in every 10 generations exhibited the problem. The problem didn't occur with any other combination of PDFs that we tried.

Hypothesis Driven Debugging will have you cycling through the remaining steps until you finally home in on the cause of the bug.

Step 2 — Question

Naturally, the observation of the bug leads to some questions. Why does the corruption only happen on a certain subset of PDFs? Why does it only happen intermittently? I initially hoped to narrow the bug down to a certain subset of the code by asking this question: Has the bug been around a long time, or only since our most recent major feature release?

You could really choose any question to form your hypothesis around, but I prefer the question that, whichever way it is answered, narrows the possible sources the most.

Write your questions down.

Step 3 — Hypothesis

You’ll now use your knowledge of the code base, your knowledge of the application’s systems, your past experience, and any other resources to make an educated guess to answer the question you are asking. For the question I asked, my hypothesis was: The bug was recently introduced into the code base by the release of our most recent major feature, because if it had been around for any longer than that, the bug would have been reported much earlier.

Write your hypothesis down.

Step 4 — Prediction

Make a testable prediction based on your hypothesis. My prediction was: by using code prior to our most recent major release, the background images in the PDFs will not be corrupted.

Write your prediction down.

Step 5 — Test

Now you want to develop a test to see if your prediction is correct. Our system was easy enough to revert for testing this particular service, so that's exactly what I chose to do: check out the code prior to the most recent feature, build and deploy it to my local environment, and attempt to reproduce the bug with the same PDFs.

Write your test down.

Step 5a — Results

This isn't really a separate "step" in the scientific method, but I like to include it in my notes document as its own section for easy reference. This is the place for all the new information gathered during the experiment. When testing with the older code, I was able to reproduce the issue, but far less consistently: instead of 40% of the time, it was more like 5%. I also noticed that we had handled our documents differently prior to our latest feature release. We had been unnecessarily downloading background images that we did not intend to use for the specific user request, along with those we did. Sometime during our latest release, the engineers developing the feature picked up on this inefficiency and made the appropriate changes.

Write your results down.

Step 6 — Analysis of Results

This is where you dive into the results. This step is actually very similar to the initial observation stage, in that you have likely unearthed a wealth of new information. Now it's time to assess what the results mean, combine the new information with what you already knew, and come up with the most pressing question to form your next hypothesis around. Et voilà, the cycle repeats. In our case, the analysis disproved the hypothesis: the bug was not introduced with the latest feature release. Luckily, my results uncovered two key pieces of information that helped form the next question. First, the bug reproduces less consistently on the old code. Second, we handled our image downloading differently in the old code.

Write your analysis down.

Now repeat steps 2–6 until done.

The first trip through the cycle immediately led to a new question: Why does the method of downloading images affect how often the bug occurs? That question led to the hypothesis: The PDFs are being generated before the images have fully downloaded, and having more images to download decreases the likelihood that any individual image will not have downloaded prior to PDF generation. I'll save you the full cycle; you can follow along here if interested, though it is much more detailed than this article and is missing a lot of context if you're not familiar with our application. Ultimately, we found that we were resolving our downloadImages promise by listening for the end event emitted by the image's read stream, instead of listening for the finish event emitted by the write stream it was piped into. The resolved promise kicked off our PDF generation, which would use the not-yet-fully-downloaded images. The bug had been introduced during a code refactor several major feature releases earlier; however, it became much more prevalent after our most recent feature release, which is why it was finally reported.

That’s it.

I hope you found this article useful. I'm interested in hearing your thoughts and your own debugging strategies. I'm confident that this particular issue would have been resolved much more quickly with Hypothesis Driven Debugging.

And by the way, we are hiring. Also, here is a great article with a similar title that I found via Google after writing this post. The content is quite different, it's interesting, and I recommend a read.

Thanks to Christopher Dzoba and Stephen Kelly for reviewing drafts of this post.