Debugging is a critical skill that every software engineer needs to learn, yet it’s rarely taught. Most engineers just have to figure it out on their own, often through trial and error.

For me, this meant taking a spray-and-pray approach. I’d start with the most recent code I’d written or looked at, tweak some lines, then run my code. When that didn’t work, I’d pick a different method, change some stuff there, then run my code again. I kept doing this until I either found the bug or gave up for the day. This was pretty ineffective.

Over the past several years, I’ve gotten a lot better at debugging. I’ve debugged a lot more myself, watched my coworkers debug, and even given technical interviews that focus on debugging. Based on this, I’ve identified a few principles that make debugging more effective.

Debugging Principles

Define Correct Behavior

In order to fix a bug, you first have to know what it means to be fixed!

This is normally a two-part process: confirming that something isn’t working and knowing what it’s supposed to look like. At a high level, most people are pretty good at this. They’ll try to reproduce errors and they’ll have an example (or at least a description) of what was supposed to happen.

Debugging is a recursive process: you’re continually hunting for lower-level bugs until you find the actual issue. Where people get in trouble is forgetting to keep defining correct behavior as they dig deeper.

I made this mistake a lot in the past. I’d identify that a part of my code was wrong and add a bunch of print statements to see the values of all my variables. Then I’d assume things were working if I saw anything get printed at all. Had I defined correct behavior properly, I would have known exactly what (and how many!) results I expected to be printed out.
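As a minimal sketch of what this looks like in practice, suppose the code under suspicion is a hypothetical `parse_orders` function (the name and the input format are made up for illustration). Instead of printing values and eyeballing them, the check below writes down how many results should come back, and what they should be, before looking at any output.

```python
def parse_orders(lines):
    """Hypothetical function under suspicion: parses 'id,quantity' lines."""
    orders = []
    for line in lines:
        order_id, quantity = line.split(",")
        orders.append((order_id.strip(), int(quantity)))
    return orders


def check_parse_orders():
    lines = ["a1, 2", "b7, 10", "c3, 1"]
    result = parse_orders(lines)

    # Correct behavior, decided before running anything:
    # exactly one order per input line, with these exact values.
    expected = [("a1", 2), ("b7", 10), ("c3", 1)]
    assert len(result) == len(lines), f"expected {len(lines)} orders, got {len(result)}"
    assert result == expected, f"unexpected orders: {result!r}"
    return result


check_parse_orders()
```

If either assertion fails, the message tells you which expectation was violated, which is a lot more useful than "something got printed."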

Follow The Clues

There will always be some clues for you to investigate as you track down your bug. Use them!

In order to follow a clue, you first have to know it’s there. Some clues are obvious–you have an exception and a stack trace in your logs. These literally tell you where you should start your search.

Other clues are less obvious. That’s where defining correct behavior comes in. Anything that isn’t correct behavior is a clue. If you know your method ran 5 times but you only saw your debug message printed 4 times, that’s a clue. If you know that your string has length 4 but you don’t see anything when you print it, that’s a clue.
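Here is a rough sketch of how both of those clues could be surfaced. All of the names (`describe`, `run_with_call_count`, `log_item`) are hypothetical helpers invented for this example, and the "bug" in `log_item` is planted on purpose.

```python
def describe(value):
    """Show the length and repr of a string so invisible characters are visible."""
    return f"len={len(value)} repr={value!r}"


def run_with_call_count(func, items):
    """Call func on each item and return (number of calls, collected messages)."""
    calls = 0
    messages = []
    for item in items:
        calls += 1
        message = func(item)
        if message is not None:
            messages.append(message)
    return calls, messages


def log_item(item):
    """Hypothetical method under suspicion: it silently skips empty items."""
    if item:
        return f"processed {item!r}"
    return None


# A string of length 4 that looks like nothing when printed directly.
print(describe(" \t\r\n"))

# The method ran 5 times but produced only 4 messages: that mismatch is a clue.
calls, messages = run_with_call_count(log_item, ["a", "b", "", "c", "d"])
print(f"calls={calls} messages={len(messages)}")
```

Printing the `repr` of a value rather than the value itself is a cheap habit that makes whitespace and other invisible characters show up as escape sequences.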

Once you’ve found a clue, you follow it by asking why that behavior is happening. If the clue isn’t self-explanatory, come up with some guesses as to what could be going on. Then check whether or not your guesses are correct.

Question Your Assumptions

Finally, always remember that you probably don’t fully understand the code. This is by definition–if you understood the code, you would already know what’s causing your bug!

The biggest indicator that you’ve fooled yourself into believing that you understand the code is when you find yourself stuck and thinking, “This should work because I know this piece of code does X.”

The issue is usually that that code doesn’t do X. It’s just that you believe it does X.

If you find yourself in this situation, stop and confirm what the code is actually doing. This will usually reveal something you overlooked, which you can then investigate further.

To be explicit, you should either use a debugger or some debug/print statements to see exactly what’s going on at your problem point. Don’t be afraid to get your hands dirty!
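For a concrete (and deliberately small) illustration, take the belief "this helper removes the `.txt` extension." The `strip_extension` and `inspect` functions below are hypothetical, written only to show the habit of printing the real input and output at the problem point instead of trusting the belief.

```python
def strip_extension(filename):
    """Hypothetical helper. The belief: rstrip(".txt") removes the .txt suffix."""
    return filename.rstrip(".txt")


def inspect(filename):
    result = strip_extension(filename)
    # Look at what actually happened rather than what "should" have happened.
    print(f"strip_extension({filename!r}) -> {result!r}")
    # Alternatively, put a breakpoint() call here to pause and poke around in pdb.
    return result


inspect("notes.txt")   # the belief appears to hold for this input
inspect("report.txt")  # here the output is missing more than the extension
```

The second call exposes the wrong assumption: `str.rstrip` treats its argument as a set of characters to remove from the end, not as a suffix, so the trailing "t" of "report" gets eaten as well. You only find that out by looking at the actual value.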

If you get really stumped, remember that your method doesn’t run in a vacuum! There might be something you don’t understand about the system architecture, things might be configured incorrectly, or your code might be reading faulty data.

The Approach

With these things in mind, how do you actually debug something? When your program runs, it’s executing your code one line at a time. Debugging involves finding the first line that ran incorrectly.

This might ring a bell–we have an ordered list of lines of code and want to find the first one that didn’t work the way we wanted it to. This is pretty similar to binary search!

Our basic approach is then as follows:

We define a starting point in our program where we know everything is still working correctly. We also have an end point where we know our bug has already occurred. From there, we pick a point somewhere between the start and the end and investigate whether our code is still functioning correctly at that point. If yes, the bug must come later, so we repeat with the latter portion of the program. If no, the bug has already happened, so we repeat with the earlier portion. This continues until we find the buggy line. (If you ever get to a point where you think you’ve gone through every line of code, it might be that you originally defined the wrong starting point!)
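The approach above can be sketched in code, with some simplifying assumptions: the program is modeled as a list of steps applied in order, the state before the first step is known to be correct, the state after the last step is known to be wrong, and once the state goes wrong it stays wrong. The function `first_bad_step`, the toy steps, and the `is_sorted` check are all hypothetical names made up for this example.

```python
def first_bad_step(steps, initial, is_correct):
    """Return the index of the first step whose output fails is_correct.

    Assumes the initial state is correct, the final state is not, and that
    the state never becomes correct again after it first goes wrong.
    """
    # states[i] is the program state after i steps; states[0] is the known-good start.
    states = [initial]
    for step in steps:
        states.append(step(states[-1]))

    good, bad = 0, len(steps)  # states[good] is correct, states[bad] is not
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_correct(states[mid]):
            good = mid  # still fine here, so look in the latter portion
        else:
            bad = mid   # already broken here, so look in the earlier portion
    return bad - 1      # index of the step that turned a good state into a bad one


def double(values):
    return [v * 2 for v in values]


def add_one(values):
    return [v + 1 for v in values]


def buggy_reverse(values):
    # Planted bug: this step breaks the "stays sorted" expectation.
    return list(reversed(values))


def is_sorted(values):
    return values == sorted(values)


steps = [double, add_one, buggy_reverse, double]
index = first_bad_step(steps, [1, 2, 3], is_sorted)
print(f"first bad step: {steps[index].__name__}")
```

In real debugging the expensive part is the check itself (reading logs, stepping through a debugger), not running the code, which is why halving the search range each time matters. The sketch also shows why the starting point matters: if `initial` were already wrong, the search would still return an answer, just not a meaningful one.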

How quickly you find your bug boils down to how well you can narrow down the code that you need to go through. Naturally, the more experience you have, the better you’ll be able to do this.

Regardless, keep in mind the concepts above–find clues, follow them, and make sure you understand both what your code should do and what it is actually doing!