Have you ever wondered why some people seem to have no trouble finding bugs, while others sometimes have a hard time? I certainly did, because every minute I spend on hunting bugs is a wasted minute. So I’ve watched people debug, tried to learn from both failures and successes, and have compiled my observations here. This is a strategy guide to debugging. It’s not focused on particular languages or tools - in fact, many of the ideas in here can be applied to problem solving in general.

Know the landscape

Since we’re talking strategy, let’s stick with a “war” analogy for a moment. As in any battle, knowing the landscape is key. If you don’t understand the environment you are in, you might miss clues, misinterpret others and put way more effort into finding the problem than necessary. This means

  • Know the general ideas and data paths of the debuggee
  • Know your tools
  • Know the environment. OS, hardware, etc.

Make it reproducible

Once you have a general understanding, you are ready to tackle the bug itself. The first step is making it reproducible, at least semi-consistently. If you cannot reproduce it easily, tracking down the cause is very hard. And before you fix it, always create a test that reproduces it 100% of the time - otherwise, you’ll never be certain you fixed it.
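As a rough sketch of what that can look like for a bug that depends on randomness: search for a seed that triggers the symptom once, then reuse that seed so the failure happens on every run. Everything here (`pick_window`, its off-by-one, the helper names) is made up for illustration and not taken from any real code base.

```python
import random

def pick_window(samples, rng):
    """Hypothetical function under test: return up to 3 samples from a random start.

    Deliberate bug for illustration: randint's upper bound is inclusive,
    so start can equal len(samples) and the window comes back empty.
    """
    start = rng.randint(0, len(samples))
    return samples[start:start + 3]

def find_failing_seed(samples, max_seeds=1000):
    """Try seeds in order and return the first one that shows the symptom."""
    for seed in range(max_seeds):
        if not pick_window(samples, random.Random(seed)):
            return seed
    return None

def reproduces_bug(samples, seed):
    """Deterministic check: the same seed always gives the same result."""
    return pick_window(samples, random.Random(seed)) == []

if __name__ == "__main__":
    samples = [10, 20, 30, 40]
    seed = find_failing_seed(samples)
    print("failing seed:", seed)
    print("reproduces every time:", all(reproduces_bug(samples, seed) for _ in range(5)))
```

Once you have the seed, `reproduces_bug` can go straight into your test suite. After the fix it should return False for that seed.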

Keep a log

Once you can reproduce it, write it down. You will forget it otherwise. Believe me, I’ve been there. In general, write down everything relevant so you can recall it easily. Some people keep it in their development log. For me, the best approach has always been putting all known facts on a whiteboard. This way, teammates can often offer insights.

Collect data

Now that you can make the problem happen, it is tempting to speculate about what the problem is and just fix the code that is “obviously wrong”. Don’t give in to that temptation - most likely, it’s something completely different.

Any good theory is based on observations, and at this point, the observing has been slim - you only saw a manifestation of the bug. You don’t know if it’s the only manifestation. You don’t know if there’s more than one cause. You don’t know jack.

That’s fine, though. Now is the time to collect data. It’s nothing other than the scientific method - you collect data, form a theory, create testable hypotheses. Lather, rinse, repeat until your theory matches reality.

Collecting data can take many forms. Step through the code, add instrumentation to the code, use whatever analysis tools are available to you. Keep in mind that instrumentation may well change behavior - so be sure to test your hypotheses with the instrumentation removed, too. Heisenberg is alive and kicking.
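Here is one possible shape for lightweight instrumentation, as a minimal sketch: a decorator that records every call and its result in a list. The names `traced`, `trace_log` and `scale` are invented for this example. Because `functools.wraps` keeps the original function reachable as `__wrapped__`, you can also run the code without the instrumentation and compare.

```python
import functools

def traced(log):
    """Return a decorator that appends (name, args, kwargs, result) to log."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            log.append((func.__name__, args, kwargs, result))
            return result
        return wrapper
    return decorator

trace_log = []

@traced(trace_log)
def scale(value, factor=2):
    """Hypothetical function we want to observe."""
    return value * factor

if __name__ == "__main__":
    scale(3)
    scale(4, factor=10)
    for entry in trace_log:
        print(entry)
    # Same call with the instrumentation bypassed, to check that
    # observing the code did not change what it does.
    print(scale.__wrapped__(3))
```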

No fear

Don’t let emotions override the process. Yes, sometimes the pressure is insane - you’ve got three hours before burning a disc, and you just found a crash bug. That’s OK - as long as you keep your head, you will find it. Just calmly collect data, and examine every hypothesis against the known data. Not only will it help you get there faster - it will also calm down your teammates and help them keep their heads.

Reduce the noise

“Hold on,” you say. “My code base is way too big to step through!” Yes, it is. It always is. That’s where you do as the Romans did: divide and conquer. Split your code into smaller and smaller pieces, excluding the parts that are not contributing to the problem.

One of the best ways to do that is to find a piece of code where the problem shows itself, and then follow the bad data up the call chain. That usually gives you a good idea of what systems are involved. (Here’s where knowing the landscape is important!) As much as possible, bypass or stub out anything that is not involved.

Another way to reduce the clutter is to reduce the amount of data - pin down inputs that do not contribute to the actual problem. Force them to always be the same value.
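A small sketch of the idea, assuming a made-up `shipping_cost` function with a deliberately planted bug: hold the inputs you suspect are irrelevant at constant values with `functools.partial`, then vary only the one that is left and watch where the symptom appears.

```python
import functools

def shipping_cost(weight_kg, distance_km, express):
    """Hypothetical function under investigation.

    Deliberate bug for illustration: the flat express discount can
    exceed the cost, giving a negative price for light parcels.
    """
    base = weight_kg * 0.5 + distance_km * 0.01
    if express:
        base = base * 1.5 - 2.0
    return base

# Pin distance and express to constant values; only weight varies now.
probe = functools.partial(shipping_cost, distance_km=100, express=True)

# Collect the weights that show the symptom (a negative cost).
bad_weights = [w for w in (0.5, 1, 2, 5, 10) if probe(w) < 0]

if __name__ == "__main__":
    print("weights with negative cost:", bad_weights)
```

With everything else held constant, whatever still changes the outcome is worth a closer look.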

If you have a “known good” version, it becomes even easier - look at the changes since then. If your build pipeline allows it, do a binary search through all the changes until you’ve narrowed it down to a single change. (Know your tools!)
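The binary search itself is short. This sketch assumes the changes are ordered oldest to newest, that the newest one shows the bug, and that once the bug appears it stays. `first_bad_change` and `is_bad` are names I made up; in practice `is_bad` would build that revision and run your reproducing test. Version control tools such as `git bisect` automate the same idea.

```python
def first_bad_change(changes, is_bad):
    """Return the index of the first change at which is_bad(index) is True.

    Assumptions: changes are ordered oldest to newest, the last change
    is known to be bad, and every change after the first bad one is bad too.
    """
    lo, hi = 0, len(changes) - 1  # hi always points at a known-bad change
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(mid):
            hi = mid        # the culprit is mid or something earlier
        else:
            lo = mid + 1    # the culprit is after mid
    return lo

if __name__ == "__main__":
    changes = ["change-%d" % i for i in range(10)]
    tested = []

    def is_bad(index):
        # Stand-in for "build this revision and run the reproducing test".
        tested.append(index)
        return index >= 6

    culprit = first_bad_change(changes, is_bad)
    print("first bad change:", changes[culprit])
    print("revisions tested:", tested)
```

Ten changes take at most four test runs this way, instead of up to ten when checking them one by one.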

So now you’ve narrowed it down enough to build a theory that matches your observations. You’re ready to make the fix. You have a test case that reproduces the problem every time.

How do hedgehogs make love? - Very carefully!

Step back for a second, take a deep breath, and repeat after me: “One change at a time!” Go step by step. See if the change yields results that fit your theory. If you change multiple things, tracking interactions between them is incredibly hard.

If you still can’t find it

Of course, sometimes things aren’t that easy. There are two pieces of advice for the “I can’t find it!” bugs.

  1. Double-check your assumptions. If the monitor is dark, and you know it’s plugged in, do yourself a favor - see if it is really plugged in.
  2. Find a debugging buddy. Explain the symptoms - often, that alone is enough to make you smack your forehead. Either way, don’t share your theories, since they come with built-in assumptions and might mislead.
Happy hunting

That’s it. That is all there is to successfully debugging even the toughest problems. Yes, special problems require special techniques. But this is the general strategy. I’ve been following it for quite some time, and it seems to work for me. If you try it, let me know how it works for you.
