Red Squirrel Reflections
Dave Hoover explores the psychology of software development
Assume it is Your Fault
Thursday, October 9, 2003
"Even if an error at first appears not to be your fault, it's strongly in your interest to assume it is. That assumption helps you debug: It's hard enough to find an error in your code when you're looking for it; it's even harder when you've assumed your code is error free." --Code Complete, p. 628, Steve McConnell
Over the last 24 hours I internalized this idea, hopefully once and for all. One of our web applications was experiencing some very odd, hard-to-reproduce errors. These errors were brought to our attention only yesterday, though they are thought to have been happening for the last few months, quite possibly driving traffic (and revenue) away from our site.
The errors only happened after user authentication, and only for a specific category of user. Furthermore, the errors were only happening on one application, though it used the same authentication module as other applications. The frequency of the errors increased during periods of higher network traffic.
My first reaction yesterday afternoon was to blame it on memory and network issues, and I was at a total loss as to how any application-specific code could be causing this. The problem was particularly difficult to diagnose because of the vague and misleading messages that were written to the log.
While the network and system administrators were checking and re-checking their pieces of the pie, I was drawing a blank. As I walked home, though, I reminded myself of the above quote from Code Complete, and decided that I must begin assuming that the problem was my fault.
When I arrived at work this morning, my newfound assumption led me to review the various logical paths of authentication. I started walking through the Java code, transcribing the logical paths into plain English in a spreadsheet. I didn't find anything.
Then I looked to see if any of the authentication module's classes had been extended in the troublesome application. It turned out that one of the classes had indeed been extended. And wouldn't you know it, I was the last person to check that class into version control. I took a look at the class, diff'ing it with the previous version to see what I had done. I had made some modifications to it, and had cleaned up a particularly nasty-looking method.
I took another look at that method. I took another look at the older, nasty version. Although my code was nicely indented with English variable names, I had inadvertently changed the method's logic. I had committed a foolish, foolish mistake. I had refactored without tests.
So the problem did turn out to be my fault. I can't imagine how much longer it would have taken to diagnose if I had assumed otherwise.
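For illustration, here is a minimal sketch of the kind of characterization test that can pin down a method's behavior before it gets cleaned up. Everything in it is hypothetical: the post does not show the actual method, so `legacyIsAllowed`, `refactoredIsAllowed`, the user categories, and the failed-attempt limit are invented stand-ins, and the dropped negation is just one plausible way a tidy rewrite can change the logic. The idea is simply to run the old and new versions over the same inputs and report every disagreement.

```java
import java.util.ArrayList;
import java.util.List;

class CharacterizationCheck {

    // Hypothetical stand-in for the old, "nasty" method: deeply nested but correct.
    static boolean legacyIsAllowed(String category, boolean authenticated, int failedAttempts) {
        if (authenticated) {
            if (category.equals("partner")) {
                if (failedAttempts < 3) {
                    return true;
                } else {
                    return false;
                }
            } else {
                return true;
            }
        }
        return false;
    }

    // Hypothetical cleaned-up version. It reads nicely, but the negation on
    // isPartner was lost, so the logic no longer matches the legacy method.
    // The faithful rewrite would be: authenticated && (!isPartner || underLimit)
    static boolean refactoredIsAllowed(String category, boolean authenticated, int failedAttempts) {
        boolean isPartner = category.equals("partner");
        boolean underLimit = failedAttempts < 3;
        return authenticated && (isPartner || underLimit);
    }

    // Runs both versions over a small grid of inputs and collects every
    // combination on which they disagree.
    static List<String> findDifferences() {
        String[] categories = {"partner", "member"};
        boolean[] authStates = {true, false};
        int[] attemptCounts = {0, 2, 3, 5};

        List<String> differences = new ArrayList<>();
        for (String category : categories) {
            for (boolean authenticated : authStates) {
                for (int attempts : attemptCounts) {
                    boolean expected = legacyIsAllowed(category, authenticated, attempts);
                    boolean actual = refactoredIsAllowed(category, authenticated, attempts);
                    if (expected != actual) {
                        differences.add(category + ", authenticated=" + authenticated
                                + ", failedAttempts=" + attempts
                                + ": legacy=" + expected + ", refactored=" + actual);
                    }
                }
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        List<String> differences = findDifferences();
        if (differences.isEmpty()) {
            System.out.println("Refactored method matches legacy behavior on all inputs.");
        } else {
            System.out.println("Behavior changed for " + differences.size() + " input(s):");
            for (String difference : differences) {
                System.out.println("  " + difference);
            }
        }
    }
}
```

Run against the old version before the cleanup and kept around afterwards, a check like this turns a silent logic change into an immediate, readable failure instead of months of misleading log messages.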
Replies: 1 Comment
>Although my code was nicely indented with English variable names, I had inadvertently changed the method's logic.
This resonates with your (later, but I read it first) post about accuracy and precision. Making the code clean is analogous to reporting a more precise measurement - it looks better. But if it's not as correct (not as accurate), it's not an improvement.
Posted by Carl Manaster on 10/30/2003