# Know why your code is buggy (Part 1)

Posted: 08 Aug 2013

Keywords: debugging, software programming, program, scientific method

The sample program is supposed to sort its command-line arguments, but a defect causes it to fail under certain circumstances, such as:

\$ sample 11 14
Output: 0 11
\$ _

In Chapter 1 of this book we saw how to find the defect in the sample program—but in a rather ad hoc or unsystematic way. Let's now retell this debugging story using the concepts of scientific method.

Debugging sample—Preparation. We start by writing down the problem: what happened in the failing run and how it failed to meet our expectations. This fits easily within the scheme of the scientific method: we set up an initial hypothesis, "the program works," which is then rejected. This way, we have observed the failure, which is the first step in the scientific method.
• Hypothesis: The sample program works.
• Prediction: The output of sample 11 14 is "11 14."
• Experiment: We run sample as previously.
• Observation: The output of sample 11 14 is "0 11."
• Conclusion: The hypothesis is rejected.

Debugging sample—Hypothesis 1. We begin with a little verification step: Is the zero value reported by sample caused by a zero value in the program state? Looking at lines 38–41, it should be obvious that the first value printed (0) should be the value of a[0]. It is unlikely that this output code has a defect. Nonetheless, if it does, we could spend hours and hours on the wrong trail. Therefore, we set up the hypothesis that a[0] is actually zero:
• Hypothesis: The execution causes a[0] to be zero.
• Prediction: a[0] = 0 should hold at line 37.
• Experiment: Using a debugger, observe a[0] at line 37.
• Observation: a[0] = 0 holds as predicted.
• Conclusion: The hypothesis is confirmed.
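
The experiment above observes the state in a debugger. Where no debugger is at hand, the same observation can be made by formatting the state and comparing it with the prediction. The following is only a sketch under that assumption: format_state() and state_matches() are hypothetical helpers, not part of sample, and the caller must pass a size that really matches the array it hands in.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of sample): render the state as
   "a[] = [x, y, ...], size = N" into buf.
   Returns 0 on success, -1 if buf is too small. */
int format_state(char *buf, size_t buflen, const int a[], int size)
{
    size_t used;
    int n;

    assert(buf != NULL && size >= 0);

    n = snprintf(buf, buflen, "a[] = [");
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    used = (size_t)n;

    for (int i = 0; i < size; i++) {
        n = snprintf(buf + used, buflen - used, i ? ", %d" : "%d", a[i]);
        if (n < 0 || (size_t)n >= buflen - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(buf + used, buflen - used, "], size = %d", size);
    if (n < 0 || (size_t)n >= buflen - used)
        return -1;
    return 0;
}

/* Hypothetical helper: returns 1 if the observed state, once formatted,
   equals the predicted text, and 0 otherwise. */
int state_matches(const int a[], int size, const char *predicted)
{
    char buf[256];

    if (format_state(buf, sizeof buf, a, size) != 0)
        return 0;
    return strcmp(buf, predicted) == 0;
}
```

Called at the point of interest with a two-element array holding 0 and 11, format_state() yields the text a[] = [0, 11], size = 2, which can be logged or checked against a written-down prediction.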

Debugging sample—Hypothesis 2. Now we must determine where the infection in a[0] comes from. We assume that shell_sort() causes the infection:
• Hypothesis: The infection does not take place until shell_sort().
• Prediction: The state should be sane at the beginning of shell_sort()—that is, a[] = [11, 14] and size = 2 should hold at line 6.
• Experiment: Observe a[] and size.
• Observation: We find that a[] = [11, 14, 0], size = 3 holds.
• Conclusion: The hypothesis is rejected.

Debugging sample—Hypothesis 3. Assuming we have only one infection site, the infection does not take place within shell_sort(). Instead, shell_sort() gets bad arguments. We assume that these arguments cause the failure:
• Hypothesis: Invocation of shell_sort() with size = 3 causes the failure.
• Prediction: If we correct size manually, the run should be successful—the output should be "11 14."
• Experiment: Using a debugger, we:
1. Stop execution at shell_sort() (line 6).
2. Set size from 3 to 2.
3. Resume execution.
• Observation: As predicted.
• Conclusion: The hypothesis is confirmed.

Debugging sample—Hypothesis 4. The value of size can only come from the invocation of shell_sort() in line 36, that is, from the argc argument. As argc is the size of the array plus 1, we change the invocation:
• Hypothesis: Invocation of shell_sort() with size = argc (instead of size = argc - 1) causes the failure.
• Prediction: If we change argc to argc - 1, the run should be successful. That is, the output should be "11 14."
• Experiment: In line 36, change argc to argc - 1 and recompile.
• Observation: As predicted.
• Conclusion: The hypothesis is confirmed.
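
The last two experiments can also be sketched as a small, repeatable model. This is not the original listing, which is not reproduced here: shell_sort() below is a textbook Shell sort used as a stand-in, and run_sample_model() is a hypothetical harness. Because reading past the end of an array is undefined behavior in C, the model assumes a buffer with one extra slot holding 0, mirroring the observed state a[] = [11, 14, 0].

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in Shell sort; an assumption, not the article's own code. */
static void shell_sort(int a[], int size)
{
    int h = 1;

    do {
        h = h * 3 + 1;
    } while (h <= size);

    do {
        h /= 3;
        for (int i = h; i < size; i++) {
            int v = a[i];
            int j;

            for (j = i; j >= h && a[j - h] > v; j -= h)
                a[j] = a[j - h];
            a[j] = v;
        }
    } while (h != 1);
}

/* Hypothetical model of sample's main(): copies nargs values into a
   buffer with one extra zeroed slot (the assumed stray value), sorts
   size_passed elements, and writes the first nargs elements, which is
   what sample would print, to out.
   Returns 0 on success, -1 if the allocation fails. */
int run_sample_model(const int args[], int nargs, int size_passed, int out[])
{
    assert(nargs >= 0 && size_passed >= 0 && size_passed <= nargs + 1);

    int *a = calloc((size_t)nargs + 1, sizeof *a);
    if (a == NULL)
        return -1;

    memcpy(a, args, (size_t)nargs * sizeof *a);
    shell_sort(a, size_passed);
    memcpy(out, a, (size_t)nargs * sizeof *a);

    free(a);
    return 0;
}
```

For the arguments 11 and 14, argc is 3. Passing size_passed = argc makes the model report 0 11, as in the failing run, while passing argc - 1 makes it report 11 14.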

After four iterations of the scientific method, we have finally refined our hypothesis to a theory: the diagnosis "Invocation of shell_sort() with argc causes the failure." We have proven this by showing the two alternatives:
• With the invocation argc, the failure occurs.
• With the invocation argc - 1, the failure no longer occurs.

Thus, we have shown that the invocation with argc caused the failure. As a side effect, we have generated a fix: replacing argc with argc - 1 in line 36. Note that we have not yet shown that the change induces correctness; that is, sample may still contain other defects.

In particular, in programs more complex than sample we would now have to validate that this fix does not introduce new problems. In the case of sample, though, you can do such a validation by referring to a higher authority: as the author, I claim that with the fix applied there is no way sample could ever sort incorrectly. Take my word for it.

## Scientific debugging is explicit debugging

Earlier, we saw how to use the scientific method to establish the failure cause. You may have noticed that the process steps were quite explicit: We explicitly stated the hypotheses we were examining, and we explicitly set up experiments that supported or rejected the hypotheses.
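
One way to keep these steps explicit is to record them in a fixed shape. The sketch below is only an illustration: struct log_entry, sample_log, and the two functions are hypothetical names, and the entries paraphrase the five iterations described above.

```c
#include <assert.h>
#include <stdio.h>

enum verdict { REJECTED = 0, CONFIRMED = 1 };

/* One iteration of the scientific method, written down explicitly. */
struct log_entry {
    const char *hypothesis;
    const char *prediction;
    const char *experiment;
    const char *observation;
    enum verdict conclusion;
};

/* The debugging story of sample, paraphrased from the text above. */
static const struct log_entry sample_log[] = {
    { "The sample program works",
      "Output of sample 11 14 is \"11 14\"",
      "Run sample 11 14",
      "Output is \"0 11\"", REJECTED },
    { "The execution causes a[0] to be zero",
      "a[0] = 0 holds at line 37",
      "Observe a[0] at line 37 in a debugger",
      "a[0] = 0 holds", CONFIRMED },
    { "The infection does not take place until shell_sort()",
      "a[] = [11, 14] and size = 2 hold at line 6",
      "Observe a[] and size",
      "a[] = [11, 14, 0], size = 3", REJECTED },
    { "Invocation of shell_sort() with size = 3 causes the failure",
      "With size corrected to 2, output is \"11 14\"",
      "Stop at line 6, set size from 3 to 2, resume",
      "As predicted", CONFIRMED },
    { "Invocation with size = argc causes the failure",
      "With argc - 1, output is \"11 14\"",
      "In line 36, change argc to argc - 1 and recompile",
      "As predicted", CONFIRMED },
};

#define SAMPLE_LOG_LEN ((int)(sizeof sample_log / sizeof sample_log[0]))

/* Count the entries that ended with the given verdict. */
int count_verdicts(const struct log_entry log[], int n, enum verdict v)
{
    int count = 0;

    assert(log != NULL || n == 0);
    for (int i = 0; i < n; i++)
        if (log[i].conclusion == v)
            count++;
    return count;
}

/* Print the log in the bullet layout used in this article. */
void print_log(FILE *out, const struct log_entry log[], int n)
{
    for (int i = 0; i < n; i++) {
        fprintf(out, "Hypothesis:  %s\n", log[i].hypothesis);
        fprintf(out, "Prediction:  %s\n", log[i].prediction);
        fprintf(out, "Experiment:  %s\n", log[i].experiment);
        fprintf(out, "Observation: %s\n", log[i].observation);
        fprintf(out, "Conclusion:  %s\n\n",
                log[i].conclusion == CONFIRMED ? "confirmed" : "rejected");
    }
}
```

Calling print_log(stdout, sample_log, SAMPLE_LOG_LEN) writes the whole story back out. A log kept this way makes it easy to see which hypotheses have already been rejected and need not be examined again.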
