Perform systematic program debugging (Part 1)

Posted: 25 Jul 2013

Keywords: defect, code, bug, fault, failure

The final three steps, from finding the infection origins to isolating the infection chain, are concerned with understanding how the failure came to be. This task requires by far the most time, as well as other resources. Understanding how the failure came to be is what the rest of this article is about.

Why is understanding the failure so difficult? Considering figure 1, all one need do to find the defect is isolate the transition from a sane state [i.e., non-infected, as intended] to an infected state. This is a search in space (as we have to find out which part of the state is infected) as well as in time (as we have to find out when the infection takes place).

However, examining space and time is an enormous task for even the simplest programs. Each state consists of dozens, thousands, or even millions of variables. For example, figure 2 shows a visualisation of the program state of the GNU compiler [GCC] while compiling a program. The program state consists of about 44,000 individual variables, each with a distinct value, and about 42,000 references between variables.

Figure 2: A program state of the GNU compiler. The state consists of 44,000 individual variables (shown as vertices) and about 42,000 references between variables (shown as edges).

Not only is a single state quite large, a program execution consists of thousands, millions, or even billions of such states. Space and time thus form a wide area in which only two points are well known (figure 3): initially, the entire state is sane [√], and eventually some part of the state is infected [x].

Figure 3: Debugging as search in space and time. Initially, the program state is sane [√], eventually, it is infected [x]. The aim of debugging is to find out where this infection originated.

Within the area spanned by space and time, the aim of debugging is to locate the defect—a single transition from sane to infected that eventually causes the failure (figure 4).

Figure 4: The defect that is searched. A defect manifests itself as a transition from sane state [√] to infected state [x], where an erroneous statement causes the initial infection.
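
The search in time can be sketched in a few lines of Python. This is only an illustration, not a technique taken from the book: it assumes the states of a run have been recorded in a list, that a check `is_sane` exists for a single state, and that an infection, once present, persists in all later states. The names `first_infected`, `is_sane`, and `trace` are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

State = TypeVar("State")


def first_infected(states: Sequence[State],
                   is_sane: Callable[[State], bool]) -> int:
    """Return the index of the first infected state.

    Assumes the first state is sane, the last state is infected, and
    that an infection persists once it has been introduced.
    """
    if not is_sane(states[0]) or is_sane(states[-1]):
        raise ValueError("need a sane first state and an infected last state")
    lo, hi = 0, len(states) - 1  # invariant: states[lo] sane, states[hi] infected
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_sane(states[mid]):
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical recorded run: a running total that must never be negative.
trace = [{"total": 0}, {"total": 5}, {"total": 3},
         {"total": -2}, {"total": -2}, {"total": -7}]
step = first_infected(trace, lambda s: s["total"] >= 0)
print(step)  # index of the transition from sane to infected
```

Under these assumptions, the statement executed between `trace[step - 1]` and `trace[step]` is where the infection was introduced. Real programs rarely offer a complete sanity check, so in practice this narrows the search rather than ending it.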

Debugging as a search problem
Given the dimensions of space and time, this may seem like searching for a needle in an endless row of haystacks, and indeed debugging is largely a search problem. This search is driven by the following two major principles:
1. Separate sane from infected. If a state is infected, it may be part of the infection propagating from defect to failure. If a state is sane, there is no infection to propagate.
2. Separate relevant from irrelevant. A variable value is the result of a limited number of earlier variable values. Thus, only some part of the earlier state may be relevant to the failure.

Figure 5 illustrates the second principle. The failure, to reiterate, can only have been caused by a small number of other variables in earlier states [denoted using the exclamation point, !], the values of which in turn can only have come from other earlier variables. One says that subsequent variable values depend on earlier values.

This results in a series of dependencies from the failure back to earlier variable values. To locate the defect, it suffices to examine only these values (other values could not possibly have caused the failure) and separate them into sane and infected.

Figure 5: Deducing value origins. By analysing the program code, we can find out that an infected variable value [x] can have originated only from a small number of earlier variables [!].
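
Following dependencies backward from the failure can be sketched as a small graph traversal. The Python example below is a simplified illustration under stated assumptions: the dependencies are written down by hand as a dictionary, only data dependencies are considered, and the names `relevant_origins` and `depends_on` are hypothetical.

```python
from typing import Dict, List, Set


def relevant_origins(depends_on: Dict[str, List[str]], failing: str) -> Set[str]:
    """Collect every variable the failing value transitively depends on."""
    seen: Set[str] = set()
    todo = [failing]
    while todo:
        var = todo.pop()
        for origin in depends_on.get(var, []):
            if origin not in seen:
                seen.add(origin)
                todo.append(origin)
    return seen


# Hypothetical dependencies for a tiny program:
#   a = read(); b = read(); c = read()
#   s = a + b
#   avg = s / 2
#   label = str(c)
depends_on = {
    "s": ["a", "b"],
    "avg": ["s"],
    "label": ["c"],
}
origins = relevant_origins(depends_on, "avg")
print(sorted(origins))  # variables worth examining if avg is infected
```

If `avg` holds a wrong value, only `s`, `a`, and `b` need to be examined; `c` and `label` cannot have contributed to it and can be set aside.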

If we find an infected value, we must find and fix the defect that causes it. Typically, this is the same defect that causes the original failure.

Why is it that a variable value can be caused only by a small number of earlier variables? Good programming style dictates division of the state into units such that the information flow between these units is minimised.

Typically, your programming language provides a means of structuring the state, just as it helps you to structure the program code. However, whether you divide the state into functions, modules, objects, packages, or components, the principle is the same: a divided state is much easier to conquer.
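
As a minimal sketch of this idea, the hypothetical Python class below keeps one piece of state behind two methods. The class `Account` and its checks are invented for illustration and do not come from the book.

```python
class Account:
    """A unit of state whose balance changes only through two methods."""

    def __init__(self) -> None:
        self._balance = 0  # private by convention

    def deposit(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    def withdraw(self, amount: int) -> None:
        if amount <= 0 or amount > self._balance:
            raise ValueError("invalid withdrawal")
        self._balance -= amount

    @property
    def balance(self) -> int:
        # Read-only view: there is no setter, so outside code cannot assign it.
        return self._balance


account = Account()
account.deposit(100)
account.withdraw(30)
print(account.balance)
```

If the balance is ever found to be infected, the search can start with `deposit` and `withdraw`, because code that respects the interface has no other way to change it.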

This article was excerpted from Andreas Zeller's book Why Programs Fail: A Guide to Systematic Debugging (Second Edition, Copyright 2009), published by Morgan Kaufmann, an imprint of Elsevier Inc.

About the author
Andreas Zeller is chair of software engineering at the University of Saarland, where his research involves programmer productivity, with a particular interest in finding and fixing problems in code and development processes. He is best known for the visual GNU DDD debugger and the delta debugging technique for automatically isolating failure causes in program code.
