Self-testing methods: Software failure
Keywords: electronic systems, self-testing, hardware, embedded software, C language
As was mentioned in the introduction to Part 1, the acceptance of possible failure is a key requirement for building robust systems. This is extremely relevant when considering the possibility of software failure. Even when great care has been taken with the design, testing and debugging of code, it is almost inevitable that undiscovered bugs lurk in all but the most trivial code. Predicting a failure mode is tough, as this requires knowledge of the nature of the bug that leads to the failure and, if that knowledge were available, the bug would have been expunged during development.
The best approach is to recognise that there are broadly two types of software malfunction: data corruption and code looping. Some defensive code can be implemented to detect these problems before too much damage is done.
Data corruption
Arguably the most powerful feature of the C language is also the most common cause of errors and faults: pointers. Data is most likely to become corrupted if it is written via a pointer. The problem is that there is no easy way to detect an invalid pointer. On many systems, dereferencing a NULL pointer results in a trap, so ensuring that a suitable trap handler is installed is a start. This is not universal, though: on some microcontrollers address zero is ordinary, accessible memory and no trap occurs. A similar trap can handle the situation where a pointer presents an invalid (non-existent) memory address. However, if the address is valid but incorrect, the write succeeds silently and seemingly random errors may occur later.
A memory management unit (MMU) provides some options to trap erroneous situations, as it gives software control over what memory is considered to be valid at any given time. The classic use of an MMU is with a process model operating system. In this context, the code of each task can only access the memory specifically allocated to it. Any attempt to access outside of this area causes an error.
There are two special cases where there is a chance to detect pointer errors: stack overflow/underflow and array bound violation.
Stack space allocation is something of a black art. Although there are static analysis tools around that can help, careful testing during development is wise. This may involve filling the stack with a "fingerprint" value and then looking at utilisation after some period of code execution, or write access breakpoints may be employed. Runtime checks for stack usage are often sensible. This simply requires the addition of "guard words" at either end of the allocated stack space. These are pre-loaded with a unique value, which can be recognised as being untouched. It is logical to use an odd number (as addresses are normally even) and avoid common values like 0, 1 and 0xffffffff. With a 32-bit guard word, there is then only about a 1 in 4 billion chance that an overwrite goes undetected because it happens to store the guard value itself. Like memory tests, the guard words can be checked from a background task or whenever the CPU has nothing better to do. Another possible way to monitor the guard words would be with an MMU that has a fine grain resolution, but such functionality is not common.
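The fingerprint fill and the guard words described above might be sketched in C as follows. This is only an illustration under stated assumptions: the stack is modelled as an ordinary array, it is assumed to grow downwards (from high addresses to low), and the names guarded_stack_t, stack_init, stack_guards_intact and stack_words_used, along with the two pattern values, are hypothetical. In a real system the stack region would be placed by the linker or the RTOS.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical patterns: odd values that are unlikely to occur as data. */
#define STACK_GUARD_WORD 0xA5C3F17Bu
#define STACK_FILL_WORD  0x5EC7DA7Bu

#define STACK_WORDS 64u /* usable stack size in 32-bit words (assumed) */

/* A stack region with one guard word at each end. */
typedef struct {
    uint32_t guard_low;          /* just below the lowest stack word  */
    uint32_t words[STACK_WORDS]; /* the stack itself                  */
    uint32_t guard_high;         /* just above the highest stack word */
} guarded_stack_t;

/* Pre-load the guard words and fill the stack with the fingerprint. */
void stack_init(guarded_stack_t *s)
{
    s->guard_low = STACK_GUARD_WORD;
    s->guard_high = STACK_GUARD_WORD;
    for (size_t i = 0; i < STACK_WORDS; i++) {
        s->words[i] = STACK_FILL_WORD;
    }
}

/* Returns true while both guard words still hold their original value.
   Intended to be called from a background task or idle loop. */
bool stack_guards_intact(const guarded_stack_t *s)
{
    return s->guard_low == STACK_GUARD_WORD &&
           s->guard_high == STACK_GUARD_WORD;
}

/* Estimate peak usage by counting fingerprint words that survive at the
   low end. Assumes a descending stack, so the deepest use so far is the
   lowest index that no longer holds the fingerprint. */
size_t stack_words_used(const guarded_stack_t *s)
{
    size_t untouched = 0;
    while (untouched < STACK_WORDS &&
           s->words[untouched] == STACK_FILL_WORD) {
        untouched++;
    }
    return STACK_WORDS - untouched;
}
```

A background task could call stack_guards_intact() periodically and enter a safe state when it returns false, while stack_words_used() is more of a development aid for sizing the stack. The usage figure is an estimate: a stacked value that happens to equal the fingerprint would be counted as untouched.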