Global Sources
EE Times-India
Stay in touch with EE Times India
EE Times-India > Embedded

Resolve concurrency errors in multi-core design

Posted: 22 Mar 2012     Print Version  Bookmark and Share

Keywords:multi-core processors  interleavings  concurrency 

Having reached the limits of performance gains that can be realised from miniaturisation and integration, microprocessor manufacturers are increasingly choosing multi-core processors as the best approach to enhancing performance in computationally intensive embedded systems.

However, realising the potential performance gains with multi-cores has a cost. Software written to be single-threaded for a single core processor will realise little or no performance benefit when executed on a multi-core processor. To take advantage of the cores it must be rewritten or adapted to use multithreading.

The key challenge is to keep the cores busy as much as possible, while ensuring that they coordinate access to shared resources properly. Unfortunately writing such code is much harder than writing single-threaded code. Concurrent code is vulnerable to entirely new kinds of programming defects, such as deadlocks or race conditions, and when these occur, they can manifest in ways that are difficult to diagnose. Traditional techniques for finding and eliminating concurrency errors may be ineffective.

A principal reason concurrency bugs are so difficult is because there is an enormous number of ways in which the events in the threads can be interleaved when those threads execute. As the number of threads or instructions increases, the number of interleavings increases exponentially. If thread A executes M instructions and thread B executes N instructions, there are N+MCN possible interleavings of the two threads. For example, given two trivial threads with ten instructions each, there are 184,756 possible interleavings of those instructions. Even with very small programs it is clear that it is next to impossible to test all combinations. Secondly, even if it is feasible to identify a single interleaving that leads to a failure, it can be very difficult to set up a repeatable test case that uses that particular interleaving because scheduling of threads is effectively nondeterministic. Consequently debugging concurrent programs can be very expensive and time consuming.

Insidious race conditions
Of all the concurrency defects that can occur, race conditions are probably the most pernicious. Race conditions have been responsible for numerous system failures over the years. The famous case of the Therac-25 radiation therapy machine featured a race condition that was responsible for the deaths of several patients [2]. Similarly, the 2003 Northeast blackout was exacerbated by a race condition that resulted in misleading information being communicated to the technicians [3].

There are several different kinds of race conditions. One of the most common and insidious forms—data races—is the class of race conditions involving concurrent access to memory locations. A data race occurs when there are two or more threads of execution that access a shared memory location, at least one thread is changing the data at that location, and there is no explicit mechanism for coordinating access. If a data race occurs it can leave the program in an inconsistent state.

Figure 1: Race condition leads to incorrect count of items on an assembly line.

An example of a data race is shown in figure 1. A manufacturing assembly line has entry and exit sensors, and maintains a running count of the items currently on the line. This count is incremented every time an item enters the line, and decremented every time an item reaches the end of the line and exits. If an item enters the line at the same time that another item exits, the count should be incremented and then decremented (or vice-versa) for a net change of zero. However, normal increment and decrement are not atomic operations: they are composed of a sequence of individual instructions that first load the value from memory, then modify it locally, and finally store it back in memory. If the updating transactions are processed in a multithreaded system without sufficient safeguards, a race condition can arise because the sensors read and write a shared piece of data: the count. The interleaving in figure 1 results in an incorrect count of 69. There are also interleavings that can result in an incorrect count of 71, as well as a number that correctly result in a count of 70.

1 • 2 • 3 Next Page Last Page

Comment on "Resolve concurrency errors in multi-..."
*  You can enter [0] more charecters.
*Verify code:


Visit Asia Webinars to learn about the latest in technology and get practical design tips.


Go to top             Connect on Facebook      Follow us on Twitter      Follow us on Orkut

Back to Top