Global Sources
EE Times-India

Parallelism, concurrency for multi-core computing

Posted: 08 May 2014

Keywords: parallelism, concurrency, GPU, preemption, picothreads

In a similar fashion, prioritized scheduling can be accommodated by creating a separate server process for each real-time priority. Each has its own dedicated stack, and a lower-priority server process runs on a core only when all higher-priority server processes on that core have nothing to do. An alternative, when the real-time requirements are softer, is to use only one server process per core, but with separate deques for different priorities. With this approach, a running picothread is never preempted. However, when the server process chooses a new picothread to execute, priorities are obeyed: the server selects first from its own highest-priority non-empty deque, but steals from another server if that server has a non-empty deque of higher priority than any of its own non-empty deques.
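The single-server, multiple-deque policy can be sketched as a small single-threaded simulation. This is a minimal Python sketch under stated assumptions, not the author's implementation: the names Server and choose_next are hypothetical, a higher number is assumed to mean a higher priority, and the convention that an owner takes its newest entry while a thief takes the oldest is borrowed from classic work stealing rather than stated in the text. No locking is shown, so it models only the selection rule.

```python
from collections import deque


class Server:
    """One server process per core, holding one deque of picothreads per priority.

    Hypothetical sketch: priority levels are 0..levels-1, and a higher number
    is assumed to mean a higher priority.
    """

    def __init__(self, name, levels):
        self.name = name
        self.deques = [deque() for _ in range(levels)]

    def push(self, priority, picothread):
        # The owner adds new picothreads at the bottom (right end) of a deque.
        self.deques[priority].append(picothread)

    def highest_nonempty(self):
        # Return the highest priority with queued work, or -1 if all deques are empty.
        for p in range(len(self.deques) - 1, -1, -1):
            if self.deques[p]:
                return p
        return -1


def choose_next(me, servers):
    """Pick the next picothread for server `me`; a running picothread is never preempted."""
    own = me.highest_nonempty()
    victim, best = None, own
    # Steal only if another server holds work of strictly higher priority
    # than anything in our own non-empty deques.
    for other in servers:
        if other is not me:
            p = other.highest_nonempty()
            if p > best:
                victim, best = other, p
    if victim is not None:
        # Assumed convention: thieves take the oldest entry (left end).
        return victim.deques[best].popleft()
    if own >= 0:
        # Assumed convention: the owner takes its newest entry (right end).
        return me.deques[own].pop()
    return None  # no work anywhere


# Usage: A holds only low-priority work, B holds high-priority work.
a, b = Server("A", 3), Server("B", 3)
a.push(0, "a-low-1")
a.push(0, "a-low-2")
b.push(2, "b-high-1")
b.push(2, "b-high-2")
print(choose_next(a, [a, b]))  # b-high-1 (A steals B's higher-priority work)
print(choose_next(b, [a, b]))  # b-high-2 (B runs its own work)
print(choose_next(a, [a, b]))  # a-low-2
```

Ties deliberately favour the server's own deque, matching the rule that stealing happens only for strictly higher-priority work.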

Programming language constructs for concurrency and parallelism
Many programming languages incorporate some notion of concurrent threads, mutual-exclusion locks, and synchronising signals and waiting. Per Brinch Hansen's Concurrent Pascal [1] was one of the first languages to incorporate many of these concepts into the language itself. Ada and Java also included concurrent programming concepts from their inception, and many other languages now include them as well. Generally, the execution of a concurrent thread corresponds to the asynchronous execution of something approximating a named function or procedure. In Ada, this is the task body of the associated task. In Java, it is the run method of the associated Runnable object. Locks are often associated with some sort of synchronising object (often called a monitor), where some or all operations on the object automatically acquire a lock when the operation starts and automatically release it on completion, thereby ensuring that locks and unlocks are always balanced. In Ada, these are called protected objects and operations, while in Java they are the synchronized methods of a class.
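As a rough Python analogue of a monitor, consider the minimal sketch below. Python has no synchronized keyword or protected types, so the hypothetical Counter class guards each operation with a threading.Lock acquired in a with statement, which releases the lock even if the operation raises. Each worker thread runs a named function, in the spirit of an Ada task body or a Java run method. An in-place update such as `self._value += 1` is not guaranteed to be atomic across threads, so the lock is what makes the final count reliable.

```python
import threading


class Counter:
    """Hypothetical monitor-style class: every operation holds the object's lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        # `with` acquires on entry and releases on exit, so lock/unlock always balance.
        with self._lock:
            self._value += 1

    def value(self):
        with self._lock:
            return self._value


def run_workers(counter, n_threads=4, per_thread=10_000):
    """Start n_threads threads, each executing the named function `worker`."""

    def worker():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread before reading the result
    return counter.value()


print(run_workers(Counter()))  # 40000
```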

Signalling and waiting are used when concurrent threads need to communicate or otherwise cooperate, and one thread must wait for one or more other threads to take some action, or for some external event to occur, before it can proceed. Signalling and waiting are also often mediated by a synchronising object: a thread awaits some change in the state of the object, and a signal indicates that the state has changed and that some number of waiting threads should recheck whether the object is now in the desired state. Conditional critical regions, proposed by Hoare and Brinch Hansen, were among the first language constructs to provide this kind of waiting and signalling implicitly, based on a Boolean expression. More commonly, it is provided by explicit Wait and Signal operations on an object or a condition queue (in Java, waiting uses wait, and signalling uses notify or notifyAll). Ada combines the notions of conditional critical regions and monitors by incorporating entries with entry barriers into the protected object construct, eliminating the need for explicit Signal and Wait calls. All of these notions are what we mean by concurrent programming constructs.
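Waiting for a state change and signalling it can be sketched with Python's threading.Condition. In this minimal, hypothetical BoundedBuffer, two condition queues share one lock. Condition.wait_for re-evaluates its Boolean predicate every time the thread is woken, which is the recheck described above and loosely resembles an Ada entry barrier, except that signalling is still explicit (Python's notify and notify_all play the roles of Java's notify and notifyAll).

```python
import threading
from collections import deque


class BoundedBuffer:
    """Hypothetical monitor with two condition queues sharing a single lock."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        lock = threading.Lock()
        self._not_full = threading.Condition(lock)
        self._not_empty = threading.Condition(lock)

    def put(self, item):
        with self._not_full:  # acquires the shared lock
            # Block until there is room; the predicate is rechecked after every wake-up.
            self._not_full.wait_for(lambda: len(self._items) < self._capacity)
            self._items.append(item)
            self._not_empty.notify()  # signal: the buffer is no longer empty

    def get(self):
        with self._not_empty:
            self._not_empty.wait_for(lambda: len(self._items) > 0)
            item = self._items.popleft()
            self._not_full.notify()  # signal: the buffer is no longer full
            return item


def transfer(n, capacity=4):
    """Move n items from a producer (this thread) to one consumer thread."""
    buf = BoundedBuffer(capacity)
    received = []

    def consumer():
        for _ in range(n):
            received.append(buf.get())

    t = threading.Thread(target=consumer)
    t.start()
    for i in range(n):
        buf.put(i)
    t.join()
    return received


print(transfer(10))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With a single producer and a single consumer, items arrive in the order they were put; with several consumers, notify_all would be the more conservative choice.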

By contrast, fewer languages thus far incorporate what could be considered parallel programming constructs, though that is changing rapidly. As with concurrent programming, parallel programming can be supported by explicit language extensions, by standard libraries, or by a mixture of the two. Parallel programming also has a further option: program annotations, such as pragmas, that direct the compiler to parallelise an originally sequential algorithm automatically.

One characteristic that distinguishes parallel programming is that the unit of parallel computation is often smaller than the execution of an entire function or procedure: it might be one or more iterations of a loop, or the evaluation of one part of a larger expression. Furthermore, the compiler and the underlying run-time system take a larger role in determining which portions of the code actually run in parallel. This is quite different from traditional concurrent programming constructs, which rely on explicit programmer decisions about where the thread boundaries lie.
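To make the finer-grained unit of work concrete, here is a minimal sketch of a loop whose iterations are grouped into chunks and handed to a pool of worker threads. The names parallel_for, chunk, and workers are hypothetical, and the fixed chunk size stands in for the decision a compiler or run-time system would make. The sketch is only correct because the iterations are independent (each writes a different element). CPython's global interpreter lock limits the speed-up for pure-Python loop bodies, so the point here is the structure, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_for(lo, hi, body, workers=4, chunk=256):
    """Call body(i) for each i in [lo, hi), using chunks of iterations as the unit of work.

    Hypothetical helper: the order in which chunks run is not guaranteed,
    so body must not depend on other iterations.
    """

    def run_chunk(start):
        for i in range(start, min(start + chunk, hi)):
            body(i)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consuming the map results waits for every chunk and re-raises any exception.
        list(pool.map(run_chunk, range(lo, hi, chunk)))


squares = [0] * 1000


def square_into(i):
    squares[i] = i * i  # each iteration writes a distinct element


parallel_for(0, len(squares), square_into)
print(squares[:5], squares[-1])  # [0, 1, 4, 9, 16] 998001
```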

One of the first widely used languages with general-purpose parallel programming constructs was Cilk, designed by Charles Leiserson [2] at MIT and now supported by Intel as part of Intel Parallel Studio. Cilk lets the programmer insert keywords such as cilk_spawn and cilk_sync at strategic points in an algorithm: cilk_spawn forks the evaluation of a function call off into a separate lightweight thread, and cilk_sync makes the current function wait for the threads it has spawned, so that the results of their execution can be used. Cilk also lets the programmer write cilk_for rather than simply for, to indicate that the iterations of a for-loop are candidates for parallel execution. Other languages and extensions now provide similar capabilities. OpenMP uses pragmas rather than language extensions to direct the insertion of parallel execution. The language Go from Google includes lightweight goroutines for parallel execution, with channels for communication. The language Rust from Mozilla Research supports large numbers of lightweight tasks that communicate using ownership transfer to avoid race conditions. The language ParaSail from AdaCore provides safe automatic parallelisation based on a pointer-free, alias-free approach that simplifies divide-and-conquer algorithms.
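The cilk_spawn/cilk_sync pattern can be imitated, loosely, in Python. In this minimal sketch the hypothetical Spawned class starts a call in a new thread (the spawn), the parent continues with the other half of the work, and sync() waits for the child and returns its result. A real Cilk runtime does not create an operating-system thread per spawn; it records work on a per-worker deque that idle workers can steal from, so the cutoff below exists only to keep the thread count small.

```python
import threading


class Spawned:
    """Hypothetical stand-in for cilk_spawn: run fn(*args) in a new thread."""

    def __init__(self, fn, *args):
        self._result = None
        self._error = None
        self._thread = threading.Thread(target=self._run, args=(fn, args))
        self._thread.start()

    def _run(self, fn, args):
        try:
            self._result = fn(*args)
        except BaseException as exc:  # hand the failure back to whoever calls sync()
            self._error = exc

    def sync(self):
        """Stand-in for cilk_sync: wait for the child, then return its result."""
        self._thread.join()
        if self._error is not None:
            raise self._error
        return self._result


def parallel_sum(xs, lo=0, hi=None, cutoff=1000):
    """Divide-and-conquer sum of xs[lo:hi]."""
    if hi is None:
        hi = len(xs)
    if hi - lo <= cutoff:
        return sum(xs[lo:hi])  # small range: just run sequentially
    mid = (lo + hi) // 2
    left = Spawned(parallel_sum, xs, lo, mid, cutoff)  # "spawn" the left half
    right = parallel_sum(xs, mid, hi, cutoff)  # the parent works on the right half
    return left.sync() + right  # "sync" before combining the results


print(parallel_sum(list(range(10_000))))  # 49995000
```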

All of these parallel languages and extensions have adopted some variant of work stealing to schedule their lightweight threads, and all of them make it easier to move from a sequentially oriented mindset to a parallel-oriented one. Embedded and mobile programmers should begin experimenting with these languages now, so that they are prepared when real-time prioritized capabilities are merged with work-stealing schedulers to provide the combination of reactivity and throughput needed by the advanced embedded and mobile applications planned for the near future.

Further reading
1. P. Brinch Hansen (editor), The Origin of Concurrent Programming: From Semaphores to Remote Procedure Calls, Springer, June 2002.
2. R. D. Blumofe and C. E. Leiserson, Scheduling Multithreaded Computations by Work Stealing, Journal of the ACM, vol. 46, no. 5, pp. 720–748, September 1999.
3. C. Maia, L. Nogueira, and L. M. Pinho, Scheduling Parallel Real-Time Tasks Using a Fixed-Priority Work-Stealing Algorithm on Multiprocessors, 8th IEEE Symposium on Industrial Embedded Systems (SIES), June 2013.
4. S. T. Taft, Systems Programming with Go, Rust, and ParaSail.

About the author
S. Tucker Taft is VP and Director of Language Research at AdaCore. He joined AdaCore in 2011 as part of a merger with SofCheck, which he had founded in 2002 to develop advanced static analysis technology. Before that, he spent 22 years as a Chief Scientist at Intermetrics, Inc. and its follow-ons, where from 1990 to 1995 he led the design of Ada 95. He received an A.B. degree summa cum laude from Harvard University, where he has more recently taught compiler construction and programming language design.
