Engineering embedded software: More C techniques

Posted: 24 Apr 2014

Keywords: loop transformations, C, pragmas, buffer

Aside from the basic C coding techniques for optimising embedded software for performance outlined in Part 1, there are many other techniques that embedded developers can use, such as pragmas, hardware and software loops, loop transformations and unrolling, multi-sampling partial summation, and software pipelining.

Pragmas can tell the compiler about loop bounds, which helps loop optimisation. If the minimum and maximum iteration counts are known, for example, the compiler may be able to optimise more aggressively.

In the example in figure 9, a pragma specifies the loop count bounds to the compiler. In this syntax, the parameters are minimum, maximum and multiple, respectively. If a non-zero minimum is specified, the compiler can avoid generating costly zero-iteration checking code. The compiler can use the maximum and multiple parameters to decide how many times to unroll the loop, if possible.

Figure 9: A pragma used to specify the loop count.
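A minimal sketch of a loop count pragma in use is given here. It assumes a compiler that accepts a pragma of the form loop_count (minimum, maximum, multiple); the exact spelling and placement are compiler-specific, so the pragma is guarded by a hypothetical build flag, HAVE_LOOP_COUNT_PRAGMA. The function name and the bounds 4, 512 and 4 are illustrative only. Because the pragma is a promise to the compiler, the sketch also checks that promise with an assertion in debug builds.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bounds; they must match the pragma arguments below. */
#define VEC_MIN  4
#define VEC_MAX  512
#define VEC_MULT 4

/* Dot product whose trip count is promised to be in [4, 512]
 * and a multiple of 4. */
int32_t dot_product(const int16_t *a, const int16_t *b, int n)
{
    int32_t sum = 0;
    int i;

    /* The pragma is an unchecked promise, so verify it in debug builds. */
    assert(n >= VEC_MIN && n <= VEC_MAX && n % VEC_MULT == 0);

    for (i = 0; i < n; i++) {
#ifdef HAVE_LOOP_COUNT_PRAGMA /* hypothetical flag: define only if the compiler supports the pragma */
#pragma loop_count (4, 512, 4)
#endif
        sum += (int32_t)a[i] * b[i];
    }
    return sum;
}
```

With a minimum of 4, a compiler that honours the pragma has no need to guard against a zero-iteration loop, and the multiple of 4 suggests an unroll factor that leaves no remainder iterations.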

Hardware loops. These are mechanisms built into some embedded cores which allow zero-overhead (in most cases) looping by keeping the loop body in a buffer or prefetching. Hardware loops are faster than normal software loops (decrement counter and branch) because they have less change-of-flow overhead. Hardware loops typically use loop registers that start with a count equal to the number of iterations of the loop, decrease by 1 each iteration (step size of -1), and finish when the loop counter is zero (figure 10).

Figure 10: Hardware loop counting in embedded processors.
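The counting scheme described above can be modelled in C as a simple count-down loop. This is only a sketch: the variable lc stands in for a hardware loop counter register, and the function name is hypothetical. Whether a given compiler maps this loop to a hardware loop depends on the compiler and the architecture.

```c
#include <stdint.h>

/* Sum n samples using a counter that mirrors a hardware loop register:
 * it starts at the iteration count, steps by -1, and stops at zero. */
int32_t sum_countdown(const int16_t *x, unsigned n)
{
    int32_t acc = 0;
    unsigned lc; /* models the loop counter register */

    for (lc = n; lc != 0; lc--) {
        acc += *x++;
    }
    return acc;
}
```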

In most cases, compilers automatically generate hardware loops from C, even if the loop counter or loop structure is complex. However, the compiler can generate a hardware loop only when certain criteria are met, and these vary with the compiler and architecture. In some cases the loop structure prohibits generation, but a programmer who knows this can modify the source so that the compiler can use the hardware loop functionality.
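As an illustration of such a source modification, assume a hypothetical toolchain that only generates a hardware loop when the trip count is fixed at loop entry. A search loop with a data-dependent early exit would not qualify under that assumption, but it can be rewritten to always run the full count. Both function names are illustrative, and the actual criteria should be taken from the compiler documentation.

```c
#include <stdint.h>

/* Early exit: the trip count depends on the data, which (under the
 * assumption above) may prevent hardware loop generation. */
int find_first_break(const int16_t *x, int n, int16_t key)
{
    int i;
    for (i = 0; i < n; i++) {
        if (x[i] == key) {
            return i;
        }
    }
    return -1;
}

/* Fixed trip count: always runs n iterations. Scanning from the end
 * means the last match written is the lowest matching index. */
int find_first_fixed(const int16_t *x, int n, int16_t key)
{
    int found = -1;
    int i;
    for (i = n - 1; i >= 0; i--) {
        if (x[i] == key) {
            found = i;
        }
    }
    return found;
}
```

The fixed-count version does more work when the match comes early, so it only pays off where the loop overhead saved outweighs the extra iterations.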

The compiler may have a feature to tell the programmer if a hardware loop was not generated (compiler feedback). Alternatively, the programmer should check the generated code to ensure hardware loops are being generated for critical code. As an example, the StarCore DSP architecture supports four hardware loops. Note the LOOPSTART and LOOPEND markings, which are assembler directives marking the start and end of the loop body, respectively (figure 11).

Figure 11: LOOPSTART and LOOPEND markings.

Additional tips and tricks
The following are some additional tips and tricks to use for further code optimisation:

Memory contention. When data is placed in memory, be aware of how the data is accessed. Depending on the memory type, if two buses issue data transactions to the same region or bank, they can conflict and incur a penalty. Data should be separated appropriately to avoid this contention. The scenarios that cause contention are device-dependent because memory bank configuration and interleaving differ from device to device.
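One way to separate data is to place arrays that are read in the same cycle into different banks. The sketch below assumes a GCC-style section attribute and two hypothetical section names, .bank0_data and .bank1_data, which a linker script would have to map to different physical banks. The placement is enabled by a hypothetical build flag, USE_BANK_SECTIONS, so the code still builds where those sections do not exist.

```c
#include <stdint.h>

#ifdef USE_BANK_SECTIONS /* hypothetical flag: requires matching linker script sections */
#define IN_BANK0 __attribute__((section(".bank0_data")))
#define IN_BANK1 __attribute__((section(".bank1_data")))
#else
#define IN_BANK0
#define IN_BANK1
#endif

#define TAPS 4

/* Coefficients and samples are read together on every iteration,
 * so they are candidates for placement in separate banks. */
static int16_t coeffs[TAPS] IN_BANK0 = {1, 2, 3, 4};
static int16_t samples[TAPS] IN_BANK1 = {10, 20, 30, 40};

/* Multiply-accumulate over both arrays. */
int32_t fir_tap_sum(void)
{
    int32_t acc = 0;
    int i;
    for (i = 0; i < TAPS; i++) {
        acc += (int32_t)coeffs[i] * samples[i];
    }
    return acc;
}
```

Whether this split removes a penalty depends on the device's bank layout and interleaving, so the result should be confirmed against the device documentation or a profiler.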
