Global Sources
EE Times-India
Power/Alternative Energy  

Optimising software for power efficiency (Part 2)

Posted: 23 Jun 2014

Keywords: Data flow, Hardware optimisation, DSP, Power gating, MSC815x

In this case, the MJPEG code counts the cycles a core spends doing actual work: handling an incoming Ethernet interrupt, dequeueing data, encoding a block of data into JPEG format, and enqueueing and sending data back over Ethernet.

The number of core cycles required to encode a single block of data (including the supporting background data movement) was measured at around 13,000. For a full JPEG image (396 image blocks and Ethernet packets), this comes to approximately 5 million cycles.

So encoding one JPEG frame per second would take about 0.5% of a core's processing capacity, assuming a 1GHz core that also handles all Ethernet I/O, interrupt context switches, etc.
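The arithmetic above can be checked with a few lines of C. The 396 blocks, 13,000 cycles per block and 1GHz clock are the article's figures. The helper names are only illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles needed for one frame: blocks per frame times cycles per block. */
static uint64_t cycles_per_frame(uint32_t blocks, uint32_t cycles_per_block)
{
    return (uint64_t)blocks * cycles_per_block;
}

/* Fraction of one core consumed at a given frame rate and core clock. */
static double core_utilisation(uint64_t frame_cycles, double frames_per_sec,
                               double core_hz)
{
    assert(core_hz > 0.0);
    return (double)frame_cycles * frames_per_sec / core_hz;
}
```

With the article's numbers, 396 × 13,000 = 5,148,000 cycles per frame. At one frame per second on a 1GHz core, that is about 0.51% utilisation.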

In this example the DSP has up to six cores, and only one core needs to manage Ethernet I/O; in a full multicore system, utilisation per core drops to a range of 3 to 7%. A master core manages the system, handling Ethernet I/O, intercore communication and JPEG encoding, while the slave cores are programmed solely to encode JPEG frames. Because of this intercore communication and management overhead, cycle consumption does not drop linearly when moving from one core to four or six.

Based on cycle counts from the OCE, we can either run a single core that is in a sleep state 85% of the time, or a multicore system that sleeps up to 95% of the time.
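The sleep fraction follows directly from the busy-cycle count over a measurement window. A simple two-state duty-cycle model can then turn it into an average-power estimate. This sketch assumes the core draws a constant P_active while working and a constant P_sleep while asleep, and it ignores wake-up transitions. The article gives no power figures, so both values are placeholders to be measured on the board.

```c
#include <assert.h>
#include <stdint.h>

/* Fraction of a measurement window during which the core can sleep. */
static double sleep_fraction(uint64_t busy_cycles, uint64_t window_cycles)
{
    assert(window_cycles > 0 && busy_cycles <= window_cycles);
    return 1.0 - (double)busy_cycles / (double)window_cycles;
}

/* Two-state duty-cycle model: P_avg = s * P_sleep + (1 - s) * P_active.
 * p_active and p_sleep are assumed constants supplied by the caller. */
static double average_power(double s, double p_active, double p_sleep)
{
    return s * p_sleep + (1.0 - s) * p_active;
}
```

In normalised units (P_active = 1.0), a core that sleeps 85% of the time at P_sleep = 0.1 averages 0.235 of its active power. The model only shows the trend; real savings depend on the measured sleep-state power.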

This application also uses only a portion of the SoC peripherals (Ethernet, JTAG, a single DDR controller and M3 memory), so we can save power by gating the entire HSSI subsystem (Serial RapidIO, PCI Express), the MAPLE accelerator and the second DDR controller. Additionally, because our GUI demo shows only four cores, we can gate cores 4 and 5 without affecting the demo.
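One way to derive the gating candidates is to describe the blocks as a bitmask and take the set difference between what exists and what the application uses. The block IDs below are hypothetical and do not correspond to MSC815x register bits. The model also assumes cores are numbered 0 to 5, with cores 0 to 3 in use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical block IDs for a simplified SoC model (not register bits). */
enum soc_block {
    BLK_CORE0, BLK_CORE1, BLK_CORE2, BLK_CORE3, BLK_CORE4, BLK_CORE5,
    BLK_QE_ETHERNET, BLK_JTAG, BLK_DDR0, BLK_DDR1, BLK_M3,
    BLK_HSSI, BLK_MAPLE,
    BLK_COUNT
};

#define BIT(b) (1u << (b))

/* Every block present in this model. */
static const uint32_t ALL_BLOCKS = (1u << BLK_COUNT) - 1u;

/* Blocks the four-core MJPEG demo uses: Ethernet, JTAG, one DDR, M3. */
static const uint32_t USED_BLOCKS =
    BIT(BLK_CORE0) | BIT(BLK_CORE1) | BIT(BLK_CORE2) | BIT(BLK_CORE3) |
    BIT(BLK_QE_ETHERNET) | BIT(BLK_JTAG) | BIT(BLK_DDR0) | BIT(BLK_M3);

/* Gating candidates are the blocks that are present but unused. */
static uint32_t blocks_to_gate(uint32_t all, uint32_t used)
{
    return all & ~used;
}
```

For this model, the result is cores 4 and 5, the second DDR controller, the HSSI and MAPLE, which matches the list above.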

Based on the above, and what we have discussed in this section, here is the plan we want to follow:

At application start-up:
 • Clock gate the unused MAPLE accelerator block (MAPLE is described later).
    • MAPLE's power pins share a power supply with the core voltage. If MAPLE's supply were not shared, we could gate its power completely; because of the shared pins on the development board, gating the MAPLE clock is the most effective option available.
    • MAPLE automatically enters a doze state, which gates part of its clocks, whenever it is not in use. Because of this, the power saved by fully gating MAPLE may be modest.
 • Clock gate the unused HSSI (high-speed serial interface).
    • Putting the block into a doze state would gate only part of the clocks. Since we will not use any portion of these peripherals, complete clock gating is more power efficient.
 • Clock gate the unused second DDR controller.
    • When VTB is used, the OS places the VTB buffer space in the second DDR memory, so we must first confirm that this is not needed.
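The start-up plan could be sketched as a single read-modify-write of a clock-gating register. The second DDR controller is gated only after confirming that VTB does not use it. The register is simulated with a plain variable, and the bit positions are hypothetical, as is the convention that a set bit means the clock is gated. The real register names and layout must come from the MSC815x reference manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments; a set bit is assumed to mean "clock gated". */
#define GATE_MAPLE (1u << 0)
#define GATE_HSSI  (1u << 1)
#define GATE_DDR1  (1u << 2)

/* Stand-in for the memory-mapped clock-gating register. On hardware this
 * would be a volatile pointer to the device address. */
static volatile uint32_t sim_clock_gate_reg = 0;

/* Apply the start-up plan: always gate MAPLE and the HSSI; gate the second
 * DDR controller only if nothing (such as the OS's VTB buffers) lives in it.
 * Returns the mask that was applied. */
static uint32_t apply_startup_gating(bool vtb_uses_ddr1)
{
    uint32_t mask = GATE_MAPLE | GATE_HSSI;
    if (!vtb_uses_ddr1)
        mask |= GATE_DDR1;
    /* Read-modify-write so that bits set elsewhere are preserved. */
    sim_clock_gate_reg = sim_clock_gate_reg | mask;
    return mask;
}
```

Checking the VTB condition in code, rather than gating DDR1 unconditionally, makes the dependency described in the note above explicit.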

During application runtime:

 • At runtime, the QE (Ethernet controller), DDR, interconnect and cores 1 to 4 will be active. Things we must consider for these components include:
    • The Ethernet controller cannot be shut down or put into a low-power state, because it is the block that receives the new packets (JPEG blocks) to encode. Interrupts from the Ethernet controller can be used to wake our master core from low-power mode.
    • Active core low-power modes:
       • WAIT mode provides core power savings while allowing the core to be woken in just a few cycles, using a disabled interrupt to signal exit from WAIT.
       • STOP mode provides greater core savings by shutting down more of the subsystem than WAIT (including M2), but takes slightly longer to wake because more hardware must be re-enabled. If data arrives at high rates and the wake time is too long, we could get an overflow condition in which packets are lost. This is unlikely here given the application's required data rate.
    • The first DDR contains sections of program code and data, including parts of the Ethernet handling code. (This can be quickly verified by looking at the program's .map file.) Because the Ethernet controller will wake the master core from WAIT, and the first thing the core must do on leaving that state is run the Ethernet handler, we will not put DDR0 to sleep.
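The WAIT-versus-STOP trade-off can be expressed as a simple check. Choose STOP only if its worst-case wake-up latency, plus a margin, fits within the time before incoming packets would overflow the receive buffers. This is a hypothetical helper with made-up parameter names. The latencies would have to be measured or taken from the device documentation.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { LPM_WAIT, LPM_STOP } low_power_mode_t;

/* Pick the deeper STOP mode only when waking from it cannot overflow RX.
 *   stop_wake_cycles: worst-case wake-up latency from STOP (measured)
 *   rx_slack_cycles:  time at peak packet rate before RX buffers overflow
 *   margin_cycles:    extra allowance for the Ethernet handler to start
 * All three are assumed inputs; the article gives no figures. */
static low_power_mode_t choose_mode(uint32_t stop_wake_cycles,
                                    uint32_t rx_slack_cycles,
                                    uint32_t margin_cycles)
{
    uint64_t needed = (uint64_t)stop_wake_cycles + margin_cycles;
    return needed < rx_slack_cycles ? LPM_STOP : LPM_WAIT;
}
```

At this application's low data rate, the slack is likely large, so STOP would usually pass the check. At high packet rates, the same function falls back to the faster-waking WAIT.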

We can use the application's main background routine to apply these changes without interfering with the RTOS. This code segment, including the power-down-related code, is shown in figure 7.
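As a rough, hypothetical illustration of such a background routine (not the code from figure 7), the routine might apply the one-time gating on its first pass and then park the core in WAIT until the next interrupt. The hook functions are stand-ins, not real OS or device APIs. Host-side stubs are included so the control flow can be exercised.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical hooks: on target these would wrap the clock-gating writes
 * and the core's WAIT instruction. */
typedef struct {
    void (*gate_unused_blocks)(void);
    void (*enter_wait)(void); /* returns once an interrupt wakes the core */
} power_hooks_t;

/* Background routine: runs when no RTOS task is ready. The first pass applies
 * the gating; every pass then enters WAIT, and the Ethernet interrupt wakes
 * the core. max_iterations bounds the loop for host testing; on target the
 * loop would run forever. */
static void background_task(const power_hooks_t *hooks, unsigned max_iterations)
{
    static bool gated = false;
    for (unsigned i = 0; i < max_iterations; i++) {
        if (!gated) {
            hooks->gate_unused_blocks();
            gated = true;
        }
        hooks->enter_wait();
    }
}

/* Host-side stubs that only count calls, for illustration. */
static unsigned gate_calls, wait_calls;
static void stub_gate(void) { gate_calls++; }
static void stub_wait(void) { wait_calls++; }
```

Keeping the gating in the idle path means it happens once, after the RTOS and drivers are up, and never competes with application tasks.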

Note that clock gating must be performed by only one core, because these registers are system-level and shared by all cores.
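Here is a minimal sketch of that rule. It assumes core 0 is the master and that each core knows its own ID at start-up. The register is simulated, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MASTER_CORE_ID 0u /* assumption: core 0 acts as the master */

static volatile uint32_t sim_gate_reg; /* simulated shared system register */
static unsigned sim_gate_writes;       /* counts writes, for illustration */

/* Every core may call this during start-up, but only the master touches the
 * shared system-level register. This avoids racing read-modify-writes.
 * core_id would come from the platform's own core-ID query. */
static void init_power_gating(uint32_t core_id, uint32_t gate_mask)
{
    if (core_id != MASTER_CORE_ID)
        return;
    sim_gate_reg = sim_gate_reg | gate_mask;
    sim_gate_writes++;
}
```

Running the same start-up code on all four cores still results in exactly one write to the shared register.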
