EE Times-India

How to make ESL power optimisation a reality

Posted: 08 Feb 2013

Keywords: power optimisation, high-level synthesis, RTL

Low power is a primary concern in digital design, especially for handheld and wireless devices. The same holds for servers and other computation-intensive applications, where the cost of cooling and packaging can be quite high. As a consequence, power optimisation is essential to meeting and improving quality of results, alongside optimising performance and area.

Thus far, power optimisation efforts have centred on RTL models and gate-level netlists, which are not sufficient for achieving optimal power savings. Optimising for power should occur at all levels of design, from architecture to board. It is at the architecture, or electronic system level (ESL), that the potential for power savings is greatest. Indeed, the opportunities for optimising for low power are significantly higher at the architectural level of abstraction, with as much as a 10X improvement over gate-level optimisations. Yet, ironically, this is where low power methodologies and tools are the weakest. This deficiency drives the need for tools that not only allow designers to explore the best architecture for power at a higher level of abstraction but also automatically implement lower-level transformations, such as sequential clock gating, in the RTL produced.

The answer is found in the integration of high-level synthesis (HLS) and power analysis to create a new HLS product capable of optimising across three dimensions: power, performance and area (PPA). HLS allows designers to synthesise different RTL architectures from C++ or SystemC ESL models. The different hardware architectures are generated through user constraints that specify such things as clock period, resource limitations, IO protocol and the desired level of concurrency. Such a low power HLS solution can implement a broad range of low power techniques in the synthesised RTL, including bit-width optimisation, multiple clock domain partitioning, memory access minimisation, resource sharing, frequency exploration, power gating, and clock gating.

In this article, we discuss the general ESL-to-RTL low power design flow, and then share the results of two case studies that use real customer designs to evaluate the efficacy of a unique solution for ESL synthesis and power architecting.

Seven basics of architectural power exploration
There are seven basic concepts that designers should focus on when looking for ways to save power, while satisfying performance goals, during architectural exploration.

1. Numerical refinement: The first design step for controlling power is numerical refinement. Algorithmic C bit-accurate data types support arbitrary precision, allowing designers to specify any desired bit width for both integer and fixed point data types. SystemC data types can be used interchangeably. At higher design abstraction levels, this lets designers represent each value with the minimum number of bits that stays within error tolerances, minimising area and power.
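
As a rough illustration of numerical refinement, the sketch below searches for the smallest number of fractional bits that keeps the worst-case quantisation error of a sample set within a tolerance. It uses plain C++ doubles rather than the Algorithmic C or SystemC data types, and the names quantise and min_frac_bits are hypothetical helpers introduced only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: round x to the nearest value representable
// with frac_bits fractional bits (plain-C++ stand-in for a
// bit-accurate fixed point type).
double quantise(double x, int frac_bits) {
    assert(frac_bits >= 0);
    const double scale = std::ldexp(1.0, frac_bits);  // 2^frac_bits
    return std::round(x * scale) / scale;
}

// Hypothetical helper: smallest number of fractional bits whose
// worst-case quantisation error over the samples stays within the
// tolerance. Returns -1 if no width up to max_bits is sufficient.
int min_frac_bits(const std::vector<double>& samples,
                  double tolerance,
                  int max_bits = 32) {
    for (int bits = 0; bits <= max_bits; ++bits) {
        double worst = 0.0;
        for (double s : samples) {
            worst = std::max(worst, std::fabs(s - quantise(s, bits)));
        }
        if (worst <= tolerance) {
            return bits;  // first (narrowest) width that meets the tolerance
        }
    }
    return -1;
}
```

For the samples 0.1, 0.25 and 0.7 with a tolerance of 0.01, this search returns 6 fractional bits. In a real flow the chosen width would then be applied to a bit-accurate type and validated against the full test bench.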

2. Interfaces: If a design's interface is hammering the bus or memory, the designer can expand the bit-width of the interface to perform several reads and writes at once and store the data locally. In pure C++ designs, this can be achieved simply by using HLS interface synthesis technology without modifying the source code. In SystemC, constraints may work in limited cases, but the change can always be implemented by modifying the source code.
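
The effect of widening an interface can be modelled in a few lines. The sketch below is only a behavioural model, not HLS interface synthesis: CountingMemory, sum_narrow and sum_wide are hypothetical names, and the 8-bit sample and 32-bit word sizes are assumptions chosen for illustration. It counts memory accesses when samples are fetched one at a time and when four samples are fetched in a single wide read and unpacked from a local copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical memory model that counts every access made to it.
// Each 32-bit word packs four 8-bit samples, lowest byte first.
struct CountingMemory {
    std::vector<std::uint32_t> words;
    std::size_t reads = 0;

    // Wide access: one read returns four samples.
    std::uint32_t read_word(std::size_t word_index) {
        ++reads;
        return words.at(word_index);
    }

    // Narrow access: one read returns a single sample.
    std::uint8_t read_byte(std::size_t byte_index) {
        ++reads;
        const std::uint32_t word = words.at(byte_index / 4);
        return static_cast<std::uint8_t>(word >> (8 * (byte_index % 4)));
    }
};

// Narrow interface: one memory access per sample.
std::uint32_t sum_narrow(CountingMemory& mem, std::size_t n_samples) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < n_samples; ++i) {
        sum += mem.read_byte(i);
    }
    return sum;
}

// Widened interface: one memory access per four samples, which are
// then unpacked from a local copy of the word.
std::uint32_t sum_wide(CountingMemory& mem, std::size_t n_samples) {
    assert(n_samples % 4 == 0);
    std::uint32_t sum = 0;
    for (std::size_t w = 0; w < n_samples / 4; ++w) {
        const std::uint32_t local = mem.read_word(w);  // stored locally
        for (int lane = 0; lane < 4; ++lane) {
            sum += (local >> (8 * lane)) & 0xFFu;
        }
    }
    return sum;
}
```

Both functions return the same sum, but for eight samples the narrow version makes eight accesses while the wide version makes two.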

3. Memory architecture: For many algorithms, power, performance, and area are highly dependent on memory architectures. For example, a FIR filter can be implemented using a shift register, a rotational shift register, or a circular buffer (figure 1).

Figure 1: Filter tap implementation in FIR filtering.

A shift register based implementation can consume more power at higher frequencies because all taps switch with each shift. It is typically suited to filters with a smaller number of taps and gives the highest performance.

Rotational shift is an intermediate solution. It removes the MUX feeding the multiplier, which becomes a bottleneck as the number of taps grows. Rotation occurs as part of the MAC loop, after the +=.

A circular buffer based implementation is good for a filter with a large number of taps and is ideal for mapping into a memory. It uses one pointer to set the write point (which advances forward) and another pointer to set the read point (which decrements in reverse, wrapping around the array).
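
The difference between the shift register and circular buffer styles can be sketched in plain C++. This is a minimal illustration, not the code used in the case studies: the class names, the tap count kTaps and the use of int samples are all assumptions, and the rotational shift variant is omitted. The shift register version moves every tap on each new sample, while the circular buffer version writes a single element and walks a read pointer backwards with wrap-around.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kTaps = 4;  // hypothetical tap count for the sketch

// Shift register style: every tap value moves on each new sample.
class ShiftRegisterFir {
public:
    explicit ShiftRegisterFir(const std::array<int, kTaps>& coeffs)
        : coeffs_(coeffs) {}

    int step(int sample) {
        for (std::size_t i = kTaps - 1; i > 0; --i) {
            taps_[i] = taps_[i - 1];  // all taps switch on each shift
        }
        taps_[0] = sample;
        int acc = 0;
        for (std::size_t i = 0; i < kTaps; ++i) {
            acc += coeffs_[i] * taps_[i];
        }
        return acc;
    }

private:
    std::array<int, kTaps> coeffs_;
    std::array<int, kTaps> taps_{};  // zero-initialised delay line
};

// Circular buffer style: one element is written per sample. The
// write pointer advances forward and the read pointer decrements,
// wrapping around the array.
class CircularBufferFir {
public:
    explicit CircularBufferFir(const std::array<int, kTaps>& coeffs)
        : coeffs_(coeffs) {}

    int step(int sample) {
        buffer_[write_] = sample;
        std::size_t read = write_;  // newest sample first
        int acc = 0;
        for (std::size_t i = 0; i < kTaps; ++i) {
            acc += coeffs_[i] * buffer_[read];
            read = (read == 0) ? kTaps - 1 : read - 1;  // wrap backwards
        }
        write_ = (write_ + 1) % kTaps;  // advance write point
        return acc;
    }

private:
    std::array<int, kTaps> coeffs_;
    std::array<int, kTaps> buffer_{};  // maps naturally onto a memory
    std::size_t write_ = 0;
};
```

Fed the same input sequence, the two classes produce identical outputs. What differs is how many storage elements change per sample, which is the property that matters for power.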
