EE Times-India

An overview of offloading CPUs to FPGAs

Posted: 18 Mar 2013

Keywords: High Performance Reconfigurable Computing, C algorithms, C-to-HDL compiler

A number of factors are disrupting microprocessors' traditional monopoly as the chip of choice for C algorithms. These include the cost and accessibility of cross-compilation tools, the power and speed limitations of microprocessors, and the availability of more reliable building blocks.

In this article, three university researchers break down the problem into understandable steps that the average developer can follow to determine if FPGAs are worth the (decreasing) bother and – if the answer is "yes" – how to go about it. This is based on hundreds of hours of class and lab testing.

Problem identification
Microprocessors are not going away. They continue to represent the biggest "bang for the buck" and are at the centre of most systems. FPGAs are a complementary, semi-custom, co-processing resource that "picks off" the parallelizable tasks from CPUs. FPGAs do this – at lower clock speeds and power – by deploying multi-core parallelism.

High Performance Reconfigurable Computing (HPRC) as a branch of Computer Science is thriving. Largely driven by general-purpose graphics processing unit (GPGPU) growth, HPRC is also supported by FPGA-based applications. The programming environment is considered to be the main obstacle preventing FPGAs from being used to their full potential in accelerators. Thus, the need to gain familiarity with High Level Languages (HLLs) is inevitable.

Figure 1: Software processes are converted to multiple streaming hardware processes where they use streams, signals, or memory for synchronisation.

Architectural differences in C for FPGAs vs C for CPUs
The C language, refactored for FPGAs, can be characterized as a stream-oriented, process-based language. Processes are the main building blocks, interconnected by streams to form the architecture of the desired hardware module. From the hardware perspective, processes and streams are hardware modules and FIFOs (First In, First Out buffers) respectively. The C programming model is generally based on the Communicating Sequential Processes (CSP) model. Every process must be classified as either a hardware or a software process, and it is the programmer's responsibility to ensure inter-process synchronisation. Like most HLLs, C does not provide access to the clock signal, which relieves the designer from implementing cycle-synchronisation procedures. However, it is possible to attach HDL modules and synchronise them at the RTL level using clock signals. It is worth noting that C as a hardware design language does not permit dynamic resource allocation (e.g., "malloc()" and "calloc()").

The second unique language construct, besides being process-oriented, is stream orientation. Streams are unidirectional and can interconnect only two processes, which imposes restrictions on hardware module architectures designed in C. Since pipelines can become a source of deadlocks, the designer particularly needs to consider mechanisms to avoid them. Unfortunately, occurrences of deadlocks are difficult to trace during simulations since the "#pragma co pipeline" C-to-HDL compiler directive is ignored during software simulation. These problems are usually revealed after implementation when the module is tested in hardware.

In addition to streams and processes, C as a design method provides signals and semaphores. These structures are used for inter-process synchronisation. The best practice is often to implement pure pipeline modules, with the lowest possible number of synchronisation signals.

Figure 2: Stage Delay Analysis provides the tools needed to see how decisions made in C algorithms will propagate in logic and clock cycles.

HLLs used for this purpose are generally intended to be flexible in terms of data types so as to ease HDL module integration. Typically, there will be a range of data types available, such as co_int2, co_int32, co_uint1, co_uint32, etc. These constructs are also a source of inconsistency between the software and hardware implementations. Prior to FPGA implementation, all of the hardware modules should be simulated on a GPP (general purpose processor), where their data structures are mapped onto the types available on the GPP. Unfortunately, GPPs support only a limited set of data types, so each time a simulation is performed, the data is extended to the nearest wider type, which affects the intrinsic computation precision. This widening is performed unless a dedicated macro is used (e.g. "UADD4()" and "UDIV20()"); thus, using these macros is encouraged.

