EE Times-India > EDA/IP

# A primer on C-slow retiming, system hyper pipelining

Posted: 25 Apr 2014

Keywords: pipelining, C-slow retiming, CPU, RTL, verification

Mr. Gauss (1777-1855) was a German mathematician and physical scientist who contributed significantly to many fields, including number theory, algebra, statistics, analysis, differential geometry, geodesy, geophysics, electrostatics, astronomy, optics, and—unbeknownst to him—FPGAs.

I think we are all familiar with the concept of a lookup table (LUT), which forms the basis for the programmable fabric in an FPGA. Assuming each LUT has one output with an associated net, let's call this a LUT net pair. Now, let's take a fairly big design (a 32-bit MIPS processor) and place it unconstrained with low utilisation in a Virtex 5 FPGA. If we extract enough data from the static timing analysis (STA) report, we will see that the individual LUT net pair delays follow a χ² (chi-squared) distribution.

 Figure 6: LUT net pairs follow a χ² distribution.

I don't want to go too much into the math (as if I could), but if you have multiple behaviours following a χ² distribution (with high k), they can be approximated using a Gaussian (normal) distribution. If you extract enough empirical data, you see the following distribution of consecutive LUT net pairs in your timing report file.

 Figure 7: Consecutive LUT net pairs follow a Gaussian distribution.
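The move from a skewed χ² shape to a Gaussian shape can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the analysis used for the figures: it assumes each LUT net pair delay is an independent χ² draw, and the degrees of freedom `k=4`, the sample count, and the function names are all hypothetical choices made for illustration. It measures the skewness of the summed delays, which is 0 for a perfectly symmetric Gaussian.

```python
import random
import statistics


def chi2_sample(k, rng):
    """Draw one chi-squared value with k degrees of freedom.

    A chi-squared(k) variable is a Gamma variable with shape k/2 and scale 2.
    """
    return rng.gammavariate(k / 2.0, 2.0)


def skewness(values):
    """Sample skewness; close to 0 for a symmetric (Gaussian-like) shape."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def path_delay_skewness(luts_on_path, k=4, samples=20000, seed=1):
    """Skewness of the summed delay of `luts_on_path` consecutive LUT net pairs.

    Assumption (not from the article): the individual delays are independent
    chi-squared(k) draws in arbitrary units.
    """
    rng = random.Random(seed)
    sums = [
        sum(chi2_sample(k, rng) for _ in range(luts_on_path))
        for _ in range(samples)
    ]
    return skewness(sums)


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(n, "LUT net pairs: skewness", round(path_delay_skewness(n), 3))
```

Under these assumptions the skewness shrinks as more LUT net pairs are chained, which is the sense in which consecutive pairs look increasingly Gaussian.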

Based on the empirical data, we can say that one LUT net pair delay can be estimated with a certain probability as µLN = 820 ps on a Virtex 5 without constraining the design, where µLN is the mean for one LUT net pair. The delay of a path through multiple LUTs can be estimated using (lut * µLN), where lut equals the number of LUTs in the path. You may be tempted to say, "So what? Every FPGA engineer does this quite naturally." However, I believe this indicator deserves more attention. It is definitely useful for timing estimation on FPGAs using CSR-based designs. So let's discuss some points:
• Apart from anything else, it is a lot of fun—predicting a certain statistical behaviour, extracting empirical data, and finding a good match. A good match saves your day.
• Special hard-coded logic in the FPGA (DSP blocks, fast carry chains) also follows a normal distribution with an FPGA-specific mean (e.g., µSN = 1.582 ns for a Virtex 5).
• The normal distribution becomes 1 for a high number of LUTs on the path, and constraining the critical path affects the path delay. In any case, the µLN indicator can be used for fast static timing estimations. In fact, timing optimisations start by improving paths with delays greater than (lut * µLN). It is obvious that timing optimisations get more costly as soon as the worst-case delay = (lut * µLN).
• CSR-based designs usually don't have more than four LUTs on a critical path. This is why this estimation works so well. It must be performed on higher-level representations (RTL or higher) where the concrete value of µLN is not important.
• It is an indicator that lets you compare two individual FPGAs independent of the design.
• It provides you with a design- and synthesis-independent indicator to compare two different technologies.
• It lets you predict the timing of your design on a new technology (and it doesn't look good for FPGAs).
• It lets you predict how far you are from a realistically achievable timing for your design.
• The indicator works for FPGAs and ASICs alike. Having said this, I personally haven't run the analysis on an ASIC database for more than 10 years as of this writing.
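The (lut * µLN) estimate discussed in the points above can be sketched in a few lines. This is a minimal illustration, not a replacement for STA: the 820 ps value is the Virtex 5 mean quoted in the article, while the function names, the frequency conversion, and the ratio helper are hypothetical additions. The frequency figure deliberately ignores register clock-to-out, setup time, and clock skew, so treat it as an optimistic bound.

```python
MU_LN_PS = 820.0  # mean LUT net pair delay on a Virtex 5, as quoted in the article


def estimate_path_delay_ps(luts, mu_ln_ps=MU_LN_PS):
    """Estimate a path delay as (lut * µLN), in picoseconds."""
    return luts * mu_ln_ps


def estimate_fmax_mhz(luts, mu_ln_ps=MU_LN_PS):
    """Clock frequency implied by the estimate alone.

    Assumption: register overhead (clock-to-out, setup, skew) is ignored.
    """
    return 1e6 / estimate_path_delay_ps(luts, mu_ln_ps)


def ratio_to_estimate(reported_delay_ps, luts, mu_ln_ps=MU_LN_PS):
    """Ratio of an STA-reported path delay to the (lut * µLN) estimate.

    Values above 1.0 mark paths slower than the statistical estimate.
    """
    return reported_delay_ps / estimate_path_delay_ps(luts, mu_ln_ps)


if __name__ == "__main__":
    # Four LUTs is the critical-path depth the article cites for CSR-based designs.
    print(estimate_path_delay_ps(4))       # 3280.0 ps
    print(round(estimate_fmax_mhz(4), 1))  # 304.9 MHz
    # 4100 ps is a made-up reported delay, used only to show the ratio.
    print(ratio_to_estimate(4100.0, 4))    # 1.25
```

Swapping in a different mean for `mu_ln_ps` is how one would compare two devices or technologies with the same path depths.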
