EE Times-India

Designing NAND flash controller with high-level synthesis

Posted: 09 Apr 2012

Keywords: High-level synthesis, electronic system-level, register transfer-level

The synthesis results in table 2 indicate that the decoder occupies 99% of the area. The reason is that the original software implementation pre-computes the logarithm and anti-logarithm values of the Galois field (GF) needed for decoding and stores them in two constant arrays, which are synthesised as huge lookup tables (LUTs). In software, keeping such information in memory is a common way to reduce search and computation complexity, since operating systems and software designers have full control over memory usage. It is not a good approach for a hardware implementation, however, because the memory required is too large.
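To make the table-based approach concrete, here is a minimal software sketch of how logarithm and anti-logarithm arrays are typically built and used for GF multiplication. The field GF(2^4), the primitive polynomial x^4 + x + 1, and all names (`build_tables`, `gf_mul_lut`) are assumptions chosen for illustration; the article does not state the field or the code used in the controller.

```python
# Illustrative field only: GF(2^4) with primitive polynomial x^4 + x + 1.
# The field actually used by the controller is not given in the article.
M = 4
PRIM_POLY = 0b10011
FIELD_SIZE = 1 << M  # 16 elements


def build_tables():
    """Pre-compute antilog (i -> alpha^i) and log (alpha^i -> i) arrays."""
    antilog = [0] * (FIELD_SIZE - 1)
    log = [0] * FIELD_SIZE  # log[0] is a placeholder: zero has no logarithm
    x = 1
    for i in range(FIELD_SIZE - 1):
        antilog[i] = x
        log[x] = i
        x <<= 1                # multiply by alpha
        if x & FIELD_SIZE:     # degree reached M: reduce by the primitive polynomial
            x ^= PRIM_POLY
    return log, antilog


LOG, ANTILOG = build_tables()


def gf_mul_lut(a, b):
    """Multiply two field elements by adding their logarithms."""
    if a == 0 or b == 0:
        return 0
    return ANTILOG[(LOG[a] + LOG[b]) % (FIELD_SIZE - 1)]
```

Each array holds roughly 2^m entries of m bits, so the storage grows exponentially with the field degree m. That is negligible in software but becomes a large LUT when the constant arrays are synthesised directly.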

Table 2: Synthesis reports of initial design.

Refined design for HLS
The results tell us that directly synthesising a software-centric model is unrealistic since physical issues like area and timing are not considered. If the original design had bad coding styles or inappropriate architectures, it would be difficult for HLS to obtain good QoR. For this reason, designers should keep in mind the implied circuit architectures when structuring their code.

To refine the ECC block, we drew on domain knowledge of coding theory and made several changes. For the encoder, we manually unfolded the parity calculation and shortened the path length of the resulting sequence of XOR operations. For the decoder, the first step was to reduce the size of the arrays holding the GF logarithm and anti-logarithm values. Only a small part of the logarithm values was kept, and dedicated arithmetic hardware for GF multiplication and addition was implemented to calculate the other required values. In addition, computing the syndromes from the remainders of dividing the received polynomial by the minimal polynomials reduced the number of GF operations. The Berlekamp iterative algorithm was replaced by the Berlekamp tree algorithm with a set of formulas. For the Chien search algorithm, we skipped unnecessary root-search attempts to improve decoding latency. The modifications and validation process took about two months.
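Two of these decoder refinements can be sketched in software: a table-free GF multiplier built from shifts and XORs, and syndrome calculation from the remainder of a division by a minimal polynomial. This is only a model under stated assumptions, not the authors' implementation. The field GF(2^4), the primitive polynomial x^4 + x + 1, the minimal polynomials `MIN_POLY_1` and `MIN_POLY_3`, and every function name are hypothetical choices for illustration.

```python
# Illustrative field only: GF(2^4), primitive polynomial x^4 + x + 1.
M = 4
PRIM_POLY = 0b10011
ALPHA = 0b0010                 # primitive element alpha
MIN_POLY_1 = 0b10011           # minimal polynomial of alpha:   x^4 + x + 1
MIN_POLY_3 = 0b11111           # minimal polynomial of alpha^3: x^4 + x^3 + x^2 + x + 1


def gf_mul(a, b):
    """Shift-and-XOR multiply in GF(2^M); needs no log/antilog tables."""
    result = 0
    for _ in range(M):
        if b & 1:
            result ^= a        # GF addition is XOR
        b >>= 1
        a <<= 1
        if a & (1 << M):       # reduce when the degree reaches M
            a ^= PRIM_POLY
    return result


def gf_pow(a, n):
    """Raise a field element to a non-negative integer power."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def poly_mod2(dividend, divisor):
    """Remainder of binary polynomial division (bit i = coefficient of x^i)."""
    dlen = divisor.bit_length()
    while dividend.bit_length() >= dlen:
        dividend ^= divisor << (dividend.bit_length() - dlen)
    return dividend


def eval_poly(poly, x):
    """Evaluate a binary polynomial at field element x with Horner's rule."""
    acc = 0
    for i in reversed(range(poly.bit_length())):
        acc = gf_mul(acc, x)
        if (poly >> i) & 1:
            acc ^= 1
    return acc


def syndrome_direct(received, j):
    """S_j = r(alpha^j), evaluated over the full received word."""
    return eval_poly(received, gf_pow(ALPHA, j))


def syndrome_via_remainder(received, j, min_poly):
    """S_j from the short remainder r(x) mod m_j(x), since m_j(alpha^j) = 0."""
    return eval_poly(poly_mod2(received, min_poly), gf_pow(ALPHA, j))
```

Because the remainder has degree lower than the minimal polynomial, the GF evaluation runs over a few bits instead of the whole received word, while the division itself uses only binary shifts and XORs. For example, with the codeword `0b111010001` (the product of the two minimal polynomials) and a single flipped bit at position 5, both functions return the same syndromes.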

Table 3 compares the area of the initial and refined designs as reported by the compiler; the QoR improved by 94.13%. This result indicates that designing with hardware in mind is an important factor in achieving good QoR. Because we did not have a corresponding RTL design, we compared our implementation with an academic research paper [1] that implemented only the BCH decoder. In terms of decoding latency, the data suggests that our NAND flash controller design decodes faster and incurs no timing penalty.

Table 3: Area comparison between initial and refined design.

Table 4: Comparison with handwritten RTL.

According to our experimental results, C-to-Silicon Compiler does improve design productivity, since it reduces the effort spent on debug and verification. Compared with an RTL flow that requires about one year to finish HW and SW development, we experienced about a factor of two improvement in design productivity. The compiler can also be applied to either datapath-dominated or control-intensive designs, and the timing and area QoR appear competitive with handwritten RTL (table 4). HLS does require a new modelling style that is very different from RTL, however, and it has a couple of further limitations: the circuits synthesised from SystemC can be hard to predict, and the generated RTL is hard to read.

To get the best results from HLS, designers need to learn how to architect the design in SystemC to generate the intended hardware, and then rely on the HLS tool's graphical user interface to analyse the results. We found that once you learn this, you can produce high-quality hardware much more quickly.

Reference
[1] F. Sun, K. Rose, and T. Zhang, "On the use of strong BCH codes for improving multi-level NAND flash memory storage capacity," IEEE Workshop on Signal Processing Systems (SiPS), Oct. 2006.

About the authors
Tung-Hua Yeh is an engineer in the Design Automation Technology Division at ITRI. His current role is to develop and integrate the ESL and HLS design flow. Tung-Hua received a Ph.D. degree in computer science and engineering from National Chung-Hsing University (NCHU) in 2011. During his Ph.D. studies, he focused on HLS system development for low-power and high-testability designs. His research interests include high-level synthesis, design-for-testability, and system-level design.

Jen-Chieh Yeh is c of the Design Automation Technology Division in ITRI. His current role is to manage the ESL team and drive the development of the ITRI system-level design and verification methodology. His research interests include design-for-testability, electronic system-level design, and 3-D IC system architecture exploration. He received an MS degree in 2004 and PhD degree in 2006, from the National Tsing Hua University (NTHU).

Qiang Zhu, solutions engineer for C-to-Silicon Compiler, joined Cadence Design Systems in 2007. Previously he was a research engineer at Fujitsu Laboratories. He holds a B.E. degree from Osaka University (1998) and an M.E. degree from the Graduate School of Information Science, Nara Institute of Science and Technology (2000).
