EE Times-India

Developing a design methodology for embedded memories

Posted: 01 Jan 2001

Keywords: embedded memories



The advent of smaller geometries has made it possible and practical to integrate more functionality onto a semiconductor chip. Developers look to incorporate features that will distinguish their products from their competitors', and with these features comes a growing need for embedded memory.

Bringing memory onto the ASIC often lowers cost and power consumption, improves performance, and increases the reliability of the system on a chip (SoC). Many of today's chips demand more embedded memory than ever before, and large amounts of SRAM, ROM, EPROM, multi-port RAM, and DRAM are finding their way on board. In high-performance microprocessors, for example, 30 to 50 percent of the premium die area and 80 percent of the transistors are allocated to memory alone. These processors include several levels of cache for data and instructions, multi-port SRAMs for tags, TLBs, CAMs, register files, and general-purpose SRAMs.

As the need for embedded memories continues to increase, so do their complexity, density, and speed. This, in turn, creates the need for specialized memory designs that require a high level of expertise and a specialized tool set to which many companies lack access.

Because memory blocks are stand-alone by nature, many chip developers find that outsourcing the design of the memory module is a rational decision for both financial and human-resource reasons. Memory blocks can be defined and separated out from a system much more easily than other components of a semiconductor chip. The modular nature of memory blocks, the huge demand for embedded memories, and the fact that the memory core may use new technologies in which the system design team lacks design expertise have all fueled the growth of memory compiler and custom-memory design houses. To meet overarching system design schedules, these design houses can deliver many of the onboard memories to the system designers in a timely fashion. While many companies do outsource the design of their embedded memories, many wait too long to make the decision. Seeking outside help early in the schedule gets the system designers the pin locations, footprints (which establish the memory size), and HDL models for the memories as soon as possible.

An alternative way to obtain an embedded memory design is to use a memory compiler, which can provide a physical block relatively quickly and inexpensively. While this method is expedient and quite adequate for standard memory configurations, it has several downsides. Generally, compiled memory designs result in a larger memory block and less efficient overall system performance.

Conversely, obtaining an embedded memory design through a custom design house such as ours, Puyallup Integrated Circuit Company (PICCO), can offer multiple advantages. Customized memories can accommodate emerging system needs such as pitch-matching the logic to the memory core. Instead of placing a standard memory block on the chip and then synthesizing the logic around it to create a desired function, designers can move the logic into the memory block, allowing the physical layout to fit tightly with the memory pitch dimensions. This approach reduces the overall chip size, allows higher memory density, and improves chip performance. The resulting design can be faster, more compact, less power-hungry, and more cleanly routed.

The complexities of current memory design demand a thorough series of procedures. Our design methodology covers the entire spectrum from concept to netlist, including the design, layout, and verification of a memory block.

Memories for RISC

One of our recently completed designs included all of the embedded memories for a 500MHz 64-bit RISC microprocessor. The onboard memories had to be fast and complex to service the equally fast and complex microprocessor. The various custom memories, which consumed more than one third of the area of the 200mm² CPU, implement Level 1 and Level 2 caches, two levels of translation look-aside buffers (TLBs) to convert virtual page addresses to physical addresses, multiport register files for the fixed- and floating-point cores, and other functions such as look-up tables (LUTs) and general-purpose memory (GP).

The caches contain separate memories for data storage, tag, and least-recently-used (LRU) functions. In addition to the multiport storage array, the register files also contain ROMs and CAMs for address translation and a renaming logic unit. In all, we created 20 unique memory designs.

Figure 1: The various memories included in the caches are RAMs, TLBs, register files, ROMs, multi-port RAMs, and CAMs, as well as general purpose blocks.

Nearly all macros required single-cycle access. These access times often had to be 1ns or less, since memory access was only a fraction of the work that had to fit within the 2ns pipeline cycle. The complexity and uniqueness of each memory meant that a memory compiler was not a viable option. Each embedded memory required a custom design using novel circuit techniques to meet the high performance, density, low power, and high noise immunity the microprocessor required. Such a microprocessor had to use one of the most advanced, state-of-the-art processes: 0.18µm, 6-layer copper dual-damascene metal CMOS.
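To make the address-translation function of the two TLB levels concrete, here is a minimal Python sketch of how a small micro-TLB backed by a larger main TLB might translate a virtual address. It is a behavioral illustration only, not the circuit described here: the 4 KB page size, the LRU replacement policy in the micro-TLB, the dictionary standing in for the main TLB, and the names `MicroTLB` and `translate` are all assumptions chosen for the example.

```python
# Behavioral sketch of a two-level TLB: a small micro-TLB backed by a main TLB.
# Assumptions (not from the article): 4 KB pages, fully associative CAM-style
# matching, LRU replacement in the micro-TLB, and a dict standing in for the
# main TLB. Page-table walks on a main-TLB miss are not modeled.
from collections import OrderedDict
from typing import Dict, Optional, Tuple

PAGE_BITS = 12  # hypothetical 4 KB page size


class MicroTLB:
    """Fully associative micro-TLB with least-recently-used replacement."""

    def __init__(self, entries: int) -> None:
        self.entries = entries
        self._cam = OrderedDict()  # VPN -> PFN, least recently used first

    def lookup(self, vpn: int) -> Optional[int]:
        # A hardware CAM compares the VPN against every stored tag in parallel;
        # a dictionary lookup gives the same functional answer.
        if vpn not in self._cam:
            return None
        self._cam.move_to_end(vpn)  # mark entry as most recently used
        return self._cam[vpn]

    def fill(self, vpn: int, pfn: int) -> None:
        if vpn in self._cam:
            self._cam.move_to_end(vpn)
        elif len(self._cam) >= self.entries:
            self._cam.popitem(last=False)  # evict the least recently used entry
        self._cam[vpn] = pfn


def translate(vaddr: int, micro: MicroTLB, main_tlb: Dict[int, int]) -> Tuple[int, str]:
    """Return (physical address, TLB level that supplied the translation)."""
    vpn = vaddr >> PAGE_BITS
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    pfn = micro.lookup(vpn)
    level = "micro"
    if pfn is None:
        if vpn not in main_tlb:
            raise KeyError(f"TLB miss for VPN {vpn:#x}")
        pfn = main_tlb[vpn]
        micro.fill(vpn, pfn)  # refill the micro-TLB from the main TLB
        level = "main"
    return (pfn << PAGE_BITS) | offset, level


if __name__ == "__main__":
    main_tlb = {0x1: 0x80, 0x2: 0x81, 0x3: 0x82}
    micro = MicroTLB(entries=2)
    print(translate(0x1234, micro, main_tlb))  # (524852, 'main'), i.e. 0x80234
    print(translate(0x1234, micro, main_tlb))  # (524852, 'micro')
```

The dictionary lookup stands in for the parallel tag compare a CAM performs within a single cycle; the speed, density, and noise problems that made these memories hard to build are exactly what a functional model like this leaves out.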
The small feature sizes and high-performance transistors presented additional design challenges. The narrow wires, whose heights were greater than their horizontal spacing, were especially susceptible to crosstalk and electromigration effects, while the low threshold voltage of the transistors resulted in lower noise tolerances.

Design methodology

To familiarize ourselves with each new process and produce a consistent set of guidelines for every designer to follow, we first develop a comprehensive set of design standards. These include optimal gate ratios, fanouts, maximum transistor widths, and pre-layout resistance and capacitance rules of thumb. Because high-density, high-speed memories require aggressive circuit techniques, crosstalk avoidance techniques and noise-margin design standards are critical. Crosstalk standards dictate procedures for routing adjacent signals, while other noise-margin standards define rules for static noise margin and writability for latched circuits.

The design of multiple macros for a chip demands consistent circuit standards. Especially important are standards for clock generators and registers, so that input setup-and-hold times are consistent across the entire CPU. To minimize clock skew, the designer needs to tightly control ratios and fanouts, as well as the rise and fall times of all the clock generators.

Additionally, we used design-for-test (DFT) features such as scan and full-frequency built-in self-test (BIST) for each memory. BIST is undoubtedly more complicated than a test scheme that multiplexes the I/Os of the embedded memory onto a test bus and routes them to the chip I/O pads for evaluation by a tester. However, BIST offers the advantages of working independently of the tester and operating the memory at full frequency. Depending on the complexity of the BIST, a signature can isolate a failure to just a particular instance or to an actual I/O or memory cell.
The latter feature is useful for implementing redundancy and for detailed failure analysis. BIST also provides a useful technique for testing functionality and determining the maximum operating frequency of the macro or memory, but it usually cannot determine the macro's access time. The DFT features add less than 2 percent area overhead and are invaluable in validating the memories. Using these techniques and custom embedded ATE (automatic test equipment) circuits, we have built several test chips to validate the complex design techniques used in building the memories. Since it is currently impractical to drive external I/O pads at 500MHz, we implemented proprietary embedded ATE circuitry to capture and evaluate the actual access times of the embedded macros.

Timing verification

Accurate timing models are crucial for any high-performance semiconductor chip. To characterize and simulate critical paths in the embedded memories, we use HSpice from Avant!. Since simulating the entire macro's LPE netlist is impractical in terms of runtime, we use a lumping and loading technique. While this methodology is common, it is prone to inaccurate modeling of the distributed loads and transmission-line effects represented by resistor-capacitor (RC) networks. The RC networks include not only resistance and capacitance but also transistors, to accurately model gate and source/drain capacitance. Recognizing the need to guarantee accurate timing, we have written tools to verify that all components of a critical path match the actual macro LPE netlist.
For example, the critical components of a memory-array model could be the memory cell clusters placed in the four corners of the array; the periphery that contains the address decoding, clock generation, and drive circuitry; the transmission-line nodes that separate the four clusters and long routes; and the coupling capacitors for crosstalk modeling.

HSpice analysis includes simulations at a minimum of six process, temperature, and voltage (P-T-V) corners, with measure statements and plot analysis at each corner. We analyze the measure statements and plots and search for incorrect behavior such as poor signal slew rates, signal glitches caused by crosstalk or charge sharing, unwanted overlapping pulses, poor propagation delays, and poor setup-and-hold margins around clocked circuits.

We typically use a Verilog or VHDL model to model and simulate the entire SoC. To ensure accuracy, each embedded memory has a Verilog model associated with it, and our responsibility is to ensure that the circuit implementation functionally matches the HDL model. For each memory, we write a comprehensive test bench that exercises all address combinations, control modes, and test modes (that is, scan and BIST). We then apply these vectors and their associated expect data to the full LPE netlist for each macro. As mentioned above, it is impractical to have HSpice simulate extremely large netlists with large vector sets (often thousands of vectors). To bridge the gap between HSpice and Verilog, we use Synopsys' Timemill, which combines logical equivalency testing and circuit electrical verification.

Physical verification

We also use Calibre from Mentor Graphics to verify the physical design. Complete LVS and DRC rule decks check for correct circuit connectivity and for all spacing, width, overlap, and enclosure violations. Additional quality-assurance rule decks check for floating layers, resistive connections, and unwanted geometries.
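The width and spacing checks that a DRC rule deck performs can be illustrated with a toy Python checker. This is a hypothetical sketch, not how Calibre works internally: it assumes axis-aligned rectangles on a single layer, treats touching or overlapping shapes as one merged shape, and uses placeholder rule values. Production rule decks also cover overlap, enclosure, and many context-dependent rules.

```python
# Toy DRC on one metal layer: minimum-width and minimum-spacing checks.
# Assumptions (illustrative only): shapes are axis-aligned rectangles
# (x1, y1, x2, y2) in nm with x1 < x2 and y1 < y2; shapes that touch or overlap
# are treated as one merged shape and skipped by the spacing check; rule
# values are hypothetical placeholders, not taken from any real rule deck.
from itertools import combinations
from math import hypot
from typing import List, Tuple

Rect = Tuple[int, int, int, int]

MIN_WIDTH_NM = 280  # hypothetical
MIN_SPACE_NM = 280  # hypothetical


def width(r: Rect) -> int:
    """Narrow dimension of a rectangle."""
    x1, y1, x2, y2 = r
    return min(x2 - x1, y2 - y1)


def spacing(a: Rect, b: Rect) -> float:
    """Edge-to-edge Euclidean distance between two rectangles (0 if touching)."""
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return hypot(dx, dy)


def run_drc(rects: List[Rect]) -> list:
    """Return a list of (rule, shape indices..., measured value) violations."""
    violations = []
    for i, r in enumerate(rects):
        if width(r) < MIN_WIDTH_NM:
            violations.append(("width", i, width(r)))
    for (i, a), (j, b) in combinations(enumerate(rects), 2):
        s = spacing(a, b)
        if 0 < s < MIN_SPACE_NM:
            violations.append(("space", i, j, s))
    return violations


if __name__ == "__main__":
    rects = [(0, 0, 1000, 300), (0, 500, 1000, 800), (0, 1000, 1000, 1200)]
    for v in run_drc(rects):
        print(v)
```

The pairwise loop is quadratic in the number of shapes; real checkers use spatial indexing and hierarchical processing, which is one reason the layout hierarchy matters so much for verification runtime.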
For layout parasitic extraction, we use Mentor's xCalibre, which generates LPE netlists for use in HSpice critical-path analysis and Timemill simulations. For accurate extraction, the layout hierarchy must match the schematic hierarchy at all levels. Additionally, you must embed all feedthroughs into each leaf cell so that their parasitic effects can be modeled in the sub-circuit LPE netlist. Although LPE netlists are back-annotated into the critical-path simulation, it is imperative that no major surprises crop up between pre-LPE estimates and post-LPE simulation results.

Figure 2: The test chip consisted of several embedded memories with scan and BIST, a TAP controller to initialize each test and a PLL to drive the internal clock grid at frequency.

Quality assurance

In addition to the procedures and checks mentioned above, we also perform extensive quality-assurance analysis on each macro before releasing it to the system designer. Since EDA quality-assurance tools are just emerging and may not be fully validated, we have developed many of our own in-house checks.

One level of QA checking can be achieved using in-house software developed specifically for memories in the smaller geometries. We use the tool to ensure that the loading in the HSpice critical-path netlist exactly matches the full-layout LPE netlist. It also analyzes every net in the entire LPE netlist and checks for excessive driver fanout and skew ratio; it detects multiple drivers on a net and finds the nets that are susceptible to charge sharing (especially dynamic nets) and crosstalk. For crosstalk, the check takes into account the coupling capacitance, the driver's strength, the receiver's noise margin, and the number of adjacent nets.
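A simplified version of the per-net checks described above might look like the following Python sketch. The `Net` record, the placeholder limits, and the crosstalk estimate are assumptions for illustration, not the in-house tool: coupling noise is approximated by the first-order charge-sharing bound Vdd * Cc / (Cc + Cg), where Cc sums the coupling to all adjacent nets, scaled by a hypothetical "hold factor" standing in for driver strength, and compared with the receiver's noise margin. Skew-ratio and charge-sharing checks are omitted.

```python
# Sketch of per-net QA checks on an extracted-netlist summary.
# Assumptions (hypothetical, for illustration): the Net fields are invented
# for this example; crosstalk uses the first-order charge-sharing estimate
# dV = Vdd * Cc / (Cc + Cg), where Cc sums the coupling to all adjacent nets,
# scaled by a placeholder "hold factor" standing in for driver strength
# (1.0 = floating/dynamic node, smaller = more strongly held).
from dataclasses import dataclass, field
from typing import List

VDD = 1.8       # hypothetical supply voltage, volts
MAX_FANOUT = 8  # hypothetical design-standard limit


@dataclass
class Net:
    name: str
    drivers: int
    fanout: int
    cg_ff: float                                       # capacitance to ground, fF
    cc_ff: List[float] = field(default_factory=list)   # coupling to each adjacent net, fF
    hold_factor: float = 1.0                           # assumed stand-in for driver strength
    noise_margin_v: float = 0.3                        # receiver noise margin, volts


def coupling_noise_v(net: Net) -> float:
    """First-order worst-case coupled noise on the victim net, in volts."""
    cc = sum(net.cc_ff)
    if cc == 0:
        return 0.0
    return VDD * cc / (cc + net.cg_ff) * net.hold_factor


def check_net(net: Net) -> List[str]:
    """Return the names of the checks this net fails (empty list if clean)."""
    issues = []
    if net.drivers > 1:
        issues.append("multiple drivers")
    if net.fanout > MAX_FANOUT:
        issues.append("excessive fanout")
    if coupling_noise_v(net) > net.noise_margin_v:
        issues.append("crosstalk")
    return issues


if __name__ == "__main__":
    bitline = Net("dyn_bl", drivers=1, fanout=2, cg_ff=20.0, cc_ff=[3.0, 2.0])
    print(bitline.name, round(coupling_noise_v(bitline), 3), check_net(bitline))
```

Summing the coupling capacitance over all neighbors captures the "number of adjacent nets" factor in the crudest way, by assuming every aggressor switches in the same direction at once.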
The designer must either correct or justify any net that fails the fanout, skew, multiple-driver, charge-sharing, or crosstalk checks.

We also perform QA checking on the layout with a special DRC rule set. This process finds resistive connections (for example, routes through poly, diffusion, or substrate) and checks power-grid integrity and excessively wide transistors. Resistive, or soft, connections that a typical DRC rule set fails to check may not cause a functional failure in silicon, but they can easily contribute to frequency-related or stability failures.

To meet timing criteria, designers must sometimes trade noise tolerance for speed. Even so, all circuits must pass minimum noise-margin rules, or they will likely fail when placed in the full CPU. Circuits such as memory cells, "ratioed" logic (also known as pseudo-NMOS), and dynamic logic gates all undergo static and dynamic noise-margin analyses.

The power distribution and the integrity of the power grid have a significant impact on the macro's performance. IR drops on Vdd and ground bounce on Vss affect noise margins, timing, and possibly functionality. The problem is magnified by the lower supply voltages and smaller Vts associated with deep-submicron feature sizes. Additionally, the high current densities that 500MHz operation drives through the narrow lines of 0.18µm technology increase the possibility of electromigration (EM) failures. Using Synopsys' Powermill to simulate the entire macro's power, we can create a current map that details each subcircuit's power by placement location. The current map, along with the macro layout's RC-extracted netlist, is input to a tool that analyzes IR drop and EM on the power buses. The tool reports any wire segment or contact/via that fails, allowing designers to improve the busing. Layout overlays of the errors, as well as contour maps and 3D current and voltage distribution plots, are also available to assist the analysis.
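The IR-drop and EM analysis can be illustrated on the simplest possible power bus: a single rail fed from a pad at one end, with each node drawing an average current taken from a current map. The Python sketch below is a first-order, hypothetical model; the segment resistances, tap currents, widths, and the mA/µm EM limit are placeholders, and a real tool solves the full extracted RC mesh, including contacts and vias, rather than a simple chain.

```python
# First-order IR-drop and EM check on a single power rail fed from one end.
# Assumptions (hypothetical, not from the article): segment k connects node k-1
# to node k (node 0 is the pad); tap k draws a DC (average) current at node k,
# as a current map would supply; EM is judged by current per unit wire width
# against a placeholder limit in mA/um.
from typing import List, Tuple


def rail_analysis(seg_res_ohm: List[float],
                  tap_current_a: List[float],
                  seg_width_um: List[float],
                  em_limit_ma_per_um: float) -> Tuple[List[float], List[int]]:
    """Return (cumulative IR drop at each node in volts, indices of EM-failing segments)."""
    n = len(seg_res_ohm)
    if not (len(tap_current_a) == n == len(seg_width_um)):
        raise ValueError("inputs must have one entry per rail segment")

    # Current through segment k is the sum of all tap currents downstream of it.
    seg_current = [sum(tap_current_a[k:]) for k in range(n)]

    drops, v = [], 0.0
    for r, i in zip(seg_res_ohm, seg_current):
        v += r * i            # Ohm's law, accumulated from the pad outward
        drops.append(v)

    em_fail = [k for k in range(n)
               if seg_current[k] * 1e3 / seg_width_um[k] > em_limit_ma_per_um]
    return drops, em_fail


if __name__ == "__main__":
    drops, em_fail = rail_analysis([0.5, 0.5, 0.5], [0.01, 0.01, 0.02],
                                   [10.0, 10.0, 10.0], em_limit_ma_per_um=3.5)
    print([round(d, 3) for d in drops], em_fail)
```

The segment nearest the pad carries the most current, so it is the first to violate the EM limit, while the far end of the rail sees the largest IR drop; widening the busing near the feed point helps with both.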
These QA procedures are not limited to the highest speeds and the smallest processes. Even larger processes (0.35µm and below) running at slower speeds (100MHz and above) can exhibit increased susceptibility to noise-margin, crosstalk, IR-drop, or EM-related failures.

On outsourcing

When outsourcing the design of embedded memories, a customer should expect certain deliverables. Early on, the memory designers should provide an abstract for floorplanning, placement, and routing that establishes the critical boundaries and pin locations for the system designers. The customer should also expect accurate HDL models so that system bugs can be found and eliminated. Later, the memory design team should deliver a timing library with delay and race lookup tables or equations that the customer can use in full-chip logic and timing simulations.

Figure 3: During timing simulation, you can use lumping and loading techniques to check the critical components in your memory-array model.

The final product is the complete layout database, which is the memory block that is dropped into place on the system chip. It should come with complete documentation that includes simulation, timing, and verification results, as well as design details, netlists, and schematics.

Embedded memories are a vital part of today's semiconductor chips, and how well they interoperate with the rest of the chip determines the efficiency, speed, and performance of the overall chip. A solid design methodology can deliver a well-designed memory.

Embedded memories require tighter controls than traditional off-chip memories, since they are exposed to noise generated outside the memory block. Moreover, the power grid of the memory may need to carry current from the external logic.
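Delay lookup tables in a timing library are commonly indexed by input slew and output load, and the timing tool interpolates between table entries. The Python sketch below assumes a two-dimensional table in the style of Liberty NLDM tables and uses bilinear interpolation; the function name `lookup_delay` and the axis and delay values in the example are invented, and the article does not specify the table format used for these memories.

```python
# Bilinear interpolation in a 2-D delay table indexed by input slew and output load.
# Assumption: a table layout similar to Liberty NLDM tables, where
# table[i][j] is the delay at slew_axis[i] and load_axis[j]. Points outside
# the table are linearly extrapolated from the nearest edge cell.
from bisect import bisect_right
from typing import List, Sequence


def _bracket(axis: Sequence[float], x: float) -> int:
    """Index i of the cell [axis[i], axis[i+1]] to use for x (clamped to the table)."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lookup_delay(slew_axis: Sequence[float], load_axis: Sequence[float],
                 table: List[List[float]], slew: float, load: float) -> float:
    i = _bracket(slew_axis, slew)
    j = _bracket(load_axis, load)
    s0, s1 = slew_axis[i], slew_axis[i + 1]
    l0, l1 = load_axis[j], load_axis[j + 1]
    ts = (slew - s0) / (s1 - s0)   # fractional position along the slew axis
    tl = (load - l0) / (l1 - l0)   # fractional position along the load axis
    d00, d01 = table[i][j], table[i][j + 1]
    d10, d11 = table[i + 1][j], table[i + 1][j + 1]
    return ((1 - ts) * (1 - tl) * d00 + (1 - ts) * tl * d01
            + ts * (1 - tl) * d10 + ts * tl * d11)


if __name__ == "__main__":
    slews = [0.05, 0.2, 0.5]   # ns, hypothetical
    loads = [0.01, 0.05, 0.1]  # pF, hypothetical
    table = [[0.30, 0.40, 0.55],
             [0.35, 0.46, 0.62],
             [0.45, 0.58, 0.76]]  # ns, hypothetical
    print(round(lookup_delay(slews, loads, table, 0.125, 0.03), 4))
```

Interpolation keeps the library compact while still letting full-chip timing analysis evaluate the memory at the actual slew and load it sees, which is part of what makes such a gray-box model useful when the memory and chip are designed in parallel.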
Designers must learn to predict and implement accurate gray-box models, because they usually design the memories in parallel with the entire chip, and the memory integration must occur without a hitch.

The development of quality embedded memories starts with setting stringent design standards. This effort, supplemented by quality-assurance tools, truly succeeds only when implemented by a design team that not only can design innovative circuits but also has the discipline to adhere to a strict methodology.

[Integrated System Design]

By Eric Hall, Engineering Manager, and George Costakis, Engineering Manager, Broadcom Corp.
