EE Times-India

Managing single event effects in FPGAs, ASICs and processors (Part 2)

Posted: 09 Jan 2012

Keywords: single-event effects, ASICs, FPGAs

To understand the various mitigation approaches for SEEs, we can examine several scenarios. Consider a processor with a failure-in-time (FIT) rate of 600 at sea level in New York, NY. One FIT is one failure per billion device-hours, so this rate corresponds to a mean time between failures (MTBF) of roughly 190 years. A failure rate this low can seem insignificant for a single unit, but if 1,000 systems are fielded, the combined MTBF of all systems drops to about 70 days: one upset every 70 days on average. This rate might not be tolerable for high-reliability systems such as networking routers or those used in industrial applications.

Alternatively, let's examine an application at high altitude. A FIT rate of 600 at sea level in New York corresponds to a rate of 367,200 at an elevation of 12,192 m (40,000 feet) over the poles, which represents an MTBF of roughly 110 days for a single fielded unit. Flying a hundred units results in roughly one upset per day. In other words, one system in the air has an upset rate of the same order of magnitude as 1,000 systems on the ground.
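The arithmetic behind these two scenarios can be reproduced with a short sketch. It assumes the standard definition of one FIT as one failure per 10^9 device-hours and a 365-day year; the function and variable names are illustrative and do not come from the article.

```python
FIT_HOURS = 1e9        # one FIT = one failure per 10^9 device-hours
HOURS_PER_YEAR = 8760  # assumption: 365-day year


def mtbf_hours(fit, units=1):
    """MTBF in hours for `units` identical devices, each with the given FIT rate."""
    return FIT_HOURS / (fit * units)


# Ground scenario: FIT of 600 at sea level
sea_level_single_years = mtbf_hours(600) / HOURS_PER_YEAR
sea_level_fleet_days = mtbf_hours(600, units=1000) / 24

# Altitude scenario: FIT of 367,200 at 40,000 feet over the poles
altitude_single_days = mtbf_hours(367_200) / 24
altitude_fleet_days = mtbf_hours(367_200, units=100) / 24

print(f"1 unit, sea level:      {sea_level_single_years:.0f} years")
print(f"1,000 units, sea level: {sea_level_fleet_days:.0f} days")
print(f"1 unit, altitude:       {altitude_single_days:.0f} days")
print(f"100 units, altitude:    {altitude_fleet_days:.1f} days")
```

Because the FIT rates of independent units add, the combined MTBF is simply the single-unit MTBF divided by the number of units. The two FIT figures quoted above also imply an altitude scaling factor of 367,200 / 600 = 612.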

Both the memory and logic structures in ASICs are susceptible to SEEs, especially at sub-90-nm technology nodes. Similarly, FPGA configuration memory and user block memory can be upset. This susceptibility does not mean that these technologies are unsuitable for avionics and high-reliability systems; it means that SEEs should be considered in the development process and mitigation tactics must be employed. Designers should assess the following before making a final selection between an ASIC and an FPGA:

 • Frequency of events—FIT rate and MTBF
 • Detection time of events and means of detecting the event
 • Recovery time after event detection
 • Performance, area, and monetary cost of the mitigation solutions
 • System performance and system design implications

When designing with both ASIC and FPGA solutions, the following fault detection and mitigation techniques should be considered:

 • Soft-error mitigation IP (SEM IP): suitable for FPGAs and soft processors only
 • ECC or parity checks for user memories in both ASICs and FPGAs
 • Software-implemented fault tolerance (SWIFT) for both soft and hard processor solutions
 • Hardware mitigation solutions: lockstep operation, dual and triple modular redundancy (DMR and TMR) for FPGA solutions or ASIC designs
 • Watchdog timers
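To illustrate how ECC differs from a plain parity check, here is a minimal software sketch of a Hamming(7,4) code, which corrects any single flipped bit in a 4-bit word. The choice of code is an assumption made for brevity: the article does not name a specific ECC, and memories in real devices typically use wider codes implemented in hardware. All names are illustrative.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); error_position is 0 if no error was seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of a single flipped bit
    if syndrome:
        c[syndrome - 1] ^= 1         # correct the upset bit
    return [c[2], c[4], c[5], c[6]], syndrome


word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[4] ^= 1  # simulate a single-event upset at codeword position 5
recovered, error_position = hamming74_decode(stored)
```

A single parity bit would only report that such an upset occurred. The three syndrome bits here also locate the flipped bit, so it can be corrected without a reload or reset. This particular code does not handle double-bit errors.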

All mitigation approaches should consider area, performance, detection time, and correction time balanced against fixed and variable costs as well as system safety and reliability costs. Effective FPGA SEE mitigation methods include:

 • External watchdog timer with external handling control (lacks full device check)
 • Full-device cyclic redundancy check (CRC) with external reset of FPGA (might upset operation when unnecessary)
 • Full-device CRC with bit correction and flag to design (design can decide on further actions)
 • Full-device CRC with correction and non-essential bit classification (ignores 66% of false positives). See Architectures and Refinement of FIT Rates for a description of essential bits.
 • DMR and TMR design techniques, or lockstep operation (area hit)
 • Additional built-in fault tolerance checks (custom generated)
 • Safe state machines—"safe_implementation" and "when others" statement with recovery state
 • Software-implemented fault tolerance (SWIFT) techniques (for processors)
 • Memory protections using ECC or parity checks
 • Flow checks, range checks, signatures, CRCs, parity, etc.
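The redundancy techniques in this list can be modelled in a few lines of software. The sketch below contrasts a TMR majority voter, which masks a fault in one of three copies, with a DMR comparison, which can only detect a disagreement. It is a behavioural illustration with hypothetical names, not a description of how any vendor tool implements redundancy in hardware.

```python
def tmr_vote(a, b, c):
    """Bitwise 2-of-3 majority vote over three redundant copies of a value.

    Returns (voted_value, mismatch), where mismatch is True if any copy
    disagrees with the voted result and should be repaired or reported.
    """
    voted = (a & b) | (a & c) | (b & c)
    mismatch = (a != voted) or (b != voted) or (c != voted)
    return voted, mismatch


def dmr_check(a, b):
    """DMR comparison: detects a disagreement but cannot tell which copy is wrong."""
    return a == b


good = 0b1010
upset = good ^ 0b0100  # one bit flipped in a single copy

voted, mismatch = tmr_vote(good, upset, good)
copies_agree = dmr_check(good, upset)
```

The difference explains the area trade-off noted in the list: TMR triples the logic but keeps the output correct through a single upset, whereas DMR or lockstep operation only doubles the logic and relies on a separate recovery action once the mismatch is flagged.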

ASIC SEE mitigation methods include:

 • External watchdog timers (can catch every time-dependent behaviour)
 • Architectural mitigation (costly solutions on top of increasingly costly technology nodes)
 • SWIFT techniques (for processors)
 • Memory protections using ECC or parity checks

ASIC, processor robustness

With each successive process node, the cost of ASIC non-recurring engineering increases by $5 million or more. At the same time, the ASIC susceptibility to SEEs increases as operating voltage and elemental capacitance decrease. These smaller technology nodes are the critical enablers of power reduction and increased performance with higher clocking speeds. All of these aspects drive greater design density.
