EE Times-India

Achieve error resilience throughout the embedded system

Posted: 24 Aug 2012

Keywords: soft error rates, mean time between failures, ECC

Diminishing semiconductor device geometries allow ever higher levels of integration in system-on-chip (SoC) devices. In the domain of FPGAs, this results in very high capacity programmable hardware devices. At 28 nm, the latest trend in FPGAs is to combine FPGA fabric with a high-performance SoC. Dubbed "SoC FPGAs", these devices contain a dual-core ARM Cortex-A9 processor, level 2 cache, a rich set of peripherals, up to four memory controllers, high-speed transceivers, and a low-power, low-cost 28 nm FPGA fabric. Such a concentration of computational performance drives embedded systems toward abundant memory capacity. Several gigabytes of DDR memory are no longer unusual, and with that capacity more attention must be paid to the probability and avoidance of soft errors.

What are soft errors?
Commonly used memory bit cells retain their programmed value in the form of an electrical charge. Writing a memory bit cell consists of reprogramming and forcing the electrical charge to represent the new desired value. Memory bit cells will retain their value indefinitely, as long as basic requirements are met, e.g. power is applied, and – for dynamic memory types – a refresh method is active.

The stored charge can be disturbed by the injection of a charge foreign to the memory device. Cosmic radiation may affect a memory bit cell, as Earth's atmosphere is a significant but not flawless barrier. Alpha particles are emitted by the radioactive decay of materials, and while chip packaging is engineered for very low emission rates, the problem cannot be ignored entirely.

The event in which an external energy injection inadvertently modifies the value of a memory bit cell is referred to as a single event upset (SEU). Such errors are classed as soft errors, because the error is not caused by a defect in the device but by an outside disturbance acting on it. If the correct data is subsequently rewritten, the cell is no more likely than any other to suffer another upset. For any single bit cell the likelihood of such an event is extremely small, but the chance that some cell in the system is upset increases with growing memory capacity.

The acceptability of an SEU rate depends on the application domain. Developers of applications used at high altitudes will be concerned with higher soft error rates (SER) due to cosmic rays. Military, automotive, high-performance computing, communication, and industrial customers will be concerned with degradation of safety, security and reliability.

Figure 1: Expected soft error rates for different memory capacities.

Based on heuristic probabilities of soft errors, identified as low and high boundaries, Figure 1 shows expected soft error rates for a range of memory capacities. As an example, an embedded system with one gigabyte of dynamic memory is expected to experience soft errors anywhere from a few times per year to once every few years, which corresponds to a mean time between failures (MTBF) of a few months to a few years.
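
The relationship between memory capacity and MTBF can be sketched with a short calculation. The following Python example is only an illustration: it assumes the soft error rate is expressed in FIT per megabit (one FIT being one failure per 10^9 device-hours), and the two rates used as "low" and "high" boundaries are hypothetical placeholders, not values taken from Figure 1. The function name mtbf_years is likewise made up for this sketch.

```python
HOURS_PER_YEAR = 24 * 365
FIT_DEVICE_HOURS = 1e9      # one FIT = one failure per 10^9 device-hours
BITS_PER_MBIT = 2 ** 20


def mtbf_years(capacity_bytes, fit_per_mbit):
    """Estimate the mean time between soft errors, in years.

    fit_per_mbit is an assumed soft error rate per megabit of memory;
    real values depend on the device, altitude and environment.
    """
    mbits = capacity_bytes * 8 / BITS_PER_MBIT
    failures_per_hour = fit_per_mbit * mbits / FIT_DEVICE_HOURS
    return 1.0 / failures_per_hour / HOURS_PER_YEAR


if __name__ == "__main__":
    one_gigabyte = 2 ** 30
    # Hypothetical low/high boundaries, chosen only for illustration.
    for label, rate in (("low", 5.0), ("high", 50.0)):
        years = mtbf_years(one_gigabyte, rate)
        print(f"{label} boundary: {years:.2f} years between soft errors")
```

Because the failure rate scales linearly with the number of bits, doubling the memory capacity halves the estimated MTBF under this simple model.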

Implications of soft errors
Memory data corruption is often fatal to the operation of an embedded system. In a processor-based system, memory errors result in incorrect values in either instruction or data streams. Modern processors will detect illegal instructions, commonly forcing a reboot of the system. Errors in data streams may cause the program flow to derail, which often results in illegal access to protected memory. These events have their equivalent in the desktop world as a "blue screen of death" or a "core dump."

While a crash is undesirable in embedded systems, the alternative is worse. Errors that are not immediately detected can linger in the system for an extended period of time. Undetected memory errors can multiply as the faulty data is used to calculate new data. Once faulty data has been detected, the originating point and the subsequent induced damage may be difficult to correct or even identify. Embedded systems often operate for extended periods of time and are not frequently rebooted as one may see with desktop computers. This gives embedded systems the additional disadvantage that errors will accumulate over time.
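
How a single undetected upset spreads into derived data can be shown with a small simulation. The sketch below is a toy model, not a description of any particular system: it assumes a buffer of integer samples, models an SEU as the inversion of one bit in one sample, and uses a running sum as a stand-in for "new data calculated from faulty data". The helper names flip_bit and count_corrupted are hypothetical.

```python
from itertools import accumulate


def flip_bit(value, bit):
    """Model a single event upset by inverting one bit of an integer."""
    return value ^ (1 << bit)


def count_corrupted(samples, index, bit):
    """Count how many running-sum results differ after one upset.

    One bit of samples[index] is flipped, then the running sum is
    recomputed and compared against the fault-free result.
    """
    good = list(accumulate(samples))
    upset = list(samples)
    upset[index] = flip_bit(upset[index], bit)
    bad = list(accumulate(upset))
    return sum(g != b for g, b in zip(good, bad))


if __name__ == "__main__":
    samples = [10, 20, 30, 40, 50, 60, 70, 80]
    # One flipped bit in the third sample corrupts every later sum.
    print(count_corrupted(samples, index=2, bit=4))  # prints 6
```

In this model a single corrupted input word taints every result computed after it, which is why an error that goes unnoticed for a long time is so hard to trace back to its origin.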
