21st International Conference on VLSI Design
Single Event Upset: An Embedded Tutorial

Fan Wang and Vishwani D. Agrawal
Dept. of Electrical and Computer Engineering, Auburn University, Auburn, AL, 36849, USA
Email: wangfan@auburn.edu, vagrawal@eng.auburn.edu

Abstract: With the continuous downscaling of CMOS technologies, reliability has become a major bottleneck in the evolution of next-generation systems. Technology trends such as transistor downsizing, the use of new materials, and system-on-chip architectures continue to increase the sensitivity of systems to soft errors. These errors are random and are not related to permanent hardware faults. Their causes may be internal (e.g., interconnect coupling) or external (e.g., cosmic radiation). To meet system reliability requirements, both circuit designers and test engineers need a basic knowledge of soft errors. We present a tutorial study of the radiation-induced single event upset phenomenon caused by external radiation, which is a major source of soft errors. We summarize the basic radiation mechanisms and the resulting soft errors in silicon. Soft error mitigation techniques using time and space redundancy are illustrated. An industrial design example, the IBM z990 system, shows how industry is dealing with soft errors today.

I. INTRODUCTION
Since the beginning of recorded history, humans have believed in the influence of heavenly bodies on life on Earth. Machines, electronics included, are considered scientific objects whose fate is controlled by humans. So, even though we know the exact date and time of its manufacture, we do not draw up a horoscope for a machine. Lately, however, we have started noticing certain behaviors in state-of-the-art electronic circuits whose causes have been traced to external sources, including celestial bodies outside our Earth. This non-permanent (i.e., random or soft) error behavior in digital systems is termed the Single Event Upset (SEU) phenomenon, and it affects modern nanotechnology electronic devices. We believe SEU will assume greater importance in the future [12]. Sifting through the literature of the last half century, we have collected the necessary material for a starter. Our aim is not to cram as much information as possible into these six pages, but to provide the essentials, which can be assimilated conveniently, to help the reader become an effective contributor. We begin with the definition.
“Single Event Upset (SEU): Radiation-induced errors in microelectronic circuits caused when charged particles (usually from the radiation belts or from cosmic rays) lose energy by ionizing the medium through which they pass, leaving behind a wake of electron-hole pairs”. (NASA Thesaurus)
The objective of this tutorial is to familiarize the reader with SEU in digital electronics: definitions and terms, causes (established mostly through experiments), measurement and estimation, reliability standards, and the related design methods. You should expect to get almost complete, but not comprehensive, information. Looking over the Appendix on the last page will aid comprehension as you read through this article.
We present an up-to-date understanding of the SEU phenomenon. After the historical notes in Section II, we summarize the basic radiation mechanisms and explain how a soft error occurs in silicon in Section III. Examples of soft error mitigation techniques are presented in Section IV. In Section V, a case study of soft error detection and tolerance in the IBM z990 system is given.

II. HISTORICAL NOTES

Soft errors have been studied by electrical, aerospace, nuclear, and radiation engineers for almost half a century. From 1954 through 1957, failures in digital electronics were reported during above-ground nuclear bomb tests. These were treated as electronic anomalies in the monitoring equipment because they were random and their cause could not be traced to any hardware fault [27]. Perhaps the first paper concerning the role of cosmic rays in electronics is by Wallmark and Marcus [24]. As quoted in the recent literature [16], these authors predicted that, once feature sizes became small enough, cosmic rays would start upsetting microcircuits through heavy ionizing particle strikes and cosmic ray reactions. Through the 1970s and early 1980s, the effects of radiation received attention and more researchers examined the physics of these phenomena. Also, from the 1950s onward, theories of fault tolerance and self-repairing computing were being developed because of the increased reliability requirements of critical applications such as space missions [23].
May and Woods of Intel Corporation [13] determined that these errors were caused by alpha particles emitted in the radioactive decay of uranium and thorium present at levels of just a few parts per million in package materials. Their paper was the first public account of radiation-induced upsets in electronic devices at sea level, and these errors were referred to as “soft errors”. The term soft error was used to distinguish them from repeatable errors traceable to permanent hardware faults. Guenzer et al. [10] reported that error-causing particles came not only from uranium and thorium: high-energy neutrons and protons generated by nuclear reactions could also cause upsets in circuits. Because the title of their paper was “Single Event Upset of Dynamic RAMs by Neutrons and Protons”, the term “SEU” has been in use ever since [10] (refer to [16]). In 1979, Ziegler and Lanford from IBM [28] predicted that cosmic rays could result in the same upset phenomenon in electronics (not only memories) even at sea level.

1063-9667/08 $25.00 © 2008 IEEE
DOI 10.1109/VLSI.2008.28
Dell Ex. 1009

TABLE I
PROJECTED FAILURE RATE OF SRAM-BASED FPGA APPLICATIONS DUE TO NEUTRON EFFECTS (ACTEL)

| Application example | Altitude (feet) | Neutron flux (relative) | FPGAs/system | Upsets per 1M-gate FPGA per day (0.13µ) | MTBF¹ (hours), 0.13µ | MTBF¹ (hours), 0.09µ | FIT¹ (millions), 0.13µ | FIT¹ (millions), 0.09µ |
|---|---|---|---|---|---|---|---|---|
| (1) Ground-based communication network | 5000 | 1 | 512 | 4.19E-4 | 112 | 58 | 8.92 | 17.24 |
| (2) Civilian avionics system | 30,000 | ∼40 | 4 | 1.85E-2 | 324 | 162 | 3.09 | 6.17 |
| (3) Military avionics system | 60,000 | >160 | 16 | 8.33E-2 | 18 | 9 | 55.56 | 111.11 |

Recent soft error rate (SER) test results for SRAM-based FPGAs reported by Actel [1] show a significant and growing risk of functional failures due to the corruption of configuration data, especially in higher-density systems. Table I shows the projected failure rates for different applications when no protection is used. The number of upsets per 1M-gate FPGA per day increases from case (1) to case (3) because the neutron flux density increases with altitude. The table also includes projected failure rates for the 90 nm (0.09µ) process: neutron-induced soft errors are expected to get worse by a factor of two as we move from 0.13µ to 0.09µ technology. Note that this table ignores alpha particle effects, which are also expected to be significant for nanometer technologies and will further increase the system failure rate.
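As a cross-check of Table I, the 0.13µ MTBF and FIT columns can be recomputed from the per-FPGA upset rate: the system upset rate is the per-FPGA rate multiplied by the number of FPGAs, and FIT counts failures per 10⁹ hours. The Python sketch below assumes, on our part, that every FPGA is a 1M-gate device and that upsets in different FPGAs are independent.

```python
# Recompute the 0.13um MTBF and FIT columns of Table I from the per-FPGA upset rate.
# Assumption (ours): each FPGA is a 1M-gate device and FPGAs upset independently,
# so system upsets/day = per-FPGA upsets/day * FPGAs per system.

HOURS_PER_DAY = 24
DEVICE_HOURS_PER_FIT = 1e9  # FIT = failures per 10^9 hours


def system_mtbf_hours(upsets_per_fpga_day: float, fpgas_per_system: int) -> float:
    """Mean time between upsets for the whole system, in hours."""
    return HOURS_PER_DAY / (upsets_per_fpga_day * fpgas_per_system)


def fit_from_mtbf(mtbf_hours: float) -> float:
    """Failure rate in FIT corresponding to a given MTBF."""
    return DEVICE_HOURS_PER_FIT / mtbf_hours


# (application, FPGAs per system, upsets per 1M-gate FPGA per day at 0.13um), from Table I
rows = [
    ("(1) ground-based communication network", 512, 4.19e-4),
    ("(2) civilian avionics system", 4, 1.85e-2),
    ("(3) military avionics system", 16, 8.33e-2),
]

results = {}
for name, n_fpga, rate in rows:
    mtbf = system_mtbf_hours(rate, n_fpga)
    results[name] = (mtbf, fit_from_mtbf(mtbf))
    print(f"{name}: MTBF = {mtbf:.0f} h, FIT = {results[name][1] / 1e6:.2f} million")
```

The computed MTBFs (about 112 h, 324 h, and 18 h) match the table; the FIT values agree to within rounding, since the table appears to derive FIT from the rounded MTBF.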
Radiation-induced soft errors have become one of the most important and challenging failure mechanisms in modern electronic devices. The SER of commercial chips is typically kept within 100 to 1000 FIT¹. While most hard failure mechanisms produce failure rates on the order of 1 to 100 FIT, the SER of a low-voltage embedded SRAM can easily be 1000 FIT/Mbit. Therefore, a four-phase approach to dealing with soft errors is in progress [21]:
1) Methods to protect chips from soft errors (prevention).
2) Methods to detect soft errors (testing).
3) Methods to estimate the impact of soft errors (assessment).
4) Methods to recover from soft errors (recovery).
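To see why 1000 FIT/Mbit matters against a 100 to 1000 FIT chip budget, here is a small worked example in Python. The 4 Mbit SRAM size and the fleet of 100,000 chips are hypothetical; the only relations used are the definition of FIT (failures per 10⁹ device-hours, see Appendix) and the assumption that an unprotected memory's SER scales linearly with its capacity.

```python
# Chip-level SER from a per-Mbit rate, and what that rate means for one chip and a fleet.
# Hypothetical inputs: a 4 Mbit embedded SRAM and a fleet of 100,000 identical chips.

HOURS_PER_YEAR = 24 * 365
DEVICE_HOURS_PER_FIT = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def chip_fit(fit_per_mbit: float, mbits: float) -> float:
    """SER of an unprotected memory, assuming it scales linearly with capacity."""
    return fit_per_mbit * mbits


def mttf_years(fit: float) -> float:
    """Mean time to (soft) failure of a single chip, in years."""
    return DEVICE_HOURS_PER_FIT / fit / HOURS_PER_YEAR


def fleet_errors_per_day(fit: float, n_chips: int) -> float:
    """Expected soft errors per day summed over n_chips identical chips."""
    return fit / DEVICE_HOURS_PER_FIT * n_chips * 24


sram_fit = chip_fit(fit_per_mbit=1000, mbits=4)  # 4000 FIT, already above a 1000 FIT budget
print(f"SRAM SER: {sram_fit:.0f} FIT")
print(f"single-chip MTTF: {mttf_years(sram_fit):.1f} years")
print(f"fleet of 100,000: {fleet_errors_per_day(sram_fit, 100_000):.1f} errors/day")
```

A single chip would see a soft error only about once every 28 years, yet the hypothetical fleet would see roughly ten per day, which is why soft error budgets are stated in FIT rather than judged one chip at a time.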
III. WHAT IS A SOFT ERROR?
A. Soft Error Categories
An electronic circuit that bears no permanent hardware fault may experience unexplained events that spontaneously change single bits in the system, and there is no way to repeat such failures. Within the computer industry, this phenomenon is known as a “soft fail”, to distinguish it from a “hard” or “permanent fail”, which may be repairable [28]. Observing a soft error does not imply that the system hardware is any less reliable than before, because the soft fail is completely random. Soft fails may be caused by well-known electronic noise sources such as a noisy power supply, lightning, and electrostatic discharge (ESD), or by radiation from the galaxy, such as from radiation-emitting stars, and from atmospheric gases. A soft, or non-permanent, fault is a non-destructive fault and falls into two categories [22]:
1) Transient faults, caused by environmental conditions such as temperature, humidity, pressure, voltage and power supply fluctuations, vibrations, electromagnetic interference, ground loops, cosmic rays, and alpha particles.
2) Intermittent faults, caused by non-environmental conditions such as loose connections, aging components, critical timing, power supply noise, resistive or capacitive variations or couplings, and noise in the system.

¹See Appendix.

With advances in design and manufacturing technology, non-environmental conditions may no longer significantly affect the reliability of submicron semiconductors. However, errors caused by cosmic rays and alpha particles remain the dominant cause of errors in electronic systems.

B. Radiation Mechanisms in Semiconductors
Three principal radiation sources cause soft errors in advanced semiconductor devices [5]:
1) Alpha particles are emitted when the nucleus of an unstable isotope decays to a lower energy state. They carry kinetic energies in the range of 4 to 9 MeV. There are many radioactive isotopes; however, uranium and thorium have the highest activity among naturally occurring materials. In the terrestrial environment, the major sources of alpha particles are radioactive impurities such as lead-based isotopes in the solder bumps of flip-chip technology, gold used for bonding wires and lid plating, aluminum in ceramic packages, lead-frame alloys, and interconnect metallization [8].
2) High-energy (> 1 MeV) neutrons from cosmic radiation can induce soft errors in semiconductor devices via secondary ions produced by neutron reactions with silicon nuclei. Cosmic rays of galactic origin react with the Earth’s atmosphere to produce complex cascades of secondary particles. Less than 1% of the primary flux reaches ground level, and the predominant particles there include muons, neutrons, protons, and pions. Because pions and muons are short-lived, and protons and electrons are attenuated by Coulombic interaction with the atmosphere, neutrons are the cosmic radiation most likely to cause SEU in deep-submicron semiconductors at terrestrial altitudes. The neutron flux depends on the altitude above sea level: its density increases with altitude.
3) The third significant source of ionizing particles in electronic devices is secondary radiation induced by the interaction of low-energy cosmic ray neutrons with the isotope boron-10 (¹⁰B). Boron is commonly used as a p-type dopant for junction formation in ICs; in particular, BPSG (borophosphosilicate glass) is commonly used to form dielectric insulator layers in IC manufacturing. Boron has two isotopes, ¹⁰B and ¹¹B, of which ¹⁰B is unstable when exposed to neutrons. The reaction scheme is shown in Figure 1. In the ¹⁰B(n,α)⁷Li reaction, the lithium nucleus is emitted with a kinetic energy of 0.84 MeV 94% of the time and 1.014 MeV 6% of the time. The gamma photon has an energy of 478 keV, while the alpha particle is emitted with an energy of 1.47 MeV [3]. This mechanism has recently been found to be the dominant source of soft errors in 0.25µ and 0.18µ SRAMs fabricated with BPSG. Modern microprocessors use highly purified package materials, so this radiation mechanism is greatly reduced, making high-energy cosmic rays the major cause of soft errors.

Fig. 1. Fission of ¹⁰B induced by the capture of a neutron (commonly occurring in SRAMs) [3].
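The particle energies listed above can be translated into charge. Silicon needs roughly 3.6 eV to create one electron-hole pair (a standard material constant, not a value from this paper), so a particle that deposited all of its energy in silicon would free the charge computed below. Complete deposition inside the sensitive region is our simplifying assumption, which makes these numbers upper bounds.

```python
# Charge freed in silicon if a particle deposits all of its kinetic energy there.
# Assumptions: complete energy deposition (an upper bound) and ~3.6 eV per
# electron-hole pair in silicon (standard material constant, not from the paper).

ELEMENTARY_CHARGE_C = 1.602176634e-19
EV_PER_PAIR_SI = 3.6


def deposited_charge_fc(energy_mev: float) -> float:
    """Charge in fC of the electron-hole pairs created by energy_mev of deposition."""
    pairs = energy_mev * 1e6 / EV_PER_PAIR_SI
    return pairs * ELEMENTARY_CHARGE_C * 1e15


particles = {
    "alpha, 4 MeV (low end of decay energies)": 4.0,
    "alpha, 9 MeV (high end of decay energies)": 9.0,
    "alpha from 10B(n,alpha)7Li": 1.47,
    "7Li recoil, 94% branch": 0.84,
    "7Li recoil, 6% branch": 1.014,
}
charges = {name: deposited_charge_fc(e) for name, e in particles.items()}
for name, q_fc in charges.items():
    print(f"{name}: up to {q_fc:.0f} fC")
```

Even the 0.84 MeV lithium recoil from boron fission can free tens of femtocoulombs, which helps explain why the ¹⁰B mechanism mattered for small SRAM cells.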

C. Sensitive Regions in Silicon
A single event transient (SET) is caused by the charge generated when a single particle (proton or heavy ion) passes through a sensitive node in the circuit. SETs in linear devices differ significantly from other types of single event effects (SEE), such as SEU in a memory. Each SET has unique characteristics such as polarity, waveform, amplitude, and duration. These characteristics depend on the particle impact location, particle energy, device technology, device supply voltage, and output load. In CMOS circuits, the “off” transistors struck by a heavy ion in the junction area are the most sensitive to SEU, for particles with a high enough LET (linear energy transfer; see Appendix) of around 20 MeV·cm²/mg. When such particles hit the silicon bulk, they create minority carriers; if these carriers are collected by the source/drain diffusion regions, the voltage of those nodes changes [20]. A particle can induce an SEU when it strikes the channel region of an off NMOS transistor or the drain region of an off PMOS transistor. The ionization can induce a current pulse in a p-n junction. Conceptually, when the charge injected by the current pulse at a sensitive node exceeds the critical charge (Qcrit), an SET is generated at the affected junction.

Fig. 2. Schematic representation of charge collection in a silicon junction immediately after (a) an ion strike, (b) prompt (drift) collection, and (c) diffusion collection; (d) the junction current induced as a function of time [4].

D. Single Event Transient
In Figure 2, an SET is produced after an energetic ionizing particle has struck the silicon near sensitive device nodes [4]. Along the traversed path, the particle produces a dense radial distribution of electron-hole pairs, as illustrated in Figure 2(a). If the resulting ionization track traverses the depletion region, carriers are rapidly collected by the electric field, thus compensating the charge stored in the junction. Outside the depletion region, the non-equilibrium charge distribution induces a temporary funnel-shaped potential distortion along the trajectory of the event, further enhancing charge collection by drift (Figure 2(b)). This “prompt” collection phase typically lasts tens of picoseconds; as the funnel collapses, diffusion dominates the collection process (Figure 2(c)) until all excess carriers have been collected, recombined, or diffused away from the junction area (on the order of nanoseconds). The transient charge collected from the radiation event produces a current pulse at the junction, as illustrated in Figure 2(d) [4]. The current transient typically lasts about 200 picoseconds, with the bulk of the charge collection occurring within 2 to 3 microns of the junction region in modern submicron CMOS technologies. The time constants depend strongly on the type of ion, its initial energy, and the properties of the specific technology [4]. If enough charge is collected by a node, its data state may change. The collected charge (Qcoll) is a function of the ionizing particle’s energy and trajectory, the silicon substrate structure and doping, and the local electric field [4].

A commonly used approximate analytical model for the transient current waveform induced by ion-track charge collection has a double-exponential form [15], with a rapid rise time and a gradual fall time:

$$I(t) = \frac{Q_{coll}}{\tau_\alpha - \tau_\beta}\left(e^{-t/\tau_\alpha} - e^{-t/\tau_\beta}\right) \tag{1a}$$

$$Q_{coll} = 10.8 \times L \times \mathrm{LET} \tag{1b}$$

where Qcoll is the collected charge (in femtocoulombs) in the sensitive region, τα is a process-dependent collection time constant of the junction, and τβ is the ion-track establishment time constant, which is relatively independent of the technology. Typical values are approximately 1.64 × 10⁻¹⁰ s for τα and 5 × 10⁻¹¹ s for τβ [7]. In bulk silicon, a typical charge collection depth L is 2 µm, and for every unit of linear energy transfer (1 MeV·cm²/mg) an ionizing particle deposits about 10.8 fC of charge along each micron of its track.
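Equations (1a) and (1b) are easy to explore numerically. The Python sketch below uses the typical τα and τβ quoted above and Eq. (1b) with L = 2 µm; the LET of 20 MeV·cm²/mg is an illustrative choice matching the sensitivity level mentioned in Section III-C. It also checks two properties of the double exponential: its time integral equals Qcoll, and its peak occurs at t = τατβ ln(τα/τβ)/(τα − τβ).

```python
import math

# Numerical sketch of Eq. (1a)-(1b): double-exponential SET current for a given LET.
TAU_ALPHA = 1.64e-10  # junction collection time constant, s (typical value from the text)
TAU_BETA = 5e-11      # ion-track establishment time constant, s (typical value from the text)


def q_coll_fc(let_mev_cm2_mg: float, depth_um: float = 2.0) -> float:
    """Eq. (1b): about 10.8 fC per micron of track per unit LET, over a depth L in um."""
    return 10.8 * depth_um * let_mev_cm2_mg


def current_a(t: float, q_coll_c: float) -> float:
    """Eq. (1a): I(t) = Qcoll/(ta - tb) * (exp(-t/ta) - exp(-t/tb)), in amperes."""
    return q_coll_c / (TAU_ALPHA - TAU_BETA) * (
        math.exp(-t / TAU_ALPHA) - math.exp(-t / TAU_BETA))


def peak_time_s() -> float:
    """Time where dI/dt = 0 for the double exponential."""
    return (TAU_ALPHA * TAU_BETA / (TAU_ALPHA - TAU_BETA)) * math.log(TAU_ALPHA / TAU_BETA)


def delivered_charge_c(q_coll_c: float, t_end: float, steps: int = 20_000) -> float:
    """Trapezoidal integral of I(t) from 0 to t_end; approaches Qcoll as t_end grows."""
    dt = t_end / steps
    total = 0.5 * (current_a(0.0, q_coll_c) + current_a(t_end, q_coll_c))
    for k in range(1, steps):
        total += current_a(k * dt, q_coll_c)
    return total * dt


let = 20.0                    # illustrative LET, MeV*cm^2/mg
q = q_coll_fc(let) * 1e-15    # fC -> C
tp = peak_time_s()
print(f"Qcoll = {q * 1e15:.0f} fC, peak at {tp * 1e12:.1f} ps, "
      f"Ipeak = {current_a(tp, q) * 1e3:.2f} mA")
print(f"charge delivered within 1 ns: {delivered_charge_c(q, 1e-9) * 1e15:.1f} fC")
```

With these constants the pulse peaks at roughly 85 ps and over 99% of the 432 fC is delivered within 1 ns, consistent with the sub-nanosecond transients described above.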
The induced transient voltage pulse may propagate through several levels of logic gates. Because a particle can induce an SEU when it strikes either the channel region of an off NMOS transistor or the drain region of an off PMOS transistor, we consider a strike at the drain area of an off PMOS transistor as an illustrative example. The critical charge depends not only on the total charge collected at the sensitive node but also on the temporal shape of the current pulse and the device supply voltage. Therefore, a parameter called the “switching time” (tth), or “feedback time”, is defined as the time after the particle strike at which the affected node voltage exceeds the threshold voltage. At that time, the charge on the output capacitor equals Qcrit, so Qcrit can be calculated by integrating the current that flows at the sensitive node after the strike [9]. The condition for the SEE to propagate is that the output node voltage satisfies Equation 2:

$$V \ge \frac{Q_{crit}}{C} = \frac{1}{C}\int_0^{t_{th}} I_{drain}(t)\,dt \tag{2}$$

The width of the voltage pulse depends on the value of the capacitance and on the RC time constant of the discharging path. For example, in ami12 technology, when the output load capacitance is 100 fF and the cumulative collected charge is 0.65 pC, the amplitude of the voltage pulse is 0.65 pC/100 fF = (0.65 × 10⁻¹² C)/(100 × 10⁻¹⁵ F) = 6.5 V. We observe that, for the same charge collected in the sensitive area, a smaller load capacitance produces a larger amplitude of the SEE-induced voltage pulse. The discharge process can be modeled by a simple RC circuit, so the voltage as a function of time is v(t) = v(0)·e^(−t/RC). Thus, the smaller the RC value, the faster the discharge. A schematic view of how the SEE-induced current pulse translates into an SEE-induced voltage pulse is given in Figure 3.
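The pulse-amplitude and RC-discharge relations above can be sketched in a few lines of Python. The 100 fF load and 0.65 pC charge come from the ami12 example; the 10 kΩ discharge resistance and the 2.5 V switching threshold of the next gate are hypothetical values chosen only to show how the pulse width follows from v(t) = v(0)·e^(−t/RC). This first-order Q/C model ignores clamping of the node at the supply rails, so the computed amplitude is an upper bound.

```python
import math

# First-order model of the SEE voltage pulse: amplitude Q/C, then RC discharge.
# From the text: C = 100 fF, Q = 0.65 pC. Hypothetical: R = 10 kOhm, threshold 2.5 V.


def pulse_amplitude_v(q_c: float, c_f: float) -> float:
    """Peak voltage if the collected charge lands on the load capacitance: V = Q / C."""
    return q_c / c_f


def rc_voltage(v0: float, t: float, r_ohm: float, c_f: float) -> float:
    """Exponential RC discharge: v(t) = v(0) * exp(-t / RC)."""
    return v0 * math.exp(-t / (r_ohm * c_f))


def time_above_threshold(v0: float, v_th: float, r_ohm: float, c_f: float) -> float:
    """How long the decaying pulse stays above v_th (0 if it never reaches it)."""
    if v0 <= v_th:
        return 0.0
    return r_ohm * c_f * math.log(v0 / v_th)


C_LOAD = 100e-15    # 100 fF (from the text)
Q_COLL = 0.65e-12   # 0.65 pC (from the text)
R_DISCHARGE = 10e3  # hypothetical on-resistance of the restoring transistor
V_TH = 2.5          # hypothetical switching threshold of the next gate, volts

v0 = pulse_amplitude_v(Q_COLL, C_LOAD)
width = time_above_threshold(v0, V_TH, R_DISCHARGE, C_LOAD)
print(f"amplitude = {v0:.2f} V, time above {V_TH} V = {width * 1e9:.2f} ns")
```

With RC = 1 ns the pulse stays above 2.5 V for RC·ln(6.5/2.5), about 0.96 ns. Halving C would double the amplitude and halve RC, illustrating both trends stated in the text.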

Fig. 3. A schematic view of how an SEE-induced current pulse translates into a voltage pulse in a CMOS inverter.

IV. SOFT ERROR MITIGATION TECHNIQUES
Soft error tolerance techniques can be classified into two types: prevention and recovery. Prevention methods protect microchips from soft errors and are used during chip design and development. Recovery methods are on-line mechanisms for recovering from soft errors so that the chip meets its robustness requirement; they include fault-tolerant computing, ECC/parity, online testing, and redundancy. Note that soft errors are not the only reason why computer systems need to resort to a recovery procedure: random errors due to noise, unreliable components, and coupling effects may also require recovery mechanisms [21]. Recovery mechanisms are needed because prevention techniques alone may not be enough for contemporary microchips, whose supply voltage keeps decreasing, feature size keeps shrinking, and clock frequency keeps increasing. Also, the cost of prevention techniques for a fault-tolerant design may be too high. Because error-tolerant computing is a broad area, here we give only a few examples of techniques used for soft error mitigation. In addition, a built-in soft error resilience (BISER) technique for correcting radiation-induced soft errors in latches and flip-flops is described in [25]. Its error-correcting latch and flip-flop designs are power-efficient, can correct both flip-flop errors and combinational logic errors, and reuse on-chip scan design-for-testability circuitry to reduce cell-level SER.

A. Prevention Techniques
1) Purify the Fabrication Material: A significant improvement in the SER performance of microelectronics can be achieved by eliminating or reducing the sources of radiation. To reduce alpha particle emission in the final packaged IC, high-purity materials and processes are employed. Uranium and thorium impurities have been reduced below one hundred parts per trillion for high reliability. Going from conventional IC packaging to ultra-low-alpha packaging materials reduces the alpha emission from 5 to 10 alphas/cm²·h to less than 0.001 alphas/cm²·h. To reduce the SER induced by ¹⁰B activation by low-energy neutrons, BPSG is replaced by other insulators that do not contain boron. In addition, any process using boron precursors is carefully checked for ¹⁰B content before being introduced into manufacturing [4]. When these measures are employed, the SER of the IC is reduced dramatically, but high-energy cosmic neutrons cannot be easily shielded, so the SER caused by their interactions remains.
2) Radiation-Hardened Process Technologies: SER performance can be greatly improved by adapting the process technology either to reduce the collected charge (Qcoll) or to increase the critical charge (Qcrit) [26]. One approach is to use additional well isolation (triple-well or guard-ring structures) to reduce the amount of charge collected by creating potential barriers, which can limit the efficiency of the funneling effect and reduce the likelihood of parasitic bipolar collection paths [6].
Another approach replaces bulk silicon well isolation with silicon-on-insulator (SOI) substrate material. Direct charge collection is significantly reduced in SOI devices because the active device volume is greatly reduced (the silicon device layer on top of the buried oxide is thin) [18]. Recent work shows a 10× reduction in SER over conventional bulk devices when a fully depleted SOI substrate is used. Unfortunately, SOI substrates are more expensive than conventional bulk substrates, and phenomena such as parasitic bipolar action limit further reduction of SER [4]. Circuit-level solutions, such as the addition of cross-coupled resistors and capacitors to decrease bit-line float time, are also employed.
B. Recovery Techniques
Fault-tolerant computing methods have existed in the literature for quite some time [23] but have seen renewed interest due to the SEU phenomenon. On-line testing techniques are frequently used as recovery solutions for soft error mitigation. Specific techniques include self-checking design [19], concurrent error detection for finite state machines (FSMs) by signature monitoring, error detection and correction (EDAC) codes, and redundancy.
1) Redundancy: The basic idea of redundancy in design is to gain higher system reliability by spending extra time, extra hardware (space), or both. The classic triple modular redundancy (TMR) with a majority voter [2] continues to be widely used.
Mitra et al. [17] combine a self-checking design with time redundancy based on the C-element gate, which compares two samples of the output signal of a combinational circuit taken at times t0 and t0 + d. The C-element can eliminate glitches at combinational outputs. Their error correction structure is illustrated in Figure 4. Space redundancy and time redundancy are often combined to meet high fault-tolerance requirements with reduced hardware overhead, for example, by using duplication and comparison instead of TMR.
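Both redundancy styles can be modeled behaviorally in a few lines. The Python sketch below shows a bitwise 2-of-3 majority voter for TMR and a Muller C-element fed with a combinational output and a copy of it delayed by d, following the time-redundancy idea of [17]. It is a cycle-level toy model written for illustration (one time step stands for the delay d), not the transistor-level circuit of Figure 4.

```python
# Behavioural sketch of space redundancy (TMR voter) and time redundancy (C-element).


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant module outputs."""
    return (a & b) | (b & c) | (a & c)


class CElement:
    """Muller C-element: output follows the inputs when they agree, else holds."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial

    def update(self, x: int, y: int) -> int:
        if x == y:
            self.out = x
        return self.out


# Space redundancy: an SEU flips bit 2 in one of three module copies; the vote masks it.
good = 0b1011
voted = tmr_vote(good, good ^ 0b0100, good)

# Time redundancy: compare the output with itself delayed by one step (d = 1 step here).
direct = [1, 1, 0, 1, 1, 0, 0, 0]   # one-step glitch at index 2, real 1->0 change at 5
delayed = [1] + direct[:-1]         # same signal, shifted by d
c_elem = CElement(initial=1)
filtered = [c_elem.update(x, y) for x, y in zip(direct, delayed)]

print(bin(voted), filtered)
```

A glitch shorter than d never appears on both C-element inputs at once, so it is filtered, while a genuine transition passes through one delay later; that added latency is the cost of time redundancy.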
2) ECC and Parity: Memories play a significant role in modern systems. Because of the very high density of storage cells, a large memory is more sensitive to ionizing particles than logic is. A simple solution for protecting a memory is to add parity bits to each memory word. During each write operation, a parity generator computes the parity bits of the data, and these bits are stored with the data in the memory. If a particle strike alters the state of a single bit of a memory word, the error can be discovered by checking the parity code during the read operation. Depending on the number of parity bits used, this scheme may only detect an error, or correct it as well. Such schemes are often combined with system-level approaches for error recovery [19]. In most situations, however, error recovery in a memory is more complex, so protecting the memory by means of codes such as error correcting codes (ECC) is preferable. Table II summarizes sample EDAC methods for memory, data, and systems [11].
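The parity and Hamming entries of Table II can be illustrated with a short Python sketch: a single even-parity bit that detects one flipped bit, and an extended Hamming (8,4) SEC-DED code that corrects one flipped bit and detects two. The bit layout (Hamming positions 1 to 7 plus an overall parity bit at position 0) is one common textbook arrangement chosen here for clarity; practical memory ECC protects much wider words.

```python
# Even parity (detects a single flipped bit) and extended Hamming (8,4) SEC-DED.
# Layout: positions 1..7 form a Hamming(7,4) word, position 0 is the overall parity bit.


def parity(bits: list[int]) -> int:
    """Even-parity bit: XOR of all bits (0 if the number of 1s is even)."""
    p = 0
    for b in bits:
        p ^= b
    return p


def hamming_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits into an 8-bit SEC-DED codeword."""
    w = [0] * 8
    w[3], w[5], w[6], w[7] = data   # data bits at non-power-of-two positions
    w[1] = w[3] ^ w[5] ^ w[7]       # checks positions whose index has bit 0 set
    w[2] = w[3] ^ w[6] ^ w[7]       # checks positions whose index has bit 1 set
    w[4] = w[5] ^ w[6] ^ w[7]       # checks positions whose index has bit 2 set
    w[0] = parity(w[1:])            # overall parity, used to detect double errors
    return w


def hamming_decode(word: list[int]) -> tuple[list[int], str]:
    """Return (data bits, status); status is 'ok', 'corrected' or 'double'."""
    w = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if w[pos]:
            syndrome ^= pos         # XOR of positions holding a 1
    odd = parity(w)                 # 1 if an odd number of bits flipped
    if syndrome == 0 and odd == 0:
        status = "ok"
    elif odd:                       # single error: syndrome names the bad position
        w[syndrome] ^= 1            # (syndrome 0 means the parity bit itself flipped)
        status = "corrected"
    else:                           # even number of flips with nonzero syndrome
        status = "double"           # detected but not correctable
    return [w[3], w[5], w[6], w[7]], status


# Usage: store 1011, then simulate one and two upsets in the stored word.
stored = hamming_encode([1, 0, 1, 1])
one_upset = stored.copy()
one_upset[6] ^= 1
two_upsets = stored.copy()
two_upsets[2] ^= 1
two_upsets[5] ^= 1
print(hamming_decode(one_upset), hamming_decode(two_upsets))
```

For every 4-bit data word, flipping any single bit (including the parity bit) is corrected, and flipping any two bits is reported as uncorrectable, which is the SEC-DED capability listed for Hamming codes.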
TABLE II
SAMPLE EDAC METHODS FOR MEMORY OR DATA DEVICES [11]

| EDAC method | EDAC capability |
|---|---|
| Parity | Single-bit error detection |
| Hamming code | Single-bit error correction, double-bit error detection |
| RS code | Corrects consecutive and multiple bytes in error |
| Conventional encoding | Corrects isolated burst noise in a communication stream |
| Overlying protocol | Specific to each system implementation |

V. A CASE STUDY
The IBM eServer z990 system is designed to detect and recover from both soft and permanent errors [14]. The z990 contains up to four pluggable nodes connected through a planar board in a daisy-chain interconnect structure. Each node contains up to 64 GB of physical memory and 32 MB of L2 cache, for a system capacity of 256 GB of memory and 128 MB of L2 cache. In the IBM z990 system, microarchitecture-level SEU mitigation features include extensive use of ECC and parity with retry on data and controls; full SRAM ECC and parity protection; operational retries; microprocessor mirroring; checkpointing and rollback; and some hardware derating techniques. These approaches may be useful for future mainframe, general-purpose, and application-specific computing systems.
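Two of the z990 features listed above, microprocessor mirroring and checkpointing with rollback, combine naturally: two mirrored copies execute each step and compare results, and a mismatch triggers a rollback to the last checkpoint and a retry. The Python sketch below is a hypothetical toy model of that control flow (the accumulator "program", the upset model, and the retry limit are our own choices), not a description of IBM's hardware. An error that persists after many retries is escalated as a suspected permanent fault.

```python
import random

# Toy model of mirrored execution with checkpoint/rollback recovery (hypothetical).


def step(state: int, x: int) -> int:
    """One unit of work for the toy 'program': a simple accumulator update."""
    return (state * 31 + x) % 1_000_003


def run_with_recovery(inputs, upset_prob, rng, checkpoint_every=4, max_retries=100):
    """Run inputs on two mirrored copies; on mismatch, roll back and retry.

    Returns (final_state, rollbacks). Raises if the same window keeps failing,
    which would point to a permanent rather than a soft error.
    """
    ckpt_state, ckpt_idx = 0, 0      # last committed checkpoint
    state, i = 0, 0
    rollbacks = retries = 0
    while i < len(inputs):
        a = step(state, inputs[i])   # mirror A
        b = step(state, inputs[i])   # mirror B
        if rng.random() < upset_prob:  # hypothetical transient upset in mirror A
            a ^= 1 << rng.randrange(16)
        if a != b:                   # mirrors disagree: discard work since checkpoint
            rollbacks += 1
            retries += 1
            if retries > max_retries:
                raise RuntimeError("error persists after retries: suspect a hard fault")
            state, i = ckpt_state, ckpt_idx
            continue
        state, i = a, i + 1
        if i % checkpoint_every == 0 or i == len(inputs):
            ckpt_state, ckpt_idx = state, i  # commit a new checkpoint
            retries = 0
    return state, rollbacks


inputs = list(range(100))
reference = 0
for x in inputs:                     # error-free result for comparison
    reference = step(reference, x)

final, rollbacks = run_with_recovery(inputs, upset_prob=0.05, rng=random.Random(1))
print(final == reference, rollbacks)
```

Because the modeled upsets are transient, the retried work normally succeeds and the final result matches the error-free run; the price is the rollbacks counted by the function.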

VI. CONCLUSION
Soft error rates in logic and memory chips will continue to increase as devices become more sensitive to soft errors, even at sea level. The logic FIT rate is expected to increase faster, owing to internal phenomena such as cross coupling, ground bounce, and delay faults, and to become comparable to the prevailing FIT rate of memory. The IBM z990 system illustrates how the soft error issue might be handled in industry. Open soft error issues, on which research is being conducted, lie in the areas of EDA tools, radiation testing and measurement, analysis of newer radiation mechanisms, device hardening, soft error rate analysis, and error mitigation methods. We hope we have given our reader a running start.

APPENDIX
Definitions and Terminology²
Collected Charge (Qcoll): The charge collected by a particular device node during the passage of a particle. The collected charge depends on the geometry and doping of the node, on particle properties such as mass, energy, and trajectory, and on the density and type of material in the volume being penetrated by the incident radiation.
Cross Section (σ): The device SEE response to ionizing radiation, obtained as the number of observed events divided by the particle fluence. Normally, the units for cross section are cm²/device or cm²/bit.
Critical Charge (Qcrit): The minimum amount of charge that, when collected at a sensitive node, causes the node to change state. The charge is usually generated by incident radiation, and whether it reaches Qcrit depends on the effective linear energy transfer, which is usually a function of the angle of the incident particle.
LET: Linear Energy Transfer. LET is a measure of the energy transferred to the device per unit length as an ionizing particle travels through a material. The common unit is MeV·cm²/mg of material (Si for MOS devices).
LETth: LET threshold (LETth) is the minimum LET to cause an effect at a given particle fluence.
SEE: Single Event Effect. Any measurable or observable change in state or performance of a microelectronic device, component, subsystem, or system resulting from a single energetic particle strike. SEE includes SEU (Single Event Upset), SEL (Single Event Latchup), SEB (Single Event Burnout), SEFI (Single Event Functional Interrupt), and SET (Single Event Transient).
Sensitive Volume: A region, or multiple regions, affected by SEE-inducing radiation. The sensitive volume is determined by the angle of the incident radiation, the mass and energy of the incident particles, and the density and type of the material in the volume being penetrated by the incident radiation. It is not easy to know the geometry of the sensitive volume of a device, but some information can be gained from test cross-section data.
Units and Conversion Factors
Energy Unit: Electron volt (eV). One eV is the energy gained by one electron in accelerating through a potential difference of 1 volt. Energy in radiation is usually expressed in MeV (10⁶ eV) or keV (10³ eV). 1 eV = 1.6 × 10⁻¹⁹ J, 1 MeV = 1.6 × 10⁻¹³ J.
FIT: Failure in Time; the number of failures per 10⁹ device hours. An MTTF (Mean Time To Failure) of 1 year corresponds to 10⁹/(24 × 365) FIT ≈ 114,155 FIT.

²These miscellaneous definitions and terms are collected from JEDEC standards and relevant papers.

Fig. 4. Error correction using duplication: (a) space redundancy structure, (b) time redundancy structure, and (c) C-element [17].
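The cross-section and FIT definitions above connect a beam test to a field failure rate: multiplying σ by the particle flux gives upsets per hour, and scaling to 10⁹ hours gives FIT. The Python sketch below uses hypothetical beam-test numbers and assumes a sea-level neutron flux of about 13 n/cm²·h above 10 MeV, the New York City reference value commonly cited from JEDEC JESD89A; it also treats σ as independent of neutron energy, which is a simplification.

```python
# From a beam-test cross section to a field soft error rate in FIT.
# Assumptions: sigma = upsets / fluence, and a sea-level neutron flux of about
# 13 n/cm^2/h (E > 10 MeV), the reference value commonly cited from JEDEC JESD89A.
# The beam-test numbers are hypothetical.

HOURS_PER_YEAR = 24 * 365
SEA_LEVEL_FLUX = 13.0  # n/cm^2/h, assumed reference flux


def cross_section_cm2(upsets: int, fluence_per_cm2: float) -> float:
    """Cross section = number of observed events / particle fluence."""
    return upsets / fluence_per_cm2


def ser_fit(sigma_cm2: float, flux_per_cm2_h: float) -> float:
    """Upsets per hour = sigma * flux; FIT = upsets per 1e9 device-hours."""
    return sigma_cm2 * flux_per_cm2_h * 1e9


def fit_to_mttf_years(fit: float) -> float:
    """Mean time to failure, in years, for a given FIT rate."""
    return 1e9 / fit / HOURS_PER_YEAR


# Hypothetical test: 200 upsets in a 16 Mbit memory after a fluence of 1e10 n/cm^2.
sigma_device = cross_section_cm2(200, 1e10)
sigma_bit = sigma_device / (16 * 2**20)  # cm^2/bit
fit = ser_fit(sigma_device, SEA_LEVEL_FLUX)
one_year_fit = 1e9 / HOURS_PER_YEAR      # Appendix: 1-year MTTF expressed in FIT

print(f"sigma = {sigma_device:.1e} cm^2/device ({sigma_bit:.2e} cm^2/bit)")
print(f"SER = {fit:.0f} FIT, MTTF = {fit_to_mttf_years(fit):.0f} years")
print(f"1-year MTTF = {one_year_fit:,.0f} FIT")
```

The hypothetical test gives σ = 2 × 10⁻⁸ cm² per device and about 260 FIT at the assumed flux; the last line reproduces the Appendix figure of about 114,155 FIT for a one-year MTTF.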
REFERENCES
[1] Actel, “Effects of Neutrons on Programmable Logic,” white paper, Actel Corporation, Dec. 2002.
[2] A. Avizienis, “Fault-Tolerant Computing: An Overview,” Computer, vol. 4, no. 1, pp. 5–8, 1971.
[3] R. Baumann, “Soft Errors in Advanced Semiconductor Devices-Part I: The Three Radiation Sources,” IEEE Trans. Device and Materials Reliability, vol. 1, no. 1, pp. 17–22, 2001.
[4] R. Baumann, “Soft Errors in Commercial Integrated Circuits,” International Jour. High Speed Electronics and Systems, vol. 14, no. 2, pp. 299–309, 2004.
[5] R. Baumann, “Soft Errors in Advanced Computer Systems,” IEEE Design & Test of Computers, vol. 22, no. 3, pp. 258–266, 2005.
[6] D. Burnett, C. Lage, and A. Bormann, “Soft-Error-Rate Improvement in Advanced BiCMOS SRAMs,” in Proc. 31st Annual IEEE Reliability Physics Symp., Mar. 1993, pp. 156–160.
[7] V. Carreno, G. Choi, and R. K. Iyer, “Analog-Digital Simulation of Transient-Induced Logic Errors and Upset Susceptibility of an Advanced Control System,” NASA Technical Memo 4241, 1990.
[8] C. L. Claeys and E. Simoen, Radiation Effects in Advanced Semiconductor Materials and Devices. Springer, 2002.
[9] C. Detcheverry, C. Dachs, E. Lorfevre, C. Sudre, G. Bruguier, J. M. Palau, J. Gasiot, and R. Ecoffet, “SEU Critical Charge and Sensitive Area in a Submicron CMOS Technology,” IEEE Trans. Nuclear Science, vol. 44, no. 6, pp. 2266–2273, 1997.
[10] C. S. Guenzer, E. A. Wolicki, and R. G. Allas, “Single Event Upset of Dynamic RAMs by Neutrons and Protons,” IEEE Trans. Nuclear Science, vol. 26, pp. 5048–5052, Dec. 1979.
[11] K. L. LaBel, P. W. Marshall, J. L. Barth, E. Stassinopoulos, C. Seidleck, and C. Dale, “Commercial Microelectronics Technologies for Applications in the Satellite Radiation Environment,” in Proc. 1996 IEEE Aerospace Applications Conf., New York, 1996, pp. 375–390.
[12] J. Maiz and N. Seifert, “Introduction to the Special Issue on Soft Errors and Data Integrity in Terrestrial Computer Systems,” IEEE Trans. Device and Materials Reliability, vol. 5, no. 3, pp. 303–304, Sept. 2005.
[13] T. C. May and M. H. Woods, “A New Physical Mechanism for Soft Errors in Dynamic Memories,” in Proc. 16th Annual Reliability Physics Symp., 1978, pp. 33–40.
[14] P. J. Meaney, S. B. Swaney, P. N. Sanda, and L. Spainhower, “IBM z990 Soft Error Detection and Recovery,” IEEE Trans. Device and Materials Reliability, vol. 5, no. 3, pp. 419–427, 2005.
[15] G. C. Messenger, “Col
