This is a postprint version of the following published document: Entrena, Luis; López-Ongil, Celia; García-Valderas, Mario; Portela-García, Marta; Nicolaidis, Michael. (2011). Hardware Fault Injection. In: Nicolaidis, M. (ed.) *Soft Errors in Modern Electronic Systems*. (Frontiers in Electronic Testing, 41). Springer. Pp. 141-166. DOI: https://doi.org/10.1007/978-1-4419-6993-4\_6

© Springer Science+Business Media, LLC 2011

# Chapter 6 Hardware Fault Injection

Luis Entrena<sup>1</sup>, Celia López-Ongil<sup>1</sup>, Mario García-Valderas<sup>1</sup>, Marta Portela-García<sup>1</sup>, and Michael Nicolaidis<sup>2</sup>

<sup>1</sup> Electronic Technology Department, Carlos III University of Madrid, Spain. Email: {entrena, mgvalder, mportela, celia}@ing.uc3m.es

<sup>2</sup> TIMA (CNRS, Grenoble INP, UJF), Grenoble, France. <michael.nicolaidis@imag.fr>

Hardware fault injection is the widely accepted approach to evaluate the behavior of a circuit in the presence of faults. Thus, it plays a key role in the design of robust circuits. This chapter presents a comprehensive review of hardware fault injection techniques, including physical and logical approaches. The implementation of effective fault injection systems is also analyzed. Particular emphasis is placed on recently developed emulation-based techniques, which offer great flexibility along with unprecedented levels of performance. These capabilities make the reliability evaluation of complex circuits tractable.

#### 6.1 Introduction

As technology progresses into the nanometric scale, concern for reliability is growing. The introduction of new materials, processes, and novel devices, along with increasing complexity, power, performance, and die size, affects reliability negatively. At the same time, the reduction in dimensions, capacitance, and voltage results in a lower node critical charge, which heightens the soft-error threat.
Taking all these trends into account, recent studies [1] expect the soft-error rate (SER) per bit to remain stable. However, since the memory bit count and the functionality integrated in logic components are increasing rapidly, soft errors are becoming a real threat for many applications where they were not a concern in the past. The increasing use of electronic systems in safety-critical applications, where human life is at stake, makes dependability a requirement that must be ensured and an important challenge today.

C. López-Ongil (✉), e-mail: celia@ing.uc3m.es

Providing quality of service in the presence of faults is the purpose of fault tolerance. However, before a fault-tolerant system is deployed, it must be tested and validated. Thus, dependability evaluation plays an important role in the design of fault-tolerant circuits. Fault injection, i.e., the deliberate injection of faults into a circuit under test, is the widely accepted approach to evaluate fault tolerance. Fault injection is intended to provide information about circuit reliability, covering three main goals: validating the circuit under test with respect to reliability requirements; detecting weak areas that require fault-tolerance enhancements; and forecasting the expected circuit behavior when faults occur.

From a general point of view, we can distinguish between hardware and software fault injection, although the frontier between them is not well defined. Software fault injection deals with software reliability and is not treated here. Hardware fault injection is related to hardware faults, which are generally modeled at lower levels (e.g., logical or electrical) and are injected into a piece of hardware.

In spite of the work done over many years, hardware fault injection is still a challenging area. New types of faults and effects are emerging or gaining relevance.
In addition to permanent stuck-at faults and transient faults affecting memory bits, such as single-event upsets (SEUs), designers today must face the possibility of timing faults, single-event transients (SETs) affecting combinational logic, and multiple bit upsets (MBUs) affecting memories. As a consequence of technology scaling and increasing density, more complex circuits need to be evaluated. In particular, Systems on Chip (SoCs) include a variety of components, such as microprocessors, memories, and peripherals, which pose different fault injection requirements. The widespread use of field-programmable technology raises the need to evaluate the effect of errors on the configuration bits. As complexity increases, the number of faults that must be injected to achieve statistical significance also increases. Thus, new approaches and solutions are needed to accurately reproduce fault effects, increase fault injection performance, and support the variety of existing technologies and components.

This chapter summarizes the current state of the art in hardware fault injection techniques and in optimizations of the hardware fault injection process. It must be noted that the fault injection process is not only concerned with the means to inject faults. A complete environment is required for initializing the circuit under test, selecting and applying appropriate workloads, collecting information about the faulty circuit behavior, comparing it with the correct behavior, classifying fault effects, and monitoring the overall process. None of these tasks should be neglected, because all of them are relevant for a successful evaluation.

The remainder of the chapter is organized as follows. Section 6.2 reviews the most relevant hardware fault injection techniques and the existing approaches to inject faults. Section 6.3 describes fault injection environments.
Section 6.4 describes optimizations that help increase fault injection performance. Finally, Section 6.5 concludes the chapter.

#### 6.2 Hardware Fault Injection Techniques

Dependability evaluation of modern VLSI circuits requires injecting realistic faults at internal locations and efficiently observing the circuit behavior in the presence of these faults. Indeed, with the generalization of deep submicron technologies, digital systems can experience a much greater number of faults, and a significant proportion of them occur inside the chip [2]. There are different methods for injecting faults in integrated circuits that mimic hard or soft errors. Some of these methods have been used for several years, while others have been proposed in recent research works.

Every fault injection method has some common elements, which may be defined at different abstraction levels. First, the faults to be injected in the evaluation process must be selected: one must decide whether to provoke real faults within the device or just to model the effect they cause on the circuit elements (fault model). Second, the circuit to be checked, usually named device under test (DUT) or circuit under test (CUT), can be a commercial off-the-shelf (COTS) component, a prototype, or a design model. The level of abstraction of the DUT is related to the type of fault to be injected. Third, a collection of workloads should be available in order to exercise a representative subset of the circuit functionality. Circuit robustness should be checked under conditions as close as possible to normal operation in the final application. Finally, the most important element in every fault injection method is the expected result, which mainly corresponds to a measure of device robustness against faults.
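As a concrete instance of such a robustness measure, the failure-in-time (FIT) rate and the cross-section discussed later in this section are linked by simple arithmetic. The following Python sketch is illustrative and not taken from the chapter: it computes a cross-section as detected events divided by applied fluence, then scales it by an assumed particle flux to obtain failures per 10<sup>9</sup> h. The event count, fluence, and flux value (13 particles/(cm²·h), a figure often used as a sea-level neutron reference) are assumptions chosen only for illustration.

```python
"""Relating a measured cross-section to a FIT rate (illustrative sketch)."""

HOURS_PER_FIT = 1e9  # FIT = failures per 10^9 device-hours


def cross_section(events: int, fluence_per_cm2: float) -> float:
    """Cross-section in cm^2: detected events divided by applied fluence."""
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return events / fluence_per_cm2


def fit_rate(sigma_cm2: float, flux_per_cm2_h: float) -> float:
    """Expected failures per 10^9 h for a device with cross-section sigma
    operating in an environment with the given particle flux."""
    failures_per_hour = sigma_cm2 * flux_per_cm2_h
    return failures_per_hour * HOURS_PER_FIT


if __name__ == "__main__":
    # Hypothetical beam-test numbers, chosen only for illustration.
    sigma = cross_section(events=250, fluence_per_cm2=1e11)
    fit = fit_rate(sigma, flux_per_cm2_h=13.0)  # assumed environment flux
    print(f"sigma = {sigma:.2e} cm^2, FIT = {fit:.1f}")
```

Because the FIT rate is tied to the flux of a specific environment, the same measured cross-section yields different FIT values for ground, avionic, or space operation.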
Furthermore, any fault injection method requires a specific setup for the fault injection campaign. In particular, a dedicated PCB with the DUT must be developed. Also, a system for workload application and results processing, including hardware, software, communication links, etc., must be implemented.

Faults are typically classified according to their duration into permanent, intermittent, or transient faults. Permanent faults are related to manufacturing defects and circuit aging. Transient faults are mainly caused by the environment, such as cosmic radiation or electrical noise. They do not produce permanent damage, and their effects are known as soft errors.

Cosmic radiation is the main source of single-event effects (SEEs) in integrated circuits. SEEs are caused by single energetic particles and take many forms, with permanent or transient effects. Thus, in this case, injected faults range from real faults induced by natural cosmic radiation to fault models of SEEs, and the circuits to be tested range from COTS components to design descriptions. The result of a fault injection campaign is the probability that a device will fail during operation. Typically, this measure is expressed as the failure-in-time (FIT) rate, i.e., the number of circuit failures per 10<sup>9</sup> h of operation, and it refers to a given radiation environment.

The main hardware fault injection cases are summarized in the following subsections. Some of them are also addressed in detail in other chapters.

##### 6.2.1 Physical Fault Injection

Physical fault injection methods use external perturbation sources, such as natural and accelerated particle radiation, laser beams, pin forcing, etc. The objective of these tests is to analyze the robustness of a device in the presence of faults.
These methods are applied to COTS components or prototypes, either to qualify new technologies or to qualify existing chips for a new application environment. They can cause a wide range of internal damage in the circuit under test: SEEs, displacement damage (DD), and total ionizing dose (TID). Typically, the SEEs studied are SEUs, SETs, and single-event latchups (SELs).

###### 6.2.1.1 Radiation Methods

The most traditional method for provoking internal soft errors in the circuit under test is the use of particle radiation. Since cosmic radiation is the main source of SEEs in integrated circuits, testing a device in its real environment (space, high altitude, etc.) is the most realistic way of evaluating its sensitivity to SEEs. However, this solution has practical disadvantages related to cost and time-to-market. Due to the low probability of error, obtaining valid measurements generally requires weeks or months, as well as hundreds or thousands of samples. Another disadvantage is that the relationship between failures and the energy of the particles striking the samples is unknown.

When shorter testing times and a more controlled experimental setup are required, accelerated radiation tests are used for qualifying new technologies. The DUT receives a beam of particles coming from a specific accelerator facility [3] or from a radioactive source [2]. In this case, few samples are needed (typically on the order of ten), and testing takes less time (hours or days). Also, an energy or intensity sweep can be applied to the particle beam affecting the circuit under test, which makes it easy to determine the SER as a function of particle energy.

In accelerated radiation tests, several types of particles are used for evaluating circuit robustness in harsh environments. The particles coming from cosmic radiation (primary or secondary radiation) are heavy ions, protons, and neutrons.
In addition, device packaging materials emit alpha particles that provoke SEEs in the circuit. The main origin of cosmic radiation is the sun [4]; it produces directly ionizing particles (heavy ions and protons), known as primary radiation, in deep space and stratospheric orbits, and indirectly ionizing particles (neutrons), known as secondary radiation, in atmospheric applications.

The IEEE standard for testing space-borne components [5] indicates the types of particles present in different environments. This information is used together with the final location of the circuit to define the operating environment and decide the type of radiation test to be performed. When checking the dependability of circuits in aircraft or at the earth's surface, neutrons and alpha particles are used for accelerated radiation tests [6]. Alpha particles affect circuits especially at the earth's surface. In contrast, heavy ions are used for testing circuits working in nuclear or space applications [7]. Finally, protons are employed for testing components of Earth-orbiting satellites [8, 9].

Standards developed by space agencies [8, 9] or by the JEDEC association [6, 7] regulate radiation test procedures. In any case, simulation software tools are used together with these tests in order to obtain more accurate knowledge of the damage effects in devices. Material, type of particle, orbit, etc., are key elements in these calculations.

The setup of these experiments is a demanding process. The flux, fluence, and energy of the particles must be set accurately (dosimetry is a key factor) in order to achieve a significant number of events while avoiding TID damage. The final result obtained is the cross-section, which is a function of particle energy or linear energy transfer (LET) and gives the number of events detected with respect to the particle fluence applied.

Static and dynamic tests should be performed on the DUT.
While static tests qualify the technology of a device, dynamic tests measure the robustness of a circuit running in that device with a given workload. It is possible to decouple the two tests and avoid dynamic testing under radiation: Velazco et al. [10] showed that dynamic test results can be obtained by combining static test results (static cross-section) with results obtained with another fault injection method.

The main disadvantages of fault injection based on accelerated radiation ground testing are the high cost of the test campaigns and the relatively small number of events achieved per run, which may lead to results that are not statistically significant. Also, controllability and observability are very limited. Nevertheless, this type of test is currently mandatory for qualifying a technology for aerospace applications.

###### 6.2.1.2 Laser Methods

A relatively recent approach for injecting faults from an external source is the use of laser beams. Laser incidence on the internal elements of the circuit causes effects similar to those provoked by particles from cosmic radiation. Accordingly, this method is associated with the bit-flip fault model for SEUs. It can inject faults very accurately, with the help of a microscope and laser beam spot control.

Several research works [11-13] have shown the correlation between the results obtained from accelerated radiation tests and laser tests. Although particle-material interaction differs slightly from photon-material interaction, SERs obtained from laser beam exposure are commonly accepted nowadays [13].

Laser testing provides a high level of accessibility to locate the circuit elements where faults are injected.
Also, this method requires less expensive equipment than radiation ground test facilities and a less complex experimental setup (e.g., it is not necessary to place the DUT on a separate PCB).

###### 6.2.1.3 Pin Forcing

Another solution for fault injection from external sources is pin forcing [14, 15]. It was proposed for testing relatively simple ICs. Some authors considered that forcing values at the input/output pins of a device could provoke the same effect as SEEs in very simple circuits. Several CAD tools have been developed to help designers execute fault injection campaigns, such as RIFLE [16], SCIFI [17], FIST [18], or Messaline [19].

Given the complexity of current ICs, this method is very limited. Currently, it is employed for testing other external aspects of reliability (vibrations, electrical noise, etc.), but it is not intended for SEU fault injection. Although it is a really cheap solution, the possible circuit damage due to values forced onto device pins, together with the poor controllability and observability it provides in increasingly complex ICs, makes this method unattractive for the dependability analysis of current technologies.

Physical fault injection methods provide realistic measures of SERs, but they are very expensive. They are considered the best methods available for qualifying new technologies. Nevertheless, better solutions are required for testing circuits during the design process, when redesign is still possible and cheaper. Furthermore, very complex designs require testing large numbers of faults in order to obtain statistically valid results. To handle the large fault sets of current designs, fault injection can be performed at higher abstraction levels.

##### 6.2.2 Logical Fault Injection

Logical fault injection methods use logic resources of the circuit to access internal elements and insert the effect that a fault provokes (fault model).
These extra logic resources were originally intended for other purposes. For example, the IEEE 1149.1 Boundary Scan standard (JTAG) provides an easy way to access internal scan chains through a serial interface. Also, some commercial microprocessors include on-chip debugging (OCD) capabilities that give access to internal memory elements (program counter, user registers, etc.). Finally, the reconfiguration resources of programmable devices make it possible to control and observe internal configuration nodes and, therefore, to inject faults and observe their effects. The fault models adopted depend on the robustness analysis being performed: the bit-flip model is applied for SEUs and MBUs, the stuck-at model for permanent faults, voltage pulses for SETs, etc.

###### 6.2.2.1 Software Implemented Fault Injection

Software implemented fault injection (SWIFI) tests hardware by executing specific software that modifies internal, user-accessible memory elements according to a fault model. Fault injection can be performed at compile time or at run time [20]. In the latter group, typical approaches use timers, such as the FERRARI tool [21], or interrupt routines, such as the XCEPTION tool [22] or the CEU tool [10]. More recently, new solutions have been presented [23] that combine software-based techniques with previous approaches.

###### 6.2.2.2 On-Chip Debugging for Microprocessors

Debugging resources provide direct access to internal registers, the program counter, and other key elements of microprocessor architectures. This access enables fast and effective fault injection and observation of fault effects. Furthermore, the external accessibility of these capabilities makes the automation of fault injection campaigns easier. The use of OCD resources for testing purposes has been studied by several authors in recent years.
The FIMBUL tool uses the JTAG interface for injecting bit-flip faults into the memory elements of a microprocessor [17]. Rebaudengo et al. use the Motorola OCD, named background debug mode (BDM), to perform fault injection through a serial port [24]. Also, the NEXUS debugging standard is being used to enhance this fault injection method [25]. In [26] and [27], solutions are proposed that implement specific hardware modules for interfacing between the DUT (microprocessor) and the host machine.

Recently, Portela et al. [28] have proposed another enhancement in the use of OCD capabilities, implementing in a hardware module the host in charge of injecting faults and analyzing the results. By reducing communication delays between hardware and software, the fault injection process can be easily accelerated and automated.

###### 6.2.2.3 Reconfiguration Resources

Reconfiguration resources in programmable devices make direct fault injection into the memory elements of prototyped designs possible. This method is widely used to evaluate the effect of faults in the configuration memory of FPGAs (field-programmable gate arrays), which is a very important issue in these devices. Partial reconfiguration reduces the time needed to perform fault injection in the configuration memory of FPGAs. Ref. [29] presents a solution based on reconfiguration by means of the Xilinx JBits software. In [30], a tool for injecting SEU faults in a Virtex® FPGA is proposed. This tool is able to inject faults in programmable interconnections, which are not accessible through commercial software tools (JBits). In recent contributions, Alderighi et al. proposed the FLIPPER fault injection platform, which enables the fault-tolerance evaluation of hardened prototypes in FPGAs [31].
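Why a configuration-memory upset matters can be illustrated with a deliberately simplified, hypothetical model, sketched below in Python. It represents a single 4-input look-up table (LUT) as 16 configuration bits, one per input combination; real bitstream formats, frame layouts, and routing bits are device-specific and are not represented. Flipping one configuration bit, as a reconfiguration-based injector would do, changes the implemented Boolean function for exactly one input combination, so the error only propagates when the workload exercises that combination. The function names and the example logic function are assumptions for illustration.

```python
"""Toy model of an SEU in FPGA configuration memory (hypothetical, simplified)."""
from itertools import product


def lut_from_function(fn, n_inputs: int = 4) -> list[int]:
    """Build LUT configuration bits: bit i stores fn() for the inputs encoded by i."""
    bits = []
    for i in range(2 ** n_inputs):
        inputs = [(i >> k) & 1 for k in range(n_inputs)]
        bits.append(fn(*inputs) & 1)
    return bits


def lut_eval(config: list[int], inputs: tuple[int, ...]) -> int:
    """Evaluate the LUT: the input values select one configuration bit."""
    index = sum(bit << k for k, bit in enumerate(inputs))
    return config[index]


def inject_config_bitflip(config: list[int], position: int) -> list[int]:
    """Return a copy of the configuration with one bit inverted (bit-flip model)."""
    faulty = list(config)
    faulty[position] ^= 1
    return faulty


if __name__ == "__main__":
    golden = lut_from_function(lambda a, b, c, d: (a & b) | (c ^ d))
    faulty = inject_config_bitflip(golden, position=5)
    # Input combinations whose LUT output is corrupted by the upset.
    corrupted = [v for v in product((0, 1), repeat=4)
                 if lut_eval(golden, v) != lut_eval(faulty, v)]
    print(corrupted)
```

Unlike a flip-flop upset, such a fault persists until the configuration memory is rewritten, which is why it behaves as a change of the circuit itself rather than of its state.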
##### 6.2.3 Logical Fault Injection by Circuit Emulation

As FPGA-based prototyping becomes popular for ASIC verification, it can also be exploited for hardware fault injection. In this case, the circuit under test is prototyped in one or several FPGAs. This approach is generally known as emulation-based fault injection. Contrary to the approaches mentioned in the previous section, FPGAs are used here just as a means to support fault injection, and the final circuit is to be implemented in some ASIC technology. Fault injection in an FPGA-based prototype can take advantage of the flexibility of field-programmable hardware.

Fault injection requires high controllability of each circuit node in order to modify its logic state. This can be obtained by using the FPGA reconfiguration mechanisms to modify the circuit or the contents of accessible memory elements. Another approach consists in inserting some additional hardware blocks in the prototype to support fault injection. These hardware blocks are called *instruments*.

Emulation-based fault injection was originally developed for permanent faults. In [32], a fault injection method is proposed for stuck-at faults. This method consists in modifying the circuit by connecting a signal to a constant logic value. Therefore, the FPGA must be resynthesized and reconfigured for each fault. Several techniques to emulate faults in parallel have been proposed to alleviate the resynthesis and reconfiguration effort.

A fault injection technique for SEUs (bit-flips) based on run-time reconfiguration of the FPGA is proposed in [33]. In this technique, flip-flop (FF) contents are modified by controlling the asynchronous set/reset of each FF through the FPGA configuration bitstream.
Injection of a fault is performed with the following steps:

- (a) At injection time, read the states of the FFs.
- (b) Reconfigure the FPGA to set the asynchronous set/reset switch of each FF so as to keep its current state, except for the faulty FF, whose switch is set the opposite way.
- (c) Pulse the global set/reset line, which modifies the state of the faulty FF.
- (d) Reconfigure the FPGA again to restore the asynchronous set/reset switches to their original values.

The first step is performed by reading back the configuration bitstream, which includes the states of the FFs. Readback is also used to check the results of each fault injection experiment and to classify fault effects. This idea is also followed in the FT\_UNSHADES platform [34]. Note that this approach requires reconfiguring the FPGA twice for each fault. Although partial reconfiguration can be used, the reconfiguration process is slow: injection times range from 0.1 s to more than 1 s per fault, depending on the length of the partial reconfiguration bitstream.

Circuit instrumentation is a means to overcome the limitations of FPGA reconfiguration. It consists in inserting some pieces of hardware, or instruments, that provide the external controllability and observability needed to inject a fault and observe its effects. The circuit is then prototyped in an FPGA together with the instruments. It is important that circuit instrumentation can be automated, in order to avoid manual handling of the circuit and to make the instrumentation process effective. At the same time, the instruments should be small enough to keep the prototype overhead acceptable.

In one of the earliest works [35], Hong et al. proposed a technique that avoids reconfiguring the FPGA for each fault by adding some specialized hardware blocks. Each block is attached to a target node and contains a flip-flop that stores the injection signal value.
These flip-flops are arranged in a chain, so that faults can be injected by shifting the desired injection values into the chain.

A circuit instrumentation technique for the injection of non-permanent faults is proposed in [36]. This technique is intended to emulate SEUs by injecting faults in the circuit flip-flops. For this purpose, the circuit under test is modified by replacing each flip-flop with the instrument shown in Fig. 6.1. This instrument contains an additional flip-flop, called the mask flip-flop, and two gates that implement the mask logic. The mask flip-flop is used to select the fault injection target. At injection time, the inject signal is asserted in all instruments. Then, the input value flips at the nodes whose mask flip-flop is set, and the fault is injected. More elaborate instruments, proposed by Lopez et al. [37], are described in Sect. 6.4.

The mask flip-flops are arranged in a scan chain that can be loaded serially. Once the mask scan chain is loaded, the inject signal is asserted at the required time to inject a fault. Faults can be injected at different nodes by shifting the mask scan chain. MBUs are supported by setting more than 1 bit in the mask. If required, the contents of the functional flip-flops can be loaded into the mask flip-flops. This operation captures the internal state of the circuit, which can then be observed by shifting out the mask chain.

The circuit instrumentation technique is very efficient, as it does not require reconfiguring the circuit for every fault. Also, setting the fault injection mask is much faster than FPGA reconfiguration. In addition, latent faults can be detected by checking the circuit state, as obtained through the scan chain, after the fault injection process. Experimental results [36] show a fault injection rate of 10,000 faults/s using a 20-MHz clock and a short workload (100 test vectors). For large workloads, the fault injection rate is inversely proportional to the workload length.

**Fig. 6.1** Fault injection instrument in [36]

Injecting and propagating SETs is much more difficult, since it requires prototyping the logic delays of the circuit under test. Synthesizing the circuit under test for an FPGA produces a functionally equivalent circuit model, but with different gate delays. Existing approaches for SET emulation embed timing information in some way, for instance in the topology of the circuit [38]. Recently, an efficient approach using a quantized delay model has been proposed [39]. In this model, gate delays are rounded to a multiple of a small amount of time, called the time quantum. Quantized delays can be implemented in an FPGA using shift registers, where the time quantum corresponds to a clock cycle. Experimental results show a fault injection rate in excess of one million faults/s, an improvement of three orders of magnitude with respect to a simulation-based approach.

The flexibility provided by FPGAs can be used to support some fault injection functions. For example, the FPGA prototype can include two instances of the circuit under test, dedicated respectively to the golden (fault-free) circuit and to the faulty circuit. Both instances run in parallel, and their outputs can be compared inside the FPGA at the end of the execution to detect failures [34, 40]. A refined solution consists in duplicating just the sequential elements while sharing the combinational logic [37]. In this case, the golden and faulty instances run in alternate clock cycles.

Most of the work on emulation-based hardware fault injection focuses on general-purpose logic, but circuits may also include embedded memories. As the number of storage elements in memories is generally very large, memories are in fact very relevant.
However, controllability and observability are limited to one memory access per memory port and clock cycle. Memories can be emulated by forcing the synthesizer to treat them as flip-flops, but this approach would usually produce emulation circuits much larger than commercially available FPGAs. Therefore, embedded memories must be instrumented in a particular way to support fault injection. A memory can be implemented in an FPGA using FPGA memory blocks. In this case, fault injection is performed by instrumenting the memory buses [28]. In [40], memories are implemented using dual-port FPGA memory blocks, where one of the ports is specifically devoted to fault injection. At fault injection time, the target memory position is read, XORed with the fault mask, and then written back. A monitor circuit is included to clear the whole internal memory after an experiment, in order to avoid the accumulation of errors, and to read out the whole internal memory serially in order to compare it and detect faults. These two operations take a long time, proportional to the memory size, and must be performed for every fault injection experiment. Thus, injecting a fault in a memory position can be achieved by instrumenting the memory buses, but initializing the memory and extracting and comparing the fault injection results for analysis are the major problems of memory fault injection.

#### 6.3 Fault Injection System

A hardware fault injection system is able to execute a circuit with a workload in the presence of faults and to compare the faulty behavior with the fault-free behavior.
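The comparison between faulty and fault-free behavior can be illustrated with a small software model. The following Python sketch is hypothetical and not the authors' implementation: it runs a toy 8-flip-flop sequential circuit (its next-state and output functions are arbitrary choices) once without faults and once with a single bit-flip in one flip-flop at one clock cycle. It then classifies the fault with a commonly used three-way scheme: *failure* (some output differs), *latent* (outputs match but the final internal state differs), or *silent* (no difference at all).

```python
"""Golden vs. faulty execution and fault classification (illustrative sketch)."""
from collections import Counter
from typing import List, Optional, Tuple

WIDTH = 8                                   # hypothetical circuit: 8 flip-flops
MASK = (1 << WIDTH) - 1
WORKLOAD = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]   # arbitrary input vector per cycle


def next_state(state: int, x: int) -> int:
    """Toy next-state logic: shift left (MSB is discarded) and XOR the input."""
    return ((state << 1) & MASK) ^ (x & MASK)


def output(state: int) -> int:
    """Toy output logic: only the four least significant FFs are observable."""
    return state & 0xF


def run(workload: List[int],
        fault: Optional[Tuple[int, int]] = None) -> Tuple[List[int], int]:
    """Return (outputs per cycle, final state); fault = (ff_index, cycle)."""
    state, outputs = 0, []
    for cycle, x in enumerate(workload):
        if fault is not None and fault[1] == cycle:
            state ^= 1 << fault[0]          # single bit-flip (SEU) in one FF
        outputs.append(output(state))
        state = next_state(state, x)
    return outputs, state


def classify(workload: List[int], fault: Tuple[int, int]) -> str:
    """Compare the faulty run against the golden (fault-free) run."""
    golden_out, golden_state = run(workload)
    faulty_out, faulty_state = run(workload, fault)
    if faulty_out != golden_out:
        return "failure"                    # error visible at the outputs
    if faulty_state != golden_state:
        return "latent"                     # error still stored in the circuit
    return "silent"                         # error vanished without effect


if __name__ == "__main__":
    all_faults = [(ff, c) for c in range(len(WORKLOAD)) for ff in range(WIDTH)]
    print(Counter(classify(WORKLOAD, f) for f in all_faults))
```

For brevity the golden run is recomputed for every fault; a real system would compute it once or, as in the emulation approaches described earlier, run golden and faulty instances side by side in hardware.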
A fault injection system is typically composed of the following elements:

- The CUT
- A fault injection mechanism, which can be physical or logical
- A test environment, in charge of the following tasks:
  - Supply the vectors required for the workload
  - Check the effect of faults in the CUT
  - Collect results
  - Classify faults
  - Control the whole process

Fault injection systems are used to perform fault injection campaigns, which are experiments intended to obtain a measurement of circuit reliability. Examples of this measurement are the fault dictionary, the circuit cross-section, the SER, the mean time between failures, etc.

Fault injection systems using physical fault injection require a prototype of the DUT, which is exposed to the fault-provoking element (radiation, laser). The system has to be built in such a way that the DUT is correctly exposed while the rest of the system is not affected by the fault source. In accelerated radiation experiments, the CUT must be separated from the rest of the system, so that it receives the beam without affecting the test environment. For example, the THESIC system [41] consists of a motherboard acting as the test environment and a mezzanine board for the CUT. For laser campaigns, the circuit package must be removed for the laser beam to be effective, and the CUT must also be visible through a microscope to locate the laser incidence point [42].

Systems using logical fault injection have a similar structure but use a different fault injection mechanism, so there are no restrictions related to physical exposure.
The only additional element required is the fault injection mechanism itself, for example, a host-controlled JTAG interface connected to the on-chip debugger (OCD) of the CUT.

The rest of this section covers the implementation of systems using logical fault injection by circuit emulation. This kind of system uses FPGA-based prototyping to implement the CUT. As this is the most general and flexible scenario, there is a large variety of solutions, and building these systems is a very challenging task as far as performance is concerned.

There is a wide range of possibilities to build emulation-based fault injection systems, as many tasks can be executed either in a host computer or in the hardware. Fig. 6.2 shows the main required tasks. User interaction takes place at the host computer, and circuit emulation is performed in the hardware (circuit core). The remaining tasks may be executed either by the host computer or by the hardware.

Fig. 6.2 Emulation-based fault injection system components

There are many intermediate possibilities, depending on which tasks are assigned to the host computer and which to the hardware. The speed of the communication channel is critical, given the amount of information to transfer. In general, the tasks required to perform a fault injection campaign comprise fault list management, workload application, fault injection, fault classification, and result analysis. These tasks are analyzed in the following subsections.

##### 6.3.1 Workload

A workload must be provided to the circuit under evaluation for execution. The implementation of this task involves a trade-off between flexibility, performance, and resource usage, and several approaches can be considered. Test vectors can be generated at the host computer and sent to the hardware when they are going to be applied.
This method is very flexible, as the workload can be changed easily, but it implies continuous host–hardware communication, which slows down execution. Alternatively, a stimulus generation block can be implemented in the hardware, next to the circuit core, so that it supplies vectors at the rate they are required. A simple approach is to use a BIST-like vector generation circuit, such as a linear feedback shift register (LFSR). This kind of implementation is very fast and uses very few resources, but the resulting workload may not be very representative. An intermediate solution is to store test vectors in the FPGA memory. This solution is flexible, as the workload can be easily changed by downloading a new one or by reconfiguring the FPGA, and it is also fast, because test vectors are fed to the circuit core by hardware. However, FPGA devices usually have a limited amount of internal memory, which is a drawback of this method. To improve resource usage, test vectors may be compressed, or external memory may be used if it is available on the FPGA board.

##### 6.3.2 Fault List

Fault list management has some important aspects, both in design and in implementation. Ideally, the fault list includes faults at every location and every time instant for a given workload. Under the bit-flip fault model for SEUs, the complete single-fault list would include a fault in every memory element at every clock cycle of the workload. This approach is practical only if the circuit is very small or the fault injection system is very efficient, as in Autonomous Emulation [37].

Usually, the system is not efficient enough to perform a campaign with all possible faults in a reasonable time. In these cases, the fault space must be sampled to obtain a statistically representative subset. There are several approaches to create the fault list.
The simplest one is random fault list generation, both in fault location and in time instant, bearing in mind that a list generated by a computer or by hardware is not truly random but pseudo-random. Other proposals use a Poisson distribution to generate the time instants, in order to reproduce the results of a radiation experiment. A deeper discussion of these aspects can be found in [43].

Regarding implementation, the fault list can be generated at the host computer or in the hardware. If it is generated at the host computer, it must then be transferred to the hardware. It is advisable to implement an intermediate storage mechanism so that the emulator does not have to wait for a new fault once it has finished processing the previous one. This mechanism can use internal FPGA memory or on-board memory. In any case, the impact on performance is lower than for the workload: a test vector is required every clock cycle, whereas a new fault is required only when the previous one has been processed.

##### 6.3.3 Fault Classification

A fault is classified by comparing the faulty and the golden executions. If an injected fault eventually produces a result different from the expected one, it is called a *failure*. If the fault effect completely disappears from the circuit after some execution time, the fault is called *silent*. If the faulty circuit still differs from the golden one after the execution of the workload, but no erroneous result has been produced, the fault is called *latent*. If the circuit has a built-in fault detection mechanism, faults can also be classified as *detected* or *not detected*.
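The classification rules above, combined with early stopping, can be sketched as a small function over per-cycle traces. The sketch assumes that the golden and faulty executions are available as hypothetical lists of (outputs, internal state) pairs, one per clock cycle from the injection instant onward; in an emulator this comparison would be performed in hardware rather than in software. Built-in detection mechanisms are not modeled, and all names are illustrative.

```python
from enum import Enum
from typing import Hashable, Sequence, Tuple


class FaultClass(Enum):
    FAILURE = "failure"
    SILENT = "silent"
    LATENT = "latent"


# One (outputs, internal_state) pair per clock cycle, starting at the injection cycle.
Trace = Sequence[Tuple[Hashable, Hashable]]


def classify(golden: Trace, faulty: Trace) -> Tuple[FaultClass, int]:
    """Classify one fault, stopping as early as possible.
    Returns the class and the number of post-injection cycles examined."""
    if len(golden) != len(faulty):
        raise ValueError("traces must cover the same cycles")
    for cycle, ((g_out, g_state), (f_out, f_state)) in enumerate(zip(golden, faulty)):
        if f_out != g_out:
            return FaultClass.FAILURE, cycle + 1  # erroneous result observed
        if f_state == g_state:
            return FaultClass.SILENT, cycle + 1   # fault effect has disappeared
    # Outputs never differed, but the states still differ at the end of the workload.
    return FaultClass.LATENT, len(golden)
```

Only latent faults force the examination of every remaining cycle; failures and silent faults end the emulation as soon as they are recognized.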
In microprocessor-based circuits, a fault can also be classified as *loss of sequence* when it modifies the normal instruction sequence and prevents the circuit from reaching the end of the workload.

Concerning failures, the condition for an erroneous result differs from case to case. For a control circuit, an error can be an output value different from the expected one. For an algorithm processor, the calculation results must be checked, and they can be written to the circuit outputs or stored in memory. Failures can also be subclassified according to criticality. For example, some errors could cause physical damage to the system (e.g., an electronically controlled mechanical engine) or create a dangerous situation (a brake system), while others may be irrelevant (a wrong pixel in an image).

Latent faults represent an additional problem. The system must have a mechanism to compare the golden and the faulty circuit states to decide whether the fault is still present. For example, two instances of the circuit can be implemented, together with a mechanism to compare them. If partial reconfiguration is used, readback of the circuit flip-flops can serve this purpose, at a high performance penalty [34]. With the instrumented circuit technique, flip-flops are duplicated and compared [37], so that latent faults can be detected online.

Performance benefits from an early fault classification mechanism: stopping the execution as soon as the fault is classified saves emulation time. A fast classification mechanism is easier to implement if classification is done in hardware. This aspect is explained in more detail in the next section.

##### 6.3.4 Result Analysis

The results of a fault injection campaign can be expressed in several ways.
For circuit qualification, a single figure for circuit reliability is usually required, such as the SER, expressed in failures in time (FIT), or the mean time between failures (MTBF). These figures are calculated using information from fault injection campaigns and taking into account the environment where the circuit will operate. Emulation-based fault injection campaigns provide information about the consequences of faults; fault occurrence probabilities and other aspects must be obtained by other methods.

The most complete result of a fault injection campaign is the fault dictionary, that is, the list holding the classification of every injected fault. The complete fault dictionary is very useful to locate weak areas in the circuit or critical tasks in the workload. The implementation problems for this task are quite similar to those of the fault list, since a new result is generated for every processed fault. Results can be transferred to the host immediately after processing, or they can be temporarily stored in the hardware to improve performance. If result generation is very fast or no temporary storage is available, statistical measures may be collected in the hardware instead. For example, results can be classified per location or per injection instant, or just percentages can be calculated.

##### 6.3.5 Communication

Emulator-host communication has a great impact on system performance. To improve performance, we can either increase the speed of the communication channel or decrease the amount or the frequency of the transmitted data. Commonly used communication mechanisms are serial ports, USB, Ethernet, and PCI. Obviously, a faster channel will improve overall speed, but several aspects must be taken into account.
For example, channels such as USB can be very fast, but only in burst mode, when large amounts of data are transmitted in a single packet. In a fault injection system, the information to transmit (test vectors, fault list, fault dictionary, and control commands) can be sparse in time. Such sparse traffic does not use high-speed channels efficiently, so it is advisable to design the emulator with data compaction or communication buffers. To obtain maximum performance, the objective is to keep the core emulation circuit running as much of the time as possible and to avoid time gaps due to communication.

#### 6.4 Fault Injection Optimizations

Emulation-based fault injection techniques can notably speed up the fault-tolerance evaluation process with respect to other methods; injecting millions of faults in a few hours is possible. Nevertheless, reducing this time further is a very interesting goal, since fault-tolerance evaluation is performed many times during circuit development. Furthermore, current circuits are highly complex and include an increasing number of sensitive areas that can be affected by faults. The higher the number of possible faults, the higher the number of faults that must be injected to obtain a significant measurement of circuit robustness. Speeding up the fault-tolerance evaluation process is therefore required.

Several approaches exist to speed up emulation-based fault injection. The following sections describe the main sources of fault injection inefficiency and the solutions to overcome them.

##### 6.4.1 Autonomous Emulation

Emulation-based techniques profit from the capability of an FPGA to emulate circuit behavior at hardware speed.
Using typical emulation-based fault injection solutions, the emulation process is interrupted every time the emulator must wait for the host to apply the stimuli, to inject a fault, or to check the output values. Hence, a very intensive interaction between the emulator and the host computer is required, as the host controls the injection and evaluation of every fault. The communication between the emulator and the host computer becomes a performance bottleneck that prevents taking full advantage of the FPGA capabilities for fast hardware emulation.

Autonomous emulation [37] is a fault injection solution aimed at avoiding intensive communication between the host and the emulation platform. It consists of implementing the whole injection system in the FPGA, using the instrumentation mechanism to insert faults into the circuit under test (Fig. 6.4). The FPGA is in charge of the following tasks:

1. Managing the whole fault injection process.
2. Applying the input stimuli to the circuit under test.
3. Activating the fault injection.
4. Watching the circuit behavior under faults and classifying the injected fault according to its effect on circuit functionality.

With autonomous emulation, access to any sensitive element of the circuit is simple and straightforward, and the time necessary to perform the different injection tasks can be significantly reduced. Figure 6.3 shows the scheme of typical emulation-based solutions and Fig. 6.4 the autonomous emulation scheme, to illustrate the differences between both systems. The autonomous emulation system exploits the resources available in current FPGA platforms, such as memory blocks, to implement more tasks close to the circuit under test, which minimizes the required interaction with the host computer.

Fig. 6.3 Typical scheme for an emulation-based fault injection system

Fig. 6.4 Autonomous Emulation scheme

The enhancements provided by the autonomous emulation solution are as follows:

- The required communication between the host computer and the emulation platform is minimized: it takes place only twice, at the beginning of the evaluation process to configure the FPGA from the PC, and at the end of the fault injection campaign to collect the results, that is, to download the fault dictionary.
- Observability and controllability are significantly enhanced, since access to the memory elements does not require a particular communication channel and is therefore straightforward. In general, typical emulation-based techniques face a trade-off between process speed and observability of the internal circuit resources, because higher observability requires more information exchange and, therefore, more evaluation time. The autonomous emulation system provides high observability without penalty in injection speed, since the injection system and the circuit under test are implemented in the same device.
- Hardware implementation allows different injection tasks to execute in parallel, which speeds up the whole process with respect to a software implementation.

Once the PC-FPGA communication, the main limitation of fault emulation techniques, has been minimized, the fault injection process becomes much more efficient. Moreover, access to the internal resources of the circuit does not require any information exchange between the host computer and the FPGA. This feature enables new optimizations that reduce the time spent per fault.

##### 6.4.2 Fault Evaluation Process

The fault injection process can be optimized by reducing the time spent in the different steps needed to evaluate the consequences of a fault. For a given workload, these steps are the following (Fig. 6.5):

1. Reach the circuit state corresponding to the injection instant. The most common way to do so is to run the workload from the beginning until the injection instant.
2. Inject the fault.
3. Classify the fault according to its effect on circuit behavior. For this purpose, the circuit under test resumes the workload execution from the injection instant until the fault is classified or the workload finishes.

Fig. 6.5 Time spent to emulate each fault

In the worst case, the complete workload must be emulated for each fault. A fault injection campaign with a large number of injected faults and long workloads may therefore take excessive time to complete.

With the instrumentation-based mechanism, fault injection takes just one clock cycle, so time optimizations should target the other steps (1 and 3). The time required to reach the circuit state at the injection instant can be reduced by techniques that save fault-free emulation time. One solution consists of storing the circuit state beforehand and reloading it for the next fault injection (state restoration). Regarding fault classification, fault emulation can be sped up by aborting execution as soon as the fault can be classified, profiting from the higher observability available in an Autonomous Emulation system. In the following sections, these optimizations are detailed for SEU faults.

##### 6.4.3 State Restoration

The circuit under test can reach the state corresponding to the injection instant in two different ways:

- Emulating the workload until the injection instant is reached.
- Storing the required state in memory elements of the circuit and restoring it just before the fault injection instant (state restoration).
  In this case, additional hardware to store the corresponding state is necessary.

State restoration avoids fault-free emulation of the circuit for every injected fault. The required states are easily obtained from the golden execution, which is run just once.

Fig. 6.6 Possible instrument to support injection state restoration in one clock cycle

The obtained benefit depends on the time required to perform the restoration and, therefore, on the technique used to implement this optimization. Figure 6.6 presents a possible scheme to replace every original flip-flop in order to support state restoration in just one clock cycle. It includes an additional flip-flop that holds the injection state to be restored in the circuit when the fault is going to be emulated.

Let us suppose that state restoration requires only one clock cycle, the workload consists of $C$ clock cycles, and the circuit under test contains $F$ sensitive memory elements. Assuming that all memory elements have the same probability of being affected by a fault in any workload cycle, the number of possible single faults is $F \cdot C$. In the worst case, with no optimizations, $C$ clock cycles are necessary to emulate each fault. Taking into account all possible single faults, the total time spent during the campaign emulating the circuit without faults is $F\,(1+2+3+\cdots+(C-1))$ clock cycles. When state restoration is performed in just one cycle, this optimization avoids the emulation of $C_{\rm S}$ clock cycles, where

$$C_{\rm S} = F \frac{C(C-1)}{2}$$

For example, for a circuit with $F = 10^3$ and a workload with $C = 10^5$ clock cycles, $C_{\rm S} \approx 0.5 \times 10^{13}$ clock cycles, so the total time saved at 100 MHz is about 14 h.

##### 6.4.4 Early Fault Classification

Emulation of a fault finishes when the fault is classified or when the end of the workload is reached.
A typical fault classification considers three categories: failure, latent, and silent faults. To detect a failure, the faulty circuit outputs must be compared with the golden behavior. To distinguish between silent and latent faults, internal resources must be observed.

Reducing the time needed to classify faults is an important optimization of the evaluation speed. In general, fault injection techniques stop the fault evaluation as soon as a failure is detected, since observing the outputs and comparing them with the expected values is usually straightforward. Owing to the limited observability of internal resources, most hardware fault injection techniques either cannot classify a fault as silent or latent, or perform this classification at the end of the workload execution, which is very time-consuming. However, silent faults can be detected as soon as the fault effect disappears if the internal elements are observed continuously.

Speeding up silent fault classification requires fast and continuous access to every memory element within the circuit under test, to compare its content with the golden circuit state. Additional hardware is used to store the golden state and to perform the comparison. This extra hardware, shown in Fig. 6.7, consists of two flip-flops used to run the golden and the faulty execution at every workload instant. This optimization can be implemented at low cost in an Autonomous Emulation system, since the complete system is implemented in the same hardware device. Early silent fault classification speeds up the fault injection process, especially in circuits with fault-tolerant structures that correct or mask faults, where the percentage of silent faults is high.
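The cycle accounting of Section 6.4.3, extended with early classification, can be reproduced numerically. The following Python sketch assumes, as the text does, that reaching injection instant t costs t cycles of fault-free emulation (one cycle with state restoration), and adds a simplifying assumption of its own: with early classification enabled, every fault is classified a fixed `latency` cycles after injection. Function names are hypothetical. The sketch also recomputes the 14 h example.

```python
def cycles_per_fault(C: int, t: int, latency: int,
                     restore: bool = False, early: bool = False) -> int:
    """Clock cycles to emulate one fault injected at instant t (0 <= t < C).

    reach:    t cycles of fault-free emulation, or 1 cycle with state restoration
    classify: the rest of the workload (C - t cycles), or only `latency`
              cycles with early classification (never more than C - t)
    """
    reach = 1 if restore else t
    remaining = C - t
    classify = min(latency, remaining) if early else remaining
    return reach + classify


def campaign_cycles(F: int, C: int, latency: int,
                    restore: bool = False, early: bool = False) -> int:
    """Total cycles for the complete single-fault list: F flip-flops x C instants."""
    return F * sum(cycles_per_fault(C, t, latency, restore, early) for t in range(C))


def saved_by_restoration(F: int, C: int) -> int:
    """C_S = F * C * (C - 1) / 2, neglecting the one-cycle restoration itself."""
    return F * C * (C - 1) // 2


def hours_at(cycles: int, f_clk_hz: float = 100e6) -> float:
    """Wall-clock hours needed to emulate `cycles` clock cycles at f_clk_hz."""
    return cycles / f_clk_hz / 3600.0


if __name__ == "__main__":
    print(f"{hours_at(saved_by_restoration(10**3, 10**5)):.1f} h")  # 13.9 h
    F, C, lat = 100, 1000, 5
    for restore, early in [(False, False), (True, False), (False, True), (True, True)]:
        print(restore, early, campaign_cycles(F, C, lat, restore, early))
```

The closed form $C_{\rm S}$ neglects the one-cycle restoration itself; counting it, the saving under this model is $F\,(C(C-1)/2 - C)$, which is indistinguishable for long workloads. With both flags enabled, the per-fault cost is at most `latency + 1` cycles, independent of the workload length.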
Therefore, by applying both optimizations, state restoration (described in the previous section) and early silent fault classification, the time spent emulating one fault can be drastically reduced (Fig. 6.8).

Putting it all together, [37] describes an instrument that replaces every original flip-flop and supports Autonomous Emulation, state restoration, and early silent fault classification. This instrument is shown in Fig. 6.9. In this case, combinational logic is shared, which avoids duplicating the complete circuit; the faulty and golden emulations are then executed alternately. This implementation of an Autonomous Emulation system is named the *Time-Multiplexed* technique.

Fig. 6.7 Additional hardware required to implement early fault classification

Fig. 6.8 Optimized fault emulation

Fig. 6.9 Hardware logic to support all optimizations presented in [37]

For failure and silent faults, the time elapsed between fault injection and fault classification is usually a few clock cycles in circuits with fault-tolerant mechanisms. Only faults with long latencies require the execution of most of the workload, and latent faults require it until the end of the workload. Therefore, if fault latencies and the number of latent faults are small, which is the usual case in a well-defined experiment, the reduction in execution time is proportional to the workload length. Experimental results reported in [37] show that, with the described optimizations, the fault-tolerance evaluation process can achieve fault injection rates on the order of one million faults per second.

##### 6.4.5 Embedded Memories

Embedded memories are common components in modern digital circuits.
SRAM memories are sensitive to SEUs in the same manner as flip-flops; therefore, fault injection must evaluate not only flip-flops but also the effects on circuit behavior when an SEU affects a memory cell.

Fault injection in embedded memories using emulation-based techniques is complex because of the limited observability of memory cells: only one memory word can be accessed per clock cycle. For example, to know whether there is an error in a memory, its complete content must be read, which is very time-consuming.

Few works have been published about emulation-based fault injection in circuits with embedded memories [28, 40, 44, 45]. Civera et al. [44] proposed injecting faults into a memory by controlling the memory control and data buses to insert a fault in a given memory bit, but did not propose a solution to analyze faults inside the memory. Lima et al. [40] present a memory model consisting of a dual-port memory: one port is used to perform the golden emulation, while the other is used to inject faults. However, result analysis is very time-consuming, since it requires reading every memory word and comparing the obtained data with the expected data. Portela-García et al. [28] describe an Autonomous Emulation system with optimizations for a circuit with embedded memories. Autonomous Emulation speeds up the injection process by minimizing PC-FPGA communication requirements and by including the optimizations described previously (state restoration and early silent fault classification). To apply the Autonomous Emulation concept to complex circuits, solving fault injection in embedded memories in a fast and cost-effective way is mandatory. Since the Autonomous Emulation solution is based on an instrumentation mechanism, a memory instrument is necessary.
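The time cost of the dual-port approach of [40] can be illustrated with a simple behavioral model. In the Python sketch below, which is a hypothetical software model and not the hardware described in [40], port A serves the circuit under test, port B performs the read-XOR-write injection, and the monitor operations clear and read back the whole memory one word per cycle. Class and method names are illustrative assumptions.

```python
from typing import List, Tuple


class DualPortRAM:
    """Behavioral model of a dual-port block RAM used for fault injection.
    Port A is used by the circuit under test; port B is reserved for
    injection and monitoring. Hypothetical model for illustration only."""

    def __init__(self, depth: int, width: int) -> None:
        self.word_mask = (1 << width) - 1
        self.words: List[int] = [0] * depth

    # --- Port A: normal accesses from the circuit under test ---
    def write_a(self, addr: int, data: int) -> None:
        self.words[addr] = data & self.word_mask

    def read_a(self, addr: int) -> int:
        return self.words[addr]

    # --- Port B: fault injection and monitoring ---
    def inject_b(self, addr: int, fault_mask: int) -> None:
        """Emulate an SEU: read the target word, XOR it with the mask, write it back."""
        word = self.words[addr]
        self.words[addr] = (word ^ fault_mask) & self.word_mask

    def clear_b(self) -> int:
        """Clear the whole memory between experiments; returns cycles spent."""
        for addr in range(len(self.words)):
            self.words[addr] = 0
        return len(self.words)

    def readback_compare_b(self, golden: List[int]) -> Tuple[List[int], int]:
        """Read every word serially and compare it with the golden image.
        Returns (mismatching addresses, cycles spent)."""
        mismatches = [a for a, expected in enumerate(golden) if self.words[a] != expected]
        return mismatches, len(golden)
```

Under this model, both monitor operations cost one access per word, so each experiment on a memory of depth D pays about 2D extra cycles even when a single bit is faulty, which is the overhead that motivates the memory instrument described next.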
It is assumed that embedded memories are synchronous, i.e., they can be prototyped in current FPGAs using the available block RAM components, and that they do not contain useful information before the execution starts. The objective is a memory model that supports state restoration and early silent fault classification [45]. The proposed solution is based on controlling and observing the memory access buses (address, data, and control signals).

On the one hand, fault classification requires distinguishing between silent and latent faults. For this purpose, the input data bus and the access control signals (write enable, output enable, etc.) of the golden and faulty executions are compared (Fig. 6.10). The emulation controller detects when faulty data are written into the memory and checks whether the fault effect is cancelled by writing the correct data during workload execution. As soon as the fault disappears, it is classified as silent. On the other hand, fault injection to emulate an SEU in a memory cell is performed only on data that are read, since faults in other memory words do not affect circuit behavior and would be classified as latent faults. Therefore, the number of faults to evaluate is reduced significantly.

Fig. 6.10 Memory instrument to support Autonomous Emulation in complex circuits with embedded memories

A possible implementation of the faulty memory in this model consists of storing only the faulty memory words. To access the faulty memory quickly, in just one clock cycle, it is implemented as a Content Addressable Memory (CAM) [45]. The faulty memory thus contains the addresses that store a fault together with the faulty data. If a given address is stored in the faulty memory, the corresponding memory word contains a fault; otherwise, the data are stored only in the golden memory. This implementation is named Error Content Addressable Memory (ECAM); see Fig. 6.11. The ECAM implementation is well suited to the required Autonomous Emulation tasks, such as state restoration or silent fault classification (a memory fault can be classified as silent when the faulty memory is empty).

The size of the ECAM fixes the maximum number of errors that can be considered. In practice, the probability that N erroneous memory words are all cancelled (by writing correct values) is very low even for small values of N, so a small ECAM is sufficient. Therefore, this solution implies less area overhead than other possible implementations (such as memory duplication).

Portela-García et al. [28] and Valderas et al. [46] present experimental results on a LEON2 microprocessor. The experiments consist of injecting millions of faults into flip-flops and memories by means of an Autonomous Emulation system.

#### 6.5 Conclusions

Hardware fault injection plays a key role in the design and validation of robust circuits. As hardware reliability is becoming an increasing concern in many application areas, there is a need for new approaches and solutions that can deal with more complex circuits and reproduce fault effects accurately and efficiently.

Hardware fault injection methods have evolved significantly in recent years. Among physical fault injection methods, accelerated radiation tests are the most widely used, but laser fault injection has undergone substantial developments. On the other hand, FPGAs can support very efficient logical fault injection methods, such as Autonomous Emulation. These methods can provide unprecedented levels of performance and fault injection capabilities, and they represent suitable fault injection mechanisms to complement physical methods.

#### References

1. R. C. Baumann, "Radiation-Induced Soft Errors in Advanced Semiconductor Technologies", IEEE Transactions on Device and Materials Reliability, Vol. 5, No. 3, pp. 305–316, September 2005.
2. J. Karlsson, P. Liden, P. Dahlgren, R. Johansson, U. Gunneflo, "Using Heavy-Ion Radiation to Validate Fault Handling Mechanisms", IEEE Micro, pp. 8–23, February 1994.
3. S. Duzellier, G. Berger, "Test Facilities for SEE and Dose Testing", Radiation Effects on Embedded Systems, Springer, The Netherlands, 2007, pp. 201–232.
4. R. Ecoffet, "In-Flight Anomalies on Electronic Devices", Radiation Effects on Embedded Systems, Springer, The Netherlands, 2007, pp. 31–68.
5. IEEE Standard for Environmental Specifications for Spaceborne Computer Modules, March 1997.
6. JEDEC Standard JESD89A, "Measurement and Reporting of Alpha Particle and Terrestrial Cosmic Ray-Induced Soft Errors in Semiconductor Devices", October 2006.
7. JEDEC Standard JESD57, "Test Procedures for the Measurement of Single-Event Effects in Semiconductor Devices from Heavy Ion Irradiation", December 1996.
8. S. Buchner, P. Marshall, S. Kniffin, K. LaBel, "Proton Test Guideline Development Lessons Learned", NASA/Goddard Space Flight Center, NEPP, August 2002.
9. European Space Agency, "Single Event Effects Test Method and Guidelines", October 1995.
10. R. Velazco, S. Rezgui, R. Ecoffet, "Predicting Error Rate for Microprocessor-Based Digital Architectures Through C.E.U. (Code Emulating Upsets) Injection", IEEE Transactions on Nuclear Science, Vol. 47, No. 6, pp. 2405–2411, December 2000.
11. R. Velazco, B. Martinet, G. Auvert, "Laser Injection of Spot Effects on Integrated Circuits", 1st Asian Test Symposium, pp. 158–163, November 1992.
12. P. Fouillat, V. Pouget, D. Lewis, S. Buchner, D. McMorrow, "Investigation of Single-Event Transients in Fast Integrated Circuits with a Pulsed Laser", International Journal of High Speed Electronics and Systems, Vol. 14, No. 2, pp. 327–339, 2004.
13. F. Miller, N. Buard, T. Carrière, R. Dufayel, R. Gaillard, P. Poirot, J. M. Palau, B. Sagnes, P. Fouillat, "Effects of Beam Spot Size on the Correlation Between Laser and Heavy Ion SEU Testing", IEEE Transactions on Nuclear Science, Vol. 51, No. 6, pp. 3708–3715, December 2004.
14. D. Powell, J. Arlat, Y. Crouzet, "Estimators for Fault Tolerance Coverage Evaluation", IEEE Transactions on Computers, Vol. 44, No. 2, pp. 261–274, February 1995.
15. J. Arlat, A. Costes, Y. Crouzet, J. C. Laprie, D. Powell, "Fault Injection and Dependability Evaluation of Fault-Tolerant Systems", IEEE Transactions on Computers, Vol. 42, No. 8, pp. 913–923, August 1993.
16. H. Madeira et al., "RIFLE: A General Purpose Pin-Level Fault Injector", Proceedings of the First European Dependable Computing Conference, Berlin, Germany, pp. 199–216, October 1994.
17. P. Folkesson, S. Svensson, J. Karlsson, "A Comparison of Simulation Based and Scan Chain Implemented Fault Injection (SCIFI)", Proceedings of FTCS-28, IEEE Computer Society Press, Munich, pp. 284–293, June 1998.
18. U. Gunneflo, J. Karlsson, J. Torin, "Evaluation of Error Detection Schemes Using Fault Injection by Heavy-Ion Radiation", Proceedings of the 19th Annual International Symposium on Fault-Tolerant Computing, IEEE CS Press, Los Alamitos, CA, pp. 340–347, 1989.
19. J. Arlat, M. Aguera, L. Amat, Y. Crouzet, J. C. Fabre, J. C. Laprie, E. Martins, D. Powell, "Fault Injection for Dependability Validation: A Methodology and Some Applications", IEEE Transactions on Software Engineering, Vol. 16, No. 2, pp. 166–182, February 1990.
20. M. C. Hsueh, T. K. Tsai, R. K. Iyer, "Fault Injection Techniques and Tools", IEEE Computer, Vol. 30, No. 4, pp. 75–82, April 1997.
21. G. Kanawati, N. A. Kanawati, J. A. Abraham, "FERRARI: A Flexible Software-Based Fault and Error Injection System", IEEE Transactions on Computers, Vol. 44, No. 2, pp. 248–260, February 1995.
22. J. Carreira, H. Madeira, J. G. Silva, "Xception: A Technique for the Experimental Evaluation of Dependability in Modern Computers", IEEE Transactions on Software Engineering, Vol. 24, No. 2, pp. 125–136, February 1998.
23. T. Jarboui, J. Arlat, Y. Crouzet, K. Kanoun, T. Marteau, "Analysis of the Effects of Real and Injected Software Faults: Linux as a Case Study", Proceedings of the 2002 Pacific Rim International Symposium on Dependable Computing (PRDC'02), IEEE, 2002.
24. M. Rebaudengo, M. Sonza Reorda, "Evaluating the Fault Tolerance Capabilities of Embedded Systems via BDM", 17th IEEE VLSI Test Symposium, Dana Point, USA, pp. 452–457, April 1999.
25. IEEE-ISTO 5001-2003, "The Nexus Forum™ Standard for a Global Embedded Processor Debug Interface", version 2.0, 2003.
26. A. V. Fidalgo, G. R. Alves, J. M. Ferreira, "Real Time Fault Injection Using Enhanced OCD: A Performance Analysis", 21st IEEE International Symposium on Defect and Fault-Tolerance in VLSI Systems (DFT), 2006.
27. J. Peng, J. Ma, B. Hong, C. Yuan, "Validation of Fault Tolerance Mechanisms of an Onboard System", 1st International Symposium on Systems and Control in Aerospace and Astronautics (ISSCAA), pp. 1230–1234, January 2006.
28. M. Portela-García, M. García-Valderas, C. López-Ongil, L. Entrena, "An Efficient Solution to Evaluate SEU Sensitivity in Digital Circuits with Embedded RAMs", XXI Conference on Design of Circuits and Integrated Systems (DCIS'06), November 2006.
29. P. Kenterlis, N. Kranitis, A. Paschalis, D. Gizopoulos, M. Psarakis, "A Low-Cost SEU Fault Emulation Platform for SRAM-Based FPGAs", 12th IEEE International On-Line Testing Symposium, pp. 235–241, July 2006.
30. M. Alderighi, F. Casini, S. D'Angelo, M. Mancini, A. Marmo, S. Pastore, G. R. Sechi, "A Tool for Injecting SEU-like Faults into the Configuration Control Mechanism of Xilinx Virtex FPGAs", 18th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, 2003.
31. M. Alderighi, F. Casini, S. D'Angelo, M. Mancini, S. Pastore, G. R. Sechi, R. Weigand, "Evaluation of Single Event Upset Mitigation Schemes for SRAM Based FPGAs Using the FLIPPER Fault Injection Platform", 22nd IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, pp. 105–113, 2007.
32. K. T. Cheng, S. Y. Huang, W. J. Dai, "Fault Emulation: A New Approach to Fault Grading", Proceedings of the International Conference on Computer-Aided Design, pp. 681–686, 1995.
33. L. Antoni, R. Leveugle, B. Feher, "Using Run-Time Reconfiguration for Fault Injection in HW Prototypes", IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, pp. 245–253, 2002.
34. M. Aguirre, J. N. Tombs, F. Muñoz, V. Baena, A. Torralba, A. Fernandez-Leon, F. Tortosa, D. Gonzalez-Gutierrez, "An FPGA Based Hardware Emulator for the Insertion and Analysis of Single Event Upsets in VLSI Designs", Radiation Effects on Components and Systems Workshop, September 2004.
35. J. H. Hong, S. A. Hwang, C. W. Wu, "An FPGA-Based Hardware Emulator for Fast Fault Emulation", MidWest Symposium on Circuits and Systems, 1996.
36. P. Civera, L. Macchiarulo, M. Rebaudengo, M. Sonza Reorda, M. Violante, "Exploiting Circuit Emulation for Fast Hardness Evaluation", IEEE Transactions on Nuclear Science, Vol. 48, No. 6, 2001.
37. C. López-Ongil, M. García-Valderas, M. Portela-García, L. Entrena, "Autonomous Fault Emulation: A New FPGA-Based Acceleration System for Hardness Evaluation", IEEE Transactions on Nuclear Science, Vol. 54, Issue 1, Part 2, pp. 252–261, February 2007.
38. M. Violante, "Accurate Single-Event-Transient Analysis via Zero-Delay Logic Simulation", IEEE Transactions on Nuclear Science, Vol. 50, No. 6, December 2003.
39. M.
García Valderas, R. Fernández Cardenal, C. López Ongil, M. Portela García, L. Entrena. "SET emulation under a quantized delay model", Proceedings of the 22nd IEEE International Symposium on Defect and Fault-Tolerance in VLSI Systems (DFTS), pp. 68–78, September 2007. - 40. F. Lima, S. Rezgui, L. Carro, R. Velazco, R. Reis, "On the use of VHDL simulation and emulation to derive error rates", Proceedings of 6th Conference on Radiation and Its Effects on Components and Systems (RADECS'01), Grenoble, September 2001. - F. Faure, P. Peronnard, R. Velazco, R. Ecoffet, "THESIC+, a flexible system for SEE testing", Proceedings of RADECS Workshop, [September 19–20, 2002, Padova], pp. 231–234. - 886 42. D. Lewis, V. Pouget, F. Beaudoin, P. Perdu, H. Lapuyade, P. Fouillat, A. Touboul, "Backside 887 Laser Testing of ICs for SET Sensitivity Evaluation", IEEE Transactions on Nuclear Science, 888 Vol. 48, Issue 6, Part 1, pp. 2193–2201, December 2001. - 43. F. Faure, R. Velazco, P. Peronnard, "Single-Event-Upset-Like Fault Injection: A Comprehensive Framework", IEEE Transactions on Nuclear Science, Vol. 52, Issue 6, Part 1, pp. 2205–2209, December 2005. - 44. P. Civera, L. Macchiarulo, M. Rebaudengo, M. Sonza Reorda, M. Violante, "FPGA-Based Fault Injection for Microprocessor Systems", IEEE Asian Test Symposium, pp. 304–309, 2001. - 45. M. Nicolaidis, "Emulation/Simulation d'un circuit logique", French patent, filed February 25 2005, issued October 12 2007. - 46. M. G. Valderas, P. Peronnard, C. Lopez-Ongil, R. Ecoffet, F. Bezerra, R. Velazco, "Two Complementary Approaches for Studying the Effects of SEUs on Digital Processors", IEEE Transactions on Nuclear Science, Vol. 54, Issue 4, Part 2, pp. 924–928, August 2007. # **Author Queries** Chapter No.: 6 | Query Refs. 
| Details Required | Author's response | | |-------------|-----------------------------------------------------------------------------------------|--------------------|------| | AU1 | Please provide affiliation for all the authors. | Affiliation is wri | tten | | AU2 | The sentence has been edited for better readability. Please check and approve the edit. | OK | | | AU3 | "[NEXUS]" has been changed to ref. "[25]". Please check if this is correct. | OK | | | AU4 | The sentence has been edited for better readability. Please check and approve the edit. | OK | | | AU5 | The sentence has been edited for better readability. Please check and approve. | OK | | | AU6 | The sentence has been edited for better readability. Please check and approve the edit. | OK | |