# Fault-Tolerance of Self-Timed Circuits

Yuri A. Stepchenkov<sup>1</sup>, Anton N. Kamenskih<sup>2</sup>, Yuri G. Diachenko<sup>1</sup>, Yuri V. Rogdestvenski<sup>1</sup>, and Denis Y. Diachenko<sup>1</sup>

<sup>1</sup>Department of architectural and schematic basis of the innovative computational systems, Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences (IPI FRC CSC RAS), Moscow, Russian Federation

<sup>2</sup>Department of Automation and Remote Control, Perm National Research Polytechnic University, Perm, Russian Federation

YStepchenkov@ipiran.ru

Abstract: The paper discusses the fault-tolerance problem for digital integrated circuits. Owing to their properties, self-timed circuits, unlike their synchronous counterparts, are immune to most short-term logical faults. Indicating an illegal state of the dual-rail signal as a second spacer increases the fault-tolerance of combinational self-timed circuits up to 82%. Self-timed triggers, owing to their indication features, are immune to 44% of logical faults. Special methods of doubling transistors and bistable cells, which are the basis of self-timed triggers, raise their fault-tolerance up to 80%.

*Keywords—fault-tolerance; self-timed circuits; spacer; indication; diphase.* 

#### I. INTRODUCTION

Due to a number of destabilizing factors (noise on the supply buses and signal wires, radiation, heavy charged particles (HCP), protons, neutrons, etc.), CMOS integrated circuits are prone to logical faults. The most frequent logical faults are the following effects [1]:

- Single event transient (SET), a short-term effect resulting in "false" pulses in the logical nets;
- Single event upset (SEU).

Various architectural and circuit techniques are used to prevent faults (SET and SEU).

At the architectural level of synchronous computational circuits, fault-tolerance is provided by redundancy of the functional units or of the entire system [2], by noise-resistant (redundant) codes [3, 4], and by some other methods.

The schematic methods consist in designing cell circuits with enhanced resistance to single event effects (SEE) at the transistor level [5]. Examples include diphase logical cells [6], dual interlocked storage cells (DICE) [7], doubling all transistors in the circuit diagram [8], using additional gates for SET prevention [9], and some others [10]. All of these solutions rely on a doubled or redundant implementation of the circuit logic functions, but none of them provides full protection against single failures. We can only speak about the degree of fault-tolerance.

Self-timed (ST) circuits [11, 12, 13, 14] are naturally redundant and use redundant encoding, as fault-tolerant synchronous circuits do. They are therefore immune to some types of faults to a considerable degree. In addition, they indicate the outputs of all their cells, which provides higher fault-tolerance than their synchronous analogs.

Owing to their diphase discipline and indication, ST circuits already solve a number of fault-tolerance problems. We have verified this in our ST design projects, for example [11, 14]. The penalty is increased complexity (twice that of non-fault-tolerant synchronous circuits). The objective of the paper is to demonstrate the natural soft-fault resistance of ST circuits and to offer additional techniques that increase their soft fault-tolerance.

#### II. FEATURES OF THE ST CIRCUITS

ST circuits inherently resist most short-term single failures owing to their fundamental properties:

- Redundant dual-rail signal coding;
- Diphase work discipline;
- Indication of the completion of all switching in the circuit.

Dual-rail encoding converts each data signal into a dual-rail signal (DRS), which has two work states ("01" and "10") and one spacer state (zero spacer "00" or unit spacer "11"). The state opposite to the spacer, the "anti-spacer" (AS), is prohibited in traditional ST circuits. The indication in classic ST circuits perceives the AS as a work state and propagates it through the circuit. The AS should never appear during normal operation of ST circuits.

The diphase work discipline does not allow subsequent ST circuits to process data generated by a source ST unit until all its information outputs have switched to a new work state after the spacer phase. Therefore, if a short-term transition of any cell output into a state that does not match the cell's inputs lasts less than the time needed to generate the new work state at the outputs of the entire circuit, the fault is "masked" by the ST work discipline.
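The DRS states described above can be illustrated with a minimal Python sketch. The names `DrsState` and `classify_drs` and the integer encoding of the two rails are assumptions made for illustration only; they do not come from the paper.

```python
from enum import Enum


class DrsState(Enum):
    WORK = "work"
    SPACER = "spacer"
    ANTI_SPACER = "anti-spacer"


def classify_drs(rail_a: int, rail_b: int, spacer: int = 0) -> DrsState:
    """Classify a dual-rail signal from its two rails and the spacer polarity.

    spacer=0 models the zero spacer "00"; spacer=1 models the unit spacer "11".
    """
    if rail_a != rail_b:
        return DrsState.WORK          # "01" or "10"
    if rail_a == spacer:
        return DrsState.SPACER        # both rails equal to the spacer level
    return DrsState.ANTI_SPACER       # both rails opposite to the spacer level


for bits in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(bits, classify_drs(*bits, spacer=0).value)
```

With a zero spacer, "11" is classified as the anti-spacer; with a unit spacer (`spacer=1`), the roles of "00" and "11" are swapped.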

Experiments and calculations [15, 16] show that a single logical failure in a combinational circuit can last from picoseconds to a few nanoseconds, depending on several factors:

- Number of the electron-hole pairs (EHP) generated in the semiconductor bulk by an ionization process caused by the fault source (HCP, proton, neutron, etc.);
- Supply voltage;
- Potentials in the areas adjacent to the ionization track;
- Physical properties of the doped semiconductor regions.

In sub-micron technologies (65 nm and below), the duration of a logical failure exceeds the switching delays of the circuit cells many times over. It is comparable with the clock period in synchronous circuits and with the operation cycle length (work phase plus spacer phase) in ST circuits. The risk of a logical failure in an ST circuit also depends on the moment within the operation cycle at which the fault appears.

#### III. CRITICAL FAULTS IN THE COMBINATIONAL ST CIRCUITS

The following faults are potentially critical for the operation of the ST circuits:

- 1) Fault in the work phase that switches any DRS into the AS;
- 2) Fault in the spacer phase, before the indication output confirms the end of the circuit transition into this spacer phase, that switches any DRS into the work state inverse to its state in the previous work phase;
- 3) Fault in the spacer phase, after the indication output confirms the end of the circuit transition into the spacer, that switches some DRS into any work state.

Case 2 is equivalent to a premature switching of a DRS into the work phase while the entire ST circuit is transitioning to the spacer state. This causes one or more of the following consequences:

- a) The indication output of the ST circuit does not switch to the spacer and waits until the fault ends and the failed DRS switches into the spacer. This delays the switching of the ST circuit to the spacer and may produce an incorrect work state at the circuit outputs;
- b) The indication output of the ST circuit switches to the spacer, but as a result of the failure some of the external outputs of the circuit may afterwards switch into an unexpected work state.

Case 3 is similar to case 2(b).

A fault in the work phase that switches a DRS into the work state opposite to the expected one is also critical. It does not break the phase sequence in the ST circuit, but it generates an incorrect work state at the information outputs. Such a fault is not detected by the indication subcircuit, because the DRS state corresponds to the current phase of the ST circuit. However, the probability of such a failure is extremely small, because both cells forming the DRS would have to switch into the state opposite to the one expected in the work phase.
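The three critical cases listed above can be expressed as a small decision function. The following Python sketch is illustrative only: the function name `critical_case`, the string encoding of DRS states, and the argument names are assumptions, not part of the paper.

```python
WORK_STATES = {"01", "10"}


def critical_case(phase, indication_confirmed, faulty_state,
                  previous_work=None, spacer="00"):
    """Return 1, 2, or 3 for critical cases 1)-3) above, or None otherwise.

    phase                -- "work" or "spacer" (phase in which the fault occurs)
    indication_confirmed -- True if the indication output has already confirmed
                            the transition into the spacer phase
    faulty_state         -- DRS state produced by the fault, e.g. "11"
    previous_work        -- DRS state in the previous work phase, if known
    """
    anti_spacer = "11" if spacer == "00" else "00"
    if phase == "work":
        # Case 1: a DRS is switched into the anti-spacer during the work phase.
        return 1 if faulty_state == anti_spacer else None
    if phase == "spacer" and faulty_state in WORK_STATES:
        if indication_confirmed:
            # Case 3: any work state after the spacer has been confirmed.
            return 3
        if previous_work is not None and faulty_state == previous_work[::-1]:
            # Case 2: work state inverse to the previous one, before confirmation.
            return 2
    return None


print(critical_case("work", False, "11"))                         # case 1
print(critical_case("spacer", False, "10", previous_work="01"))   # case 2
print(critical_case("spacer", True, "01"))                        # case 3
```

The sketch covers only the three enumerated cases; it does not model the duration of the fault or how it propagates.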

The physical cause of logical faults is EHP generation in the semiconductor bulk when an HCP, proton, or neutron of sufficiently high energy interacts with silicon. In a high electric field, electrons and holes are separated and move in different directions, creating a current pulse. The ionization current can change the charge of the output parasitic capacitance of the failed cell to a level sufficient to switch another cell connected to it.

The effective diameter of an HCP track does not exceed one micrometer [15]. In a 65-nm CMOS process, this area covers a layout region containing the drains and sources of transistors of the same type or of different types belonging to two or more standard cells. Fig. 1 shows a 65-nm layout fragment including four NOR2 cells. The layout is symmetric: n-type transistors occupy the central part, and p-type transistors occupy the left and right sides. Dotted circles A1-A4 show possible locations of the effective diameter of the HCP track. Clearly, one HCP cannot selectively affect, for example, only n-type transistors in one cell and only p-type transistors in another cell at the same time. It always affects either transistor drains of the same type in adjacent cells (A1, A3, A4) or drains of both types in several neighboring cells simultaneously (A2). Therefore, the impact of an HCP on neighboring cells proves to be symmetric. In the case of more complex standard cells, an HCP may affect only one cell.



Fig. 1. Effective diameters of HCP track in 65-nm layout

Thus, one HCP can cause an ionization current in several adjacent cells, but this current has the same direction in all of them, so the voltage changes at the outputs of the affected adjacent cells have the same polarity. Scaling down the design rules shrinks the cell layout area proportionally, so the effective diameter of the HCP track will likewise cover transistor drains of both types in several adjacent cells.

When the cells driving a DRS are located sufficiently close to each other in the layout, the HCP impact is symmetric and does not generate opposite logic levels at the outputs of these cells. Consequently, faults in the work phase that switch a DRS into the work state opposite to the expected one practically do not occur in chips fabricated in CMOS processes with design rules of 65 nm and below.
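The argument about same-polarity upsets can be checked by enumeration. The Python sketch below uses a simplified model that is an assumption of this example, not the paper's analysis: a single HCP hit forces one or both rails of a DRS to the same logic level. Under this model the opposite work state is never reached.

```python
from itertools import product


def apply_same_polarity_upset(drs, hit_rails, forced_level):
    """Force every rail listed in hit_rails to forced_level (one polarity for all)."""
    return tuple(forced_level if i in hit_rails else bit
                 for i, bit in enumerate(drs))


def reachable_states(drs):
    """All DRS states reachable by one same-polarity upset of one or both rails."""
    states = set()
    for hit_rails, level in product([(0,), (1,), (0, 1)], [0, 1]):
        states.add(apply_same_polarity_upset(drs, hit_rails, level))
    return states


for work in [(0, 1), (1, 0)]:
    opposite = (work[1], work[0])
    states = reachable_states(work)
    print(work, sorted(states), "opposite reachable:", opposite in states)
```

In this model a work state can degrade only to "00" or "11" (spacer or anti-spacer) or remain unchanged, which is consistent with the conclusion above that only the AS needs additional handling.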

Masking the AS improves the fault-tolerance of ST circuits. It relies on a fault-tolerant DRS discipline in which the AS is also treated as a spacer state. Note that, in contrast with [17], the AS is still not a valid state. We only expect it as a possible result of a logical fault and indicate it as a second spacer to prevent a malfunction. To indicate both spacers ("00" and "11"), "equivalence" or "exclusive OR" cells are used. The logic functions of cells in ST circuits must be monotonic; for these cells, this requirement is met by using pass transistors in their implementation, as Fig. 2 shows.



Fig. 2. CMOS implementation of monotone "equivalence" function

Thus, using and indicating the double spacer prevents all logical faults that produce an AS in combinational ST circuits, increasing their soft fault-tolerance. The penalty is minor: six CMOS transistors instead of four in each indication cell of the first level of the indication subcircuit.
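The difference between classic and double-spacer indication can be shown at the behavioral level. The Python sketch below is an illustration under stated assumptions: it uses a zero spacer, takes an indicator value of 1 to mean "work" and 0 to mean "spacer", and uses hypothetical function names. It does not model the pass-transistor implementation or the monotonicity requirement.

```python
def classic_indicator(a: int, b: int) -> int:
    """Zero-spacer indication (OR): only "00" reads as spacer; "11" reads as work."""
    return a | b


def double_spacer_indicator(a: int, b: int) -> int:
    """XOR indication: both "00" and "11" read as spacer (0)."""
    return a ^ b


for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(f"{a}{b}: classic={classic_indicator(a, b)} "
          f"double={double_spacer_indicator(a, b)}")
```

The two indicators differ only for "11": the classic one reports a work state and lets the AS propagate, while the XOR-based one reports a spacer and masks it.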

#### IV. CRITICAL FAULTS IN THE ST TRIGGERS

ST triggers are based on bistable cells (BSC), each consisting of a pair of logical cells with cross-coupled connections. The BSC outputs form a bi-phase signal that has no spacer. However, it has one transit (dynamic, short-term) state through which the BSC switches from one static work state to the other.

When the BSC inputs follow the correct sequence, its outputs at any given time are in one of three states: work ("10" or "01") or transit ("00" for a BSC built on NOR cells or "11" for a BSC built on NAND cells). The state opposite to the transit state ("anti-transit", AT) never occurs in the absence of faults.

In the work phase, the BSC switches its outputs in accordance with its inputs; in the spacer phase, it stores its state. A fault may have irreversible consequences if it switches the BSC into the opposite work state. Because of the cross-feedback, a fault in one cell of the BSC may cause a false switching of the second cell if there are no active levels at the information and control inputs of the BSC.

A peculiarity of ST triggers compared with combinational circuits lies in their indication. It analyzes not only the BSC outputs but also the BSC inputs, checking the direct correspondence between them in the work phase. Therefore, not all faults are critical.

Only the following faults are critical:

- 1) Switching of the output BSC of a flip-flop to the opposite state before its indication output transitions to the work value;
- 2) Switching of the indication output to a value that does not match the current phase of the trigger;
- 3) Appearance of the AT state at the BSC outputs in its work phase before the trigger's indication output switches to the corresponding value;
- 4) Appearance of the AT state at the BSC outputs in its work phase after the trigger's indication output switches to the corresponding value;
- 5) Appearance of the AT state at the BSC outputs in its spacer phase before the trigger's indication output switches to the corresponding value;
- 6) Appearance of the AT state at the BSC outputs in its spacer phase after the trigger's indication output switches to the corresponding value.

In case 1, the trigger's information outputs are no longer supported by the outputs of the trigger's first BSC and are not controlled by its indicator. Therefore, the followers will treat the incorrect trigger state as correct.

In case 2, the fault prevents the followers from using the trigger's information outputs during the work phase in latches and during the spacer phase in flip-flops. In the opposite phases of these triggers, it allows the information outputs to be used before they are updated.

In case 3, the fault causes the trigger's indicator to switch unexpectedly, as if the BSC had really changed its work state. If the active input levels of the BSC are then removed (when the BSC switches to the spacer phase), the BSC may turn into a multivibrator.

In case 4, a sufficiently long failure can turn the BSC into a multivibrator when it switches to the spacer phase. A fault in cases 5 and 6 may have the same effect.

Under certain conditions, the following faults can also become critical:

- 7) Switching of the input BSC to the opposite state in its spacer phase before the spacer value appears at the trigger's indication output;
- 8) Switching of the input BSC to the opposite state in its spacer phase after the spacer value appears at the trigger's indication output;
- 9) Premature switching of the indication output to a value corresponding to the current phase of the trigger;
- 10) Appearance of the AT state at the BSC outputs in its work phase after the trigger's indication output switches to the value corresponding to the current phase of the trigger.

A failure that occurs in the spacer phase of the input BSC (cases 7 and 8) is critical in latches, regardless of the trigger's indication output value, if the followers are allowed to use the trigger outputs in its spacer phase. The same fault in ST flip-flops is always critical because it is not corrected in the input BSC: the trigger's indication does not compare the BSC inputs and outputs in this phase. As a result, erroneous information is written to the second BSC, which drives the trigger's information output.

Case 9 is critical only when the fault prematurely switches the indication output either to its work value in latches or to its spacer value in flip-flops while the trigger's information outputs are still updating. The trigger then falsely declares its outputs valid and allows the followers to use them. In practical circuits, the transition delay of the followers can mask this fault, because the fault does not prevent the information outputs of the trigger from switching to the correct state, though with some delay relative to the indication output. Case 10 is also critical under certain conditions because it prematurely switches the trigger's indicator to the value corresponding to the next phase of the trigger operation.

To summarize, if a fault occurs in the BSC work phase, when the state of the BSC outputs is actively supported by the BSC inputs, and it neither causes nor prevents the trigger's transition to the next phase, then the BSC inputs restore the correct BSC state once the fault source disappears. However, if the failure occurs in the spacer phase of the BSC, when it stores its state, the disappearance of the fault source does not restore the correct BSC state.

Analysis shows that ST triggers mask 44% of faults owing to their features. The fault-tolerance of ST triggers can be raised to the level of combinational circuits both by circuit and layout design methods (for example, DICE [7, 18]) and by using ST triggers with DRS information outputs.

The DICE technique increases the fault resistance of synchronous memory cells and triggers through doubled hardware and special circuit and layout design. It is also applicable to ST triggers, giving them additional fault resistance. This technique, together with other special methods of doubling transistors and bistable cells, which are the basis of self-timed triggers, raises their fault-tolerance up to 80%.
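The difference between a fault in the work phase and a fault in the storage phase can be reproduced with a gate-level toy model. The Python sketch below assumes a BSC built as a plain NOR latch with active-high inputs `s` and `r` and models the fault as a momentary inversion of both outputs. The function names and the settling loop are illustrative assumptions and do not model a complete ST trigger or its indication.

```python
def nor(a: int, b: int) -> int:
    return int(not (a or b))


def settle(s: int, r: int, q: int, qn: int, max_steps: int = 10):
    """Iterate the cross-coupled NOR pair until its outputs stop changing."""
    for _ in range(max_steps):
        new_q = nor(r, qn)
        new_qn = nor(s, new_q)
        if (new_q, new_qn) == (q, qn):
            break
        q, qn = new_q, new_qn
    return q, qn


def state_after_upset(s: int, r: int, q: int, qn: int):
    """Invert both BSC outputs (transient upset), then let the cell settle."""
    return settle(s, r, 1 - q, 1 - qn)


# Work phase: the active input s=1 keeps supporting the state (q, qn) = (1, 0).
print("work phase:   ", state_after_upset(s=1, r=0, q=1, qn=0))
# Storage phase: no active inputs, the cell only stores (q, qn) = (1, 0).
print("storage phase:", state_after_upset(s=0, r=0, q=1, qn=0))
```

In the work phase the active input drives the cell back to (1, 0) after the upset, whereas in the storage phase the inverted state (0, 1) is retained, which matches the behavior summarized above.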

#### V. CONCLUSION AND FUTURE WORK

The organization and operation of ST circuits allow them to resist many causes of single failures and to mask short-term logical faults even without special circuit techniques: up to 82% in combinational ST circuits and up to 44% in ST trigger circuits. The price is increased complexity (twice that of non-fault-tolerant synchronous circuits). However, this doubling is not dramatic, since fault-tolerant synchronous circuits with similar features also require additional hardware.

Using a DICE-like approach to implement ST triggers at the circuit and layout levels raises the fault-tolerance of ST triggers up to 80%.

Owing to the "immunity" of ST circuits to most soft-fault causes and their operability over a wide range of operating conditions, using ST circuits improves the fault-tolerance of digital hardware.

A further increase in fault-tolerance can be achieved by using ternary logic [19]. It reduces the crosstalk level by replacing each DRS with a corresponding bipolar signal. This approach decreases the total number of nets but requires more complex circuitry for logic and indication. Our future study will be devoted to this problem.

#### ACKNOWLEDGMENT

The study was supported by a grant from the Russian Science Foundation (Project No. 19-11-00334).

#### REFERENCES

- [1] A.I. Chumakov, "Forecasting local radiation effects in IC under the influence of the outer space factors," Microelectronics, 2010, Vol. 39, No. 2, pp. 85-90 (in Russian).
- [2] Space product assurance. Techniques for radiation effects mitigation in ASICs and FPGAs handbook, European Cooperation for Space Standardization, ECSS-Q-HB-60-02A, 1 September 2016, 236 p.
- [3] R. Bleyhut, Error-Detection Codes Theory and Practice. Moscow:Book on request, 2013, 566 p. (in Russian).
- [4] S.V. Gavrilov, S.I. Gurov, T.D. Zhukova, S.V. Rukhlov, D.I. Ryzhova, D.V. Telpuhov, "Methods of improving fault-tolerance of combinational IC by redundant coding methods," Applied mathematics and informatics. Proc. Faculty of computational mathematics and cybernetics, Lomonosov Moscow State University, No. 53, 2016.- P. 93-102 (in Russian).
- [5] F.-X. Yu, J.-R. Liu, Z.-L. Huang, H. Luo and Z.-M. Lu, "Overview of Radiation Hardening Techniques for IC Design," Information Technology Journal, June 2010, vol. 9, pp. 1068-1080.
- [6] S.I. Olchev and V.Y. Stenin, "Diphase CMOS logic gates with advanced fault-tolerance in relation to the effects of individual nuclear particles," Microelectronics, Vol. 40, No. 3, 2011, pp. 170-183 (in Russian).
- [7] M. Berg, "Revisiting Dual Interlocked Storage Cell (DICE) Single Event Upset (SEU) Sensitivity," in Microelectronics Reliability & Qualification Working Meeting (MRQW) 2013 and HiREV Industry Day, El Segundo, CA, December 10-12, 2013 [Online]. Available: https://nepp.nasa.gov/files/25702/2013MRQW\_Berg\_n272.pdf. [Accessed: 25.02.2019].
- [8] S.F. Tyurin, A.N. Kamenskih, "About reserving logical functions at the transistor level," In a world of scientific discoveries, 2014, No. 10, pp. 232-247 (in Russian).
- [9] A.Balasubramanian, B.L. Bhuva, J.D. Black and L.W. Massengill, "RHBD Techniques for Mitigating Effects of Single-Event Hits Using Guard-Gates," IEEE TNS, 2005, vol. 52, no. 6, pp. 2531-2535.
- [10] R. Trivedi, U.S. Mehta, "A Survey of Radiation Hardening by Design (RHBD) Techniques for Electronic Systems for Space Application," International Journal of Electronics and Communication Engineering & Technology (IJECET), Jan-Feb 2016, vol. 7, Issue 1, pp. 75-86.
- [11] Y. Stepchenkov, Y. Diachenko, V. Zakharov, Y. Rogdestvenski, N. Morozov, D. Stepchenkov, "Quasi-Delay-Insensitive Computing Device: Methodological Aspects and Practical Implementation," in PATMOS'2009: Proceedings of the International Workshop on power and timing modeling, optimization and simulation. – Delft, The Netherlands, Springer 2010, pp. 276–285.
- [12] Y. Stepchenkov, V. Zakharov, Y. Rogdestvenski, Y. Diachenko, N. Morozov, and D. Stepchenkov. "Speed-Independent Fused Multiply Add and Subtract Unit," in Proceedings of IEEE EastWest Design and Test Symposium (EWDTS'2016), Yerevan, October, 14 - 17, 2016, pp. 150-153.
- [13] Y.A. Stepchenkov, V.N. Zakharov, Y.V. Rogdestvenski, Y.G. Diachenko, N.V. Morozov, and D.Y. Stepchenkov, "Speed-Independent Floating Point Coprocessor," in Proceedings of IEEE East-West Design and Test Symposium, Batumi, Georgia, September 26-29, 2015, pp. 111-114.
- [14] Y. Stepchenkov, Y. Rogdestvenski, Y. Diachenko, D. Stepchenkov, Y. Shikunov, "Energy Efficient Speed-Independent 64-bit Fused Multiply-Add Unit," in Proceedings of 2019 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus2019), Jan. 2019, pp. 1709-1714.
- [15] V.E. Shunkov, "Physics of radiation effects affecting the electronics in space" [Online]. Available: https://habr.com/post/189066 [Accessed: 24.02.2019] (in Russian).
- [16] V.E. Shunkov, "Optimization of IC elements for protecting from single faults," Software products and systems. 2011, No. 4, pp. 24-26 (in Russian).
- [17] M. Linder, J. Di and S. C. Smith, "MTD3L A Secure IC Design Methodology with Reduced Overhead," in Proceedings of the 14th international conference on Automatic Control, Modelling & Simulation, and Proceedings of the 11th international conference on Microelectronics, Nanoelectronics, Optoelectronics. World Scientific and Engineering Academy and Society (WSEAS), 2012, pp. 191-196.
- [18] Y.V. Katunin, V.Y. Stenin and P.V. Stepanov, "Simulation of trigger diphase CMOS logic cell characteristics, taking into account the separation of charge at the effects of individual nuclear particles," Microelectronics, 2014, Vol. 43, No. 2, pp. 104-117. (in Russian)
- [19] D. W. Jones, "Standard Ternary Logic" [Online]. Available: http://homepage.cs.uiowa.edu/~jones/ternary/logic.shtml [Accessed: 20.04.2019].