# Advanced Indication of the Self-Timed Circuits\*

Yury Stepchenkov
Institute of Informatics Problems
Federal Research Center "Computer
Science and Control" of the Russian
Academy of Sciences
Moscow, Russia
YStepchenkov@ipiran.ru

Yury Shikunov
Institute of Informatics Problems
Federal Research Center "Computer
Science and Control" of the Russian
Academy of Sciences
Moscow, Russia
yishikunov@gmail.com

Yury Diachenko
Institute of Informatics Problems
Federal Research Center "Computer
Science and Control" of the Russian
Academy of Sciences
Moscow, Russia
diaura@mail.ru

Denis Diachenko
Institute of Informatics Problems
Federal Research Center "Computer
Science and Control" of the Russian
Academy of Sciences
Moscow, Russia
diaden87@gmail.com

Yury Rogdestvenski
Institute of Informatics Problems
Federal Research Center "Computer
Science and Control" of the Russian
Academy of Sciences
Moscow, Russia
YRogdest@ipiran.ru

Abstract—This paper discusses the problem of indication in CMOS self-timed circuits. The large number of indicating signals in multi-bit computational devices and registers requires additional hardware and time to combine them into a single control signal that provides request-acknowledge interaction between interconnected self-timed functional blocks. The indication subcircuit performs this task. Multi-input hysteretic triggers accelerate the indication subcircuit by a factor of 1.1 to 1.6 and reduce its complexity several times compared with the standard implementation based on static and semi-static Muller's elements. The penalty is some short-circuit current in the worst case.

Keywords—self-timed, indication, C-element, hysteretic trigger, performance, complexity

## I. INTRODUCTION

In theory, self-timed (ST) circuits provide the best performance under any particular ambient conditions because they do not depend on an external clock. They use request-acknowledge interaction between the source of the processed digital data and its receiver. As a result, the performance of ST circuits is determined only by the actual cell delays under the current operating conditions.

Unlike ST circuits, synchronous circuits operate under strict control of an external clock. Therefore, if a clock source does not adapt to changing ambient conditions, synchronous circuits are forced to focus on the "worst case": minimum supply voltage, maximum permissible ambient temperature, "slow" transistors etc. As a result, in some applications, the ST circuits are faster than their synchronous counterparts despite their hardware redundancy.

The main factors limiting performance of the ST circuits are as follows:

- Diphase work discipline.
- Presence of an indication subcircuit.

Any ST circuit operates in two phases: the work phase, which implements the input data processing algorithm, and the spacer, in which the ST circuit prepares for the next work phase. The spacer is necessary to separate adjacent work phases, but it adds nonproductive delay to the total work cycle of the ST circuit.

The indication subcircuit is an integral part of any ST circuit. It detects the completion of each phase and controls the interaction between ST functional blocks. The indication subcircuit combines all internal indication signals into one phase signal, which serves as the control signal for the ST circuits that drive this circuit and receive data from it. An ST circuit is considered to have switched to the next phase only when both its algorithmic part and its indication subcircuit have switched to this phase, and all circuit elements must have completed their switching in this phase. Therefore, to confirm that the ST circuit has finished switching, one needs to indicate the outputs of all circuit components and combine them into a single indication output.

The performance advantage of ST circuits over their synchronous counterparts is most evident in relatively simple circuits with a small bit width. Here the indication subcircuit works in "background" mode and its contribution to the circuit delay is negligible. In multi-bit ST circuits, the large number of internal indication signals causes the indication subcircuit to "swell" and increases its contribution to the delay of the digital data processing path.

Therefore, developing components that accelerate completion detection in ST circuits is an urgent task. This paper analyzes implementation variants of the indication subcircuit for CMOS ST circuits with a diphase operating discipline and investigates ways to accelerate and simplify them. The information signals use dual-rail coding in the combinational part and bi-phase coding (the output of an RS-trigger [1]) in the sequential part of the ST circuits.

The scientific novelty of the paper consists in investigating feasible variants of the multi-input hysteretic trigger that speeds up and simplifies the indication subcircuit of any ST circuit, especially of multi-bit arithmetic digital units.

\* The study was supported by a grant from the Russian Science Foundation (Project No. 19-11-00334).

## II. INDICATION SUBCIRCUIT IMPLEMENTATION BASIS

The classic principle of indicating circuits with dual-rail and bi-phase encoding of the information signals is as follows [2]:

- Generating a signal indicating spacer or work state of each data signal.
- Combining all internal indication signals into one output indication signal.

The signal indicating a dual-rail information signal is generated by k-OR (k-NOR) cells for a zero spacer or by m-AND (m-NAND) cells for a unit spacer, where k is the number of series-connected p-MOS transistors between the supply bus and the cell output that the process permits, and m is the number of series-connected n-MOS transistors between the ground bus and the cell output that the process permits. Internal indication signals are combined by a pyramid circuit built of special indication cells.

The traditional indication cells are the C-element (semi-static Muller's element) and the hysteretic trigger (H-trigger, static Muller's element) [3]. Figs. 1 and 2 show 3-input CMOS circuits of the C-element and the H-trigger, respectively.

The C-element uses a "weak" inverter (its transistors are outlined with dotted ovals in Fig. 1) to hold its state between the moments when all inputs of the C-element are the same. When the C-element is forced to switch to the opposite state, a chain of series-connected "strong" p- or n-type transistors "draws" the potential of the internal net A against the opposite-type transistor of the "weak" inverter that holds it.

Advantages of the C-element are as follows:

- Small number of transistors: 2·(N + 2), where N is the number of inputs.
- Unit capacitance on each input.



Fig. 1. 3-Input C-element



Fig. 2. 3-Input H-trigger

Disadvantages of the C-element are as follows:

- Presence of a short-circuit current (lasting a few picoseconds in a 65-nm CMOS process) flowing through the chain of series-connected "strong" transistors and the transistor of the "weak" inverter while the C-element switches.
- Low noise immunity, because the potential of the internal net A is held by a "weak" transistor; sufficiently strong interference may shift it to a level that switches the "strong" output inverter and inverts the state of the C-element.

In the H-trigger, the transistors that hold the trigger's state may have an arbitrary size, because the active holding transistor is disconnected from the power source or ground once all inputs of the H-trigger take the same value that does not match the stored state. Therefore, the H-trigger switches to the opposite state without any "drawing" of the internal node A potential to a new value.

The advantages of the H-trigger are the opposite of the disadvantages of the C-element:

- Lack of short-circuit current during switching.
- High noise immunity because the stored trigger's state is supported by "strong" transistors.

In addition, it has better performance because the potential of the internal node A does not have to be "drawn".

Disadvantages of the H-trigger are as follows:

- Increased number of transistors: 4·(N + 1), where N is the number of inputs.
- Input capacitance is larger than in the C-element.

Due to limitations on the number of series-connected transistors in a CMOS circuit (no more than three p-MOS transistors and no more than four n-MOS transistors), the indication subcircuit is based on 2-input and 3-input C-elements or H-triggers. An indication subcircuit combining M indication signals into a single one can be implemented on $\lceil M \cdot (1 - 1/\log_2 M) \rceil$ 2-input C-elements or H-triggers, or on $\lceil \frac{M}{2} \cdot (1 - 1/\log_3 M) \rceil$ 3-input versions of them. It will have $\lceil \log_2 M \rceil$ or $\lceil \log_3 M \rceil$ layers (cascades) of such cells, respectively.

208 2019 IEEE EWDTS

For example, the number of indication signals at the first layer of the Wallace "tree" of a double-precision multiplier compliant with the IEEE 754 standard [4], using dual-rail encoding with a unit spacer, equals 1431. The indication subcircuit combining them will have 716 3-input and 2-input H-triggers located on the 7 layers of a pyramidal structure. One H-trigger has a delay of roughly 50 ps in a 65-nm CMOS process under typical conditions. Thus the total delay of such an indication subcircuit will be around 350 ps.
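The cell and layer counts in this example can be reproduced with a short sketch. It assumes a greedy pyramid in which each layer groups signals into 3-input cells, uses one smaller cell for a leftover pair, and passes a single leftover signal to the next layer unchanged. The function name `pyramid_size` and this grouping rule are illustrative assumptions, not a construction stated in the paper.

```python
def pyramid_size(m: int, fan_in: int = 3) -> tuple[int, int]:
    """Return (cells, layers) for a greedy pyramid combining m signals."""
    cells = 0
    layers = 0
    while m > 1:
        full, rest = divmod(m, fan_in)
        # A leftover group of 2 or more signals needs one smaller cell;
        # a single leftover signal passes to the next layer without a cell.
        cells += full + (1 if rest >= 2 else 0)
        m = full + (1 if rest >= 1 else 0)
        layers += 1
    return cells, layers


H_TRIGGER_DELAY_PS = 50  # rough per-cell delay quoted in the text

cells, layers = pyramid_size(1431)
print(cells, layers, layers * H_TRIGGER_DELAY_PS)  # 716 7 350
```

Under these assumptions the sketch yields the 716 cells, 7 layers, and roughly 350 ps quoted above.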

In applications that do not require the maximum reduction of dynamic current consumption, it is permissible to use multi-input H-triggers [3, 5] whose behavior is described by a Boolean function:

$$Y^+ = I_1 \cdot I_2 \cdot \ldots \cdot I_N + Y \cdot (I_1 + I_2 + \ldots + I_N),$$

where $I_1$, $I_2$, ..., $I_N$ are the inputs of the N-input H-trigger. Fig. 3 shows the CMOS circuit of the N-input H-trigger.
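As a minimal behavioral sketch, the Boolean function above can be evaluated directly. The function name `h_trigger_next` is hypothetical, and the model covers only the logic level, not the transistor-level effects discussed in the paper.

```python
from typing import Sequence


def h_trigger_next(inputs: Sequence[bool], y: bool) -> bool:
    """Next output Y+ = I1*I2*...*IN + Y*(I1 + I2 + ... + IN)."""
    all_set = all(inputs)   # product of all inputs
    any_set = any(inputs)   # sum (OR) of all inputs
    return all_set or (y and any_set)


y = False
for pattern in ([1, 0, 1], [1, 1, 1], [0, 1, 0], [0, 0, 0]):
    y = h_trigger_next([bool(b) for b in pattern], y)
    print(pattern, int(y))  # outputs in order: 0, 1, 1, 0
```

The output rises only when all inputs are 1, falls only when all inputs are 0, and otherwise keeps its previous value.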

The peculiarities of the multi-input H-triggers are as follows:

- No series-connected transistors controlled by the inputs.
- The "weak" inverter (marked by a dashed oval) is controlled by one of the trigger's inputs ($I_N$ in Fig. 3) rather than by its output.

The multi-input H-trigger is also semi-static. Premature switching of the $I_N$ input to the value corresponding to the next phase of the H-trigger causes a short-circuit current, which lasts until all inputs of the H-trigger have switched to the same value as $I_N$. The magnitude of the short-circuit current depends on the widths of both the transistors in the "weak" inverter and the opposing serial-parallel transistor group in the input part of the trigger.



Fig. 3. N-input H-trigger

The circuit in Fig. 3 works correctly and without short-circuit current at any transistor sizes if the $I_N$ input changes last of all the H-trigger inputs. To ensure this logically, a designer can connect to this input an indication signal formed by a longer cell chain than the other indication signals. However, the behavior of an ST circuit should not depend on the delays of its elements. Consequently, one must take into account that $I_N$ may either lag behind the rest of the trigger inputs or precede the switching of at least one of them, even if $I_N$ propagates through a longer cell chain. So there are specific requirements for the implementation and usage of multi-input H-triggers.

## III. MULTI-INPUT H-TRIGGER OPTIMIZATION

A necessary condition for the workability of the multi-input H-trigger is that the "weak" inverter must not cause premature switching of the trigger when $I_N$ changes early while at least one of the other inputs remains in the opposite state. In addition, the transistor sizes of the multi-input H-trigger should provide an acceptable ratio of performance to short-circuit current in the worst case.

The necessary workability conditions for the multi-input H-trigger in typical 65-nm CMOS process are achieved with the following transistor size ratios:

$$\begin{cases} \frac{L_{p,weak}}{W_{p,weak}} \geq K_{p,GM} \cdot \left(\frac{L_{n,in}}{W_{n,in}} + \frac{L_{n,FB}}{W_{n,FB}}\right), \\ \frac{L_{n,weak}}{W_{n,weak}} \geq K_{n,GM} \cdot \left(\frac{L_{p,in}}{W_{p,in}} + \frac{L_{p,FB}}{W_{p,FB}}\right), \end{cases} \tag{1}$$

where $W_{p,weak}$, $W_{n,weak}$, $L_{p,weak}$, $L_{n,weak}$ are the widths and lengths of the p- and n-transistors in the "weak" inverter; $W_{p,in}$, $W_{n,in}$, $L_{p,in}$, $L_{n,in}$ are the widths and lengths of the p- and n-transistors driven by the other H-trigger inputs; $W_{p,FB}$, $W_{n,FB}$, $L_{p,FB}$, $L_{n,FB}$ are the widths and lengths of the p- and n-transistors that hold the H-trigger's state during the intervals when its inputs have different values; and $K_{p,GM}$, $K_{n,GM}$ are process-dependent coefficients.

Simulation with the Spectre simulator (Cadence Virtuoso) made it possible to calculate the values of the coefficients $K_{p,GM}$ and $K_{n,GM}$ for a standard 65-nm CMOS process. Taking into account the possible combinations of p- and n-transistor parameters, they were determined to be $K_{p,GM} = 0.9$ and $K_{n,GM} = 6.4$. The transistor sizes in Fig. 3 satisfy the ratios (1).
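The ratios (1) lend themselves to a simple design-rule check. The sketch below uses the coefficient values quoted in the text; the `Transistor` type, the function name, and all transistor sizes in the example are hypothetical and are not taken from Fig. 3.

```python
from dataclasses import dataclass

K_P_GM = 0.9  # coefficient values quoted in the text for a 65-nm process
K_N_GM = 6.4


@dataclass(frozen=True)
class Transistor:
    width: float   # any length unit, as long as it is used consistently
    length: float

    @property
    def l_over_w(self) -> float:
        return self.length / self.width


def satisfies_ratios_1(p_weak, n_weak, p_in, n_in, p_fb, n_fb) -> bool:
    """Check both inequalities of the ratios (1)."""
    p_ok = p_weak.l_over_w >= K_P_GM * (n_in.l_over_w + n_fb.l_over_w)
    n_ok = n_weak.l_over_w >= K_N_GM * (p_in.l_over_w + p_fb.l_over_w)
    return p_ok and n_ok


# Illustrative sizes only (micrometers assumed).
strong = Transistor(width=0.6, length=0.06)          # L/W = 0.1
weak_p = Transistor(width=0.12, length=0.06)         # L/W = 0.5
weak_n_long = Transistor(width=0.12, length=0.24)    # L/W = 2.0
weak_n_short = Transistor(width=0.12, length=0.06)   # L/W = 0.5

print(satisfies_ratios_1(weak_p, weak_n_long, strong, strong, strong, strong))   # True
print(satisfies_ratios_1(weak_p, weak_n_short, strong, strong, strong, strong))  # False
```

In the second call the n-transistor of the "weak" inverter is too strong relative to the p-type input and feedback transistors, so the second inequality fails.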

Similarly, the transistor sizes in the C-element circuit are calculated to ensure proper operation of the C-element for all combinations of parameters of the p- and n-transistors and permissible ambient conditions. For example, for 3-input C-element

$$\begin{cases} \frac{L_{p,weak}}{W_{p,weak}} \geq K_{p,C} \cdot \left( \frac{L_{n,in1}}{W_{n,in1}} + \frac{L_{n,in2}}{W_{n,in2}} + \frac{L_{n,in3}}{W_{n,in3}} \right), \\ \frac{L_{n,weak}}{W_{n,weak}} \geq K_{n,C} \cdot \left( \frac{L_{p,in1}}{W_{p,in1}} + \frac{L_{p,in2}}{W_{p,in2}} + \frac{L_{p,in3}}{W_{p,in3}} \right), \end{cases} \tag{2}$$

where $W_{p,in*}$, $W_{n,in*}$, $L_{p,in*}$, $L_{n,in*}$ are the widths and lengths of the p- and n-transistors driven by the corresponding C-element input. For a standard 65-nm CMOS process, taking into account the possible combinations of the p- and n-transistor parameters, the coefficients $K_{p,C}$ and $K_{n,C}$ have the values $K_{p,C} = 0.7$ and $K_{n,C} = 8.1$.

The short-circuit current in the multi-input H-trigger depends on the switching order of its inputs. If the $I_N$ input driving the "weak" inverter switches last, there is no short-circuit current; conversely, if it switches first among all inputs of the H-trigger, the short-circuit current is maximal.
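The dependence on switching order can be illustrated with a small timing sketch. It assumes, following the description of the multi-input H-trigger, that the short-circuit current flows from the moment the $I_N$ input switches until the last of the other inputs has switched. The function name and the switching times are hypothetical.

```python
from typing import Sequence


def short_circuit_window(switch_times: Sequence[float], weak_index: int = -1) -> float:
    """Duration for which short-circuit current can flow.

    switch_times holds the switching moment of every input; weak_index
    selects the input that drives the "weak" inverter (I_N).
    """
    t_weak = switch_times[weak_index]
    t_last = max(switch_times)
    # No current flows if I_N is the last input to switch.
    return max(0.0, t_last - t_weak)


print(short_circuit_window([10.0, 20.0, 30.0]))  # I_N last: 0.0
print(short_circuit_window([30.0, 20.0, 10.0]))  # I_N first: 20.0
```

The window is zero when $I_N$ switches last and longest when it switches first.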

The transistor sizes in the N-input H-trigger shown in Fig. 3 provide a balance between the short-circuit current in the worst case and the trigger's speed. At the same time, they ensure proper operation of the H-trigger for any switching order of its inputs. Performance can be improved by increasing the width of the transistors in the "weak" inverter, but this will inevitably increase the short-circuit current in the worst case.

Fig. 4 shows a family of curves for the short-circuit current $I_S$ in the circuit in Fig. 3 when the $I_N$ input switches next-to-last, for various process corners of the transistors ("ff" – all transistors are "fast", "tt" – all transistors are typical, "ss" – all transistors are "slow"). At the nominal supply voltage (VDD = 1.0V), the current $I_S$ does not exceed 120 μA throughout the ambient temperature range for any ratio of the p- and n-transistor parameters.

Short-circuit current in the C-element is comparable to the short-circuit current in the multi-input H-trigger. Its duration at fixed supply voltage and ambient temperature is determined only by the size ratio of "weak" and "strong" transistors and their parameters.

## IV. COMPARISON OF INDICATION SUBCIRCUIT IMPLEMENTATIONS

Because of how CMOS transistors behave at different temperatures and supply voltages, using the multi-input H-trigger is not always appropriate. The following are the results of simulating different variants of the indication subcircuit, each combining a specified number of indication signals into a single signal.



Fig. 4. Short-circuit current in 16-input H-trigger in the temperature and supply voltage VDD range

To compare the performance of the different types of indication elements, a ring oscillator was used; Fig. 5 shows its circuit. It consists of 10 identical segments (ISC), each combining a set of indication signals and based either on a "tree" of 2-input and 3-input indication cells (C-elements (C) or H-triggers (G)) or on a single multi-input H-trigger (GM), and one NAND2 cell that enables oscillation when EN = 1. The contribution of the NAND2 cell to the total oscillation period is insignificant.

Figs. 6-8 show how the oscillation period depends on the supply voltage, temperature, and transistor parameter corner for three ring oscillators built of indication subcircuits that combine 16 indication signals in the different bases. Each curve type corresponds to one corner of the transistor parameters: dotted line – "ss", dash-dotted line – "tt", solid line – "ff". The results were obtained with the Spectre simulator.
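A measured oscillation period can be converted into an average delay per indication segment. The sketch below assumes that one period contains two passes of the switching front around the ring (one for each output polarity), so the result averages the rising and falling delays. The function name and the 2000 ps period are illustrative only and are not taken from the simulation results.

```python
def segment_delay_ps(period_ps: float, segments: int = 10,
                     nand2_delay_ps: float = 0.0) -> float:
    """Average delay of one indication segment (ISC) in the ring.

    Assumes period = 2 * (segments * t_segment + t_nand2).
    """
    loop_delay = period_ps / 2.0
    return (loop_delay - nand2_delay_ps) / segments


# Hypothetical period, neglecting the NAND2 contribution as the text does.
print(segment_delay_ps(2000.0))  # 100.0
```

Comparing this value across the C, G, and GM variants gives the relative speed of the corresponding indication subcircuits.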



Fig. 5. Ring oscillator



Fig. 6. Oscillation period of three ring oscillators for 0.8V supply voltage

Analysis of the graphs in Figs. 6-8 shows the following:

- The indication subcircuit that combines 16 indication signals into a single output and is implemented by one 16-input H-trigger has the best performance at supply voltages of 0.8V and 1.0V compared with similar circuits built of C-elements and conventional H-triggers.
- At 1.2V supply voltage, the advantage of the multi-input H-trigger is limited to the temperature range from -63°C to +50°C for the "ss" corner, and is preserved throughout the full temperature range in all other corners, decreasing as the ambient temperature rises.
- The indication subcircuit based on C-elements turned out to be the slowest of all variants under all conditions.



Fig. 7. Oscillation period of three ring oscillators for 1.0V supply voltage



Fig. 8. Oscillation period of three ring oscillators for 1.2V supply voltage

Figs. 9 and 10 present the simulation results for indication subcircuits based on conventional H-triggers and on the multi-input H-trigger, combining 9 and 27 indication signals at 1.0V supply voltage over a range of temperatures and transistor parameters. They show that the efficiency of the multi-input H-trigger falls as the number of combined indication signals decreases. The indication subcircuit combining 9 signals and built of conventional 3-input H-triggers performs better in the positive temperature range than the 9-input H-trigger.

In contrast, the 27-input H-trigger performs better than the subcircuit based on 3-input H-triggers throughout the full temperature range. Its advantage increases almost linearly from 1-3% at $T = 127^{\circ}C$ to 17-24% at $T = -63^{\circ}C$, depending on the transistor parameters.

The multi-input H-trigger shows a similar advantage at lower supply voltages. For example, at 0.8V supply voltage it outperforms the subcircuit based on conventional H-triggers by 13-14% at $T = 127^{\circ}C$ and by 36-62% at $T = -63^{\circ}C$, depending on the transistor parameters. Increasing the supply voltage reduces this advantage; moreover, in the corner with 1.2V supply voltage and $T = 127^{\circ}C$ it disappears.



Fig. 9. Oscillation period of 9-input indication subcircuits for 1.0V supply voltage



Fig. 10. Oscillation period of 27-input indication subcircuits for 1.0V supply voltage

However, the multi-input H-trigger has additional advantages over traditional indication subcircuits based on conventional H-triggers:

- Lower hardware costs (the number of transistors in an N-input H-trigger equals 2·(N + 2), which is identical to the formula for the C-element),
- Simpler layout.
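The hardware saving can be estimated from the transistor-count formulas given in the text: 4·(N + 1) for a conventional H-trigger and 2·(N + 2) for a multi-input H-trigger. The sketch below compares a pyramid of 3-input H-triggers with a single 27-input H-trigger; the function names are hypothetical, and the full three-layer pyramid (9 + 3 + 1 cells) is an assumption about the reference structure.

```python
def h_trigger_transistors(n: int) -> int:
    """Transistor count of a conventional N-input H-trigger: 4*(N + 1)."""
    return 4 * (n + 1)


def multi_input_h_trigger_transistors(n: int) -> int:
    """Transistor count of a multi-input H-trigger: 2*(N + 2)."""
    return 2 * (n + 2)


SIGNALS = 27
# A full pyramid of 3-input cells over 27 signals has 9 + 3 + 1 cells.
tree_cells = sum(SIGNALS // 3**layer for layer in range(1, 4))
tree_transistors = tree_cells * h_trigger_transistors(3)
single_transistors = multi_input_h_trigger_transistors(SIGNALS)

print(tree_cells, tree_transistors, single_transistors)  # 13 208 58
print(round(tree_transistors / single_transistors, 1))   # 3.6
```

Under these assumptions the ratio agrees with the factor of 3.6 that the paper reports for the 27-input case.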

At the same time, the effectiveness of multi-input H-triggers depends on the structure of the circuit that generates the indication signals to be combined. The maximum efficiency is achieved in multi-bit circuits with almost simultaneous generation of bitwise indication signals: registers, parallel adders, parallel multipliers, etc. For example, the most balanced split of a 53×53 multiplier into two pipeline stages produces a 598-bit intermediate register. Implementing the indication subcircuit for this register on 30-input and 20-input H-triggers instead of 3-input H-triggers reduces its complexity by a factor of 3.6 and accelerates it by 14% at 1.0V supply voltage and 25°C ambient temperature.

Note that the simulation results were obtained for the case in which the input of the multi-input H-trigger that drives the "weak" inverter changes later than the other inputs of each H-trigger. This ensures the absence of any short-circuit current but slightly slows the trigger down. Otherwise, the multi-input H-trigger would demonstrate higher performance, and a short-circuit current would appear whose value corresponds to the data shown in Fig. 4. This current flows until all inputs of the H-trigger have switched to the same state.

Thus, the use of multi-input H-triggers is appropriate for combining a large number of indication signals generated by the bits of parallel computing units and registers.

## V. CONCLUSIONS

The indication subcircuit is the "bottleneck" of multi-bit digital ST circuits. The need to detect the completion of transient processes in all elements of the ST circuit and to organize request-acknowledge interaction between ST blocks forces a developer to spend additional hardware on the indication subcircuit and slows the circuit down. The subcircuit combines all internal indication signals into a single signal used as a control signal for the preceding and subsequent blocks in the digital data processing path.

Using multi-input H-triggers in the indication subcircuit of multi-bit computing units and registers both reduces the hardware cost of the indication subcircuit several times and decreases its delay by one and a half times, thereby increasing the performance of the entire ST circuit.

Under typical values of the supply voltage (VDD = 1.0V), ambient temperature (T = 25°C), and CMOS transistor model parameters ("tt" corner), the 27-input H-trigger is 1.11 times faster and 3.6 times less complex than a similar indication subcircuit based on 3-input H-triggers. Low supply voltage and low ambient temperature maximize the performance advantage of the indication subcircuit using multi-input H-triggers.

The C-element, with transistors resized to provide the same short-circuit current during switching as the multi-input H-trigger, demonstrates the worst performance compared with both the conventional H-trigger and the multi-input H-trigger.

Varying the transistor sizes in the multi-input H-trigger circuit shifts the balance between its short-circuit current and performance in either direction. One can accelerate the H-trigger at the expense of a higher allowable short-circuit current, or reduce the possible short-circuit current at the expense of performance.

## REFERENCES

- [1] Y.A. Stepchenkov, A.N. Denisov, Y.G. Diachenko, F.I. Grinfeld, O.P. Filimonenko, N.V. Morozov, et al. "Functional cell library for designing self-timed semi-custom chips on gate arrays 5503/5507". Moscow: Tekhnosfera. 2017. 367 p. ISBN 978-5-94836-332-5. URL: http://www.technosphera.ru/lib/book/497p.
- [2] M. Kishinevsky, A. Kondratyev, A. Taubin, and V. Varshavsky. Concurrent hardware: the theory and practice of self-timed design, New York: J.Wiley & Sons, 1994, 368 p.
- [3] V.B. Marakhovsky, "Theory of the logic design. Course slides," http://elib.spbstu.ru/dl/1945.pdf/download/1945.pdf. In Russian (last accessed 17.05.2019).
- [4] IEEE Computer Society. 2008. IEEE Standard for Floating-Point Arithmetic IEEE Std 754-2008. doi:10.1109/IEEESTD.2008.4610935.
- [5] Y.A. Stepchenkov, Y.G. Diachenko, A.N. Denisov, and Y.P. Fomin. H-trigger. Patent № 2371842. Registered 27.10.09. Publ. in Invention Bulletin, 2009, № 30, 13p.
