Y.A. Stepchenkov, Y.V. Rogdestvenski, Y.G. Diachenko, N.V. Morozov, D.Y. Stepchenkov, B.A. Stepanov, D.Y. Diachenko, A.V. Rogdestvenskene. Self-Timed Floating Point Multiply-Add Unit // Problems of Advanced Micro- and Nanoelectronic Systems Development, 2017, Part III, Moscow, IPPM RAS, P. 45-51. (Indexed in Scopus.)
Abstract: This paper presents a Speed-Independent Floating Point Coprocessor (SIFPC) that implements a fused multiply-add-subtract operation. It uses a mix of dual-rail and redundant self-timed coding and complies with the IEEE 754 standard. SIFPC processes either one double-precision operation or two simultaneous single-precision operations and produces two results: the sum of, and the difference between, the product of the first two operands and the third operand. SIFPC consists of two identical channels with a common input and output; results are output in the same order in which the input data arrive. Each channel implements the full data processing path and has two pipeline stages: the first contains the multiplier and the exponent calculation, and the second contains all remaining parts. This two-stage organization reduces hardware complexity and speeds up calculation by reducing the number of intermediate registers and the number of "bottlenecks" in the unit's indication subcircuit. Bit-wise and simplified (adaptive) indication provides a further speed-up over a traditional self-timed implementation. The multiplier uses a modified Booth algorithm with a Wallace tree, self-timed redundant code, and ternary adders. The first stage of the Wallace tree compresses four dual-rail partial products into two ternary operands. The unit is designed for a standard 65-nm CMOS process. It occupies 1.12 mm² of die area and delivers 3.15 Gflops performance with 1.8 ns latency.
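As an illustration of the arithmetic only (not of the self-timed hardware), the fused multiply-add-subtract operation described in the abstract can be sketched in software. The sketch below is a hedged model under stated assumptions: the function name `fused_multiply_add_sub` is hypothetical, it handles finite double-precision inputs only (no infinities, NaNs, or signed-zero rules), and it uses round-to-nearest-even as the single rounding step. It computes the product exactly and rounds each of the two results once, which is what distinguishes a fused operation from a separate multiply followed by an add.

```python
from fractions import Fraction
from typing import Tuple


def fused_multiply_add_sub(a: float, b: float, c: float) -> Tuple[float, float]:
    """Return (a*b + c, a*b - c), each rounded once from the exact value.

    Assumption: a, b and c are finite doubles; special values are not modeled.
    """
    # A finite binary float converts to a Fraction exactly, so the product
    # below carries no rounding error.
    product = Fraction(a) * Fraction(b)
    addend = Fraction(c)
    # float(Fraction) performs one correctly rounded conversion
    # (round-to-nearest, ties-to-even).
    return float(product + addend), float(product - addend)


# Operands chosen so that the exact product is 1 - 2**-60,
# which an unfused multiply rounds to 1.0.
a, b, c = 1.0 + 2.0**-30, 1.0 - 2.0**-30, -1.0
fused_sum, fused_diff = fused_multiply_add_sub(a, b, c)
unfused_sum = a * b + c  # product is rounded before the addition

print(fused_sum, fused_diff, unfused_sum)
```

With these operands the unfused sum loses the low-order part of the product entirely, while the fused sum keeps it, because rounding happens only after the addition.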