Self-timed multiplier for multiply-add unit

B. Stepanov, Y. Diachenko, Y. Rogdestvenski, D. Diachenko. Self-timed multiplier for multiply-add unit // NW Russia Young Researchers in Electrical and Electronic Engineering Conference (EIConRusNW), 2016 IEEE. P. 349-352. (indexed in Scopus).

The paper discusses the specifics of implementing a self-timed multiplier for a unit that multiplies two operands and then adds the product to a third operand without intermediate rounding, in accordance with the IEEE 754 Standard. The multiplier is a hardware implementation of the modified Booth algorithm based on a self-timed adder with a redundant signal code. An optimal self-timed redundant coding of the multiplier's internal and output signals is proposed. Circuit and layout problems of the self-timed multiplier implementation are solved. The Wallace tree, which is the main part of the multiplier, is optimized for a 65-nm CMOS process with six metal layers, taking into account that the multiplier circuit has more signals than its synchronous counterpart. An implementation of the self-timed multiplier in a CMOS process with 65-nm design rules is presented.
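As an illustration of the modified Booth algorithm named in the abstract, the following is a minimal software sketch of radix-4 Booth recoding. It is not the authors' hardware design: the function names and the `width` parameter are assumptions made for this example, and the self-timed redundant coding of signals is not modelled.

```python
def booth_radix4_digits(multiplier: int, width: int) -> list[int]:
    """Recode a signed `width`-bit multiplier into radix-4 Booth digits in {-2..2}.

    Assumes -2**(width-1) <= multiplier < 2**(width-1) (hypothetical interface).
    """
    if width % 2:
        width += 1  # sign-extend to an even number of bits
    # Two's-complement bit pattern with an implicit 0 appended on the right (y[-1] = 0).
    bits = (multiplier & ((1 << width) - 1)) << 1
    digits = []
    for i in range(0, width, 2):
        triplet = (bits >> i) & 0b111  # bits y[i+1], y[i], y[i-1]
        # digit = -2*y[i+1] + y[i] + y[i-1]
        digit = -2 * ((triplet >> 2) & 1) + ((triplet >> 1) & 1) + (triplet & 1)
        digits.append(digit)
    return digits


def booth_multiply(a: int, b: int, width: int) -> int:
    """Multiply a by b by summing one partial product per Booth digit of b."""
    digits = booth_radix4_digits(b, width)
    # Each digit selects 0, +/-a or +/-2a, shifted by two bit positions per digit.
    return sum((d * a) << (2 * j) for j, d in enumerate(digits))
```

The recoding halves the number of partial products compared with a bit-by-bit scheme, which is why it is commonly paired with a Wallace tree for summation.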

Self-timed floating-point multiply-add unit

Yu.A. Stepchenkov, Yu.V. Rogdestvenski, Yu.G. Diachenko, N.V. Morozov, D.Yu. Stepchenkov, B.A. Stepanov, D.Yu. Diachenko, A.V. Rogdestvenskene. Self-timed floating-point multiply-add unit // Problems of Advanced Micro- and Nanoelectronic Systems Development 2016. Proceedings / ed. by A.L. Stempkovsky, Academician of RAS. Moscow: IPPM RAS, 2016. Part III. P. 149-156.

A self-timed fused multiply-add unit is presented as a coprocessor; it belongs to a new subclass of delay-insensitive circuits with adaptive indication. It fully complies with the IEEE 754 Standard and simultaneously performs addition of a third operand to, and its subtraction from, the product of the first two. Each 64-bit operand holds either one double-precision number or two single-precision numbers. To minimize hardware cost and power consumption, the number of pipeline stages is reduced to two. An optimal performance-to-hardware-cost ratio is achieved with two parallel channels that share a common input and output. Adaptive indication further reduces hardware cost and increases speed. Multiplication is based on a self-timed redundant code. The unit is designed for a bulk CMOS process with 65-nm design rules and delivers about 3.15 Gflops.

Speed-Independent Fused Multiply Add and Subtract Unit

Yuri Stepchenkov, Victor Zakharov, Yuri Rogdestvenski, Yuri Diachenko, Nikolai Morozov and Dmitri Stepchenkov. Speed-Independent Fused Multiply Add and Subtract Unit // Proceedings of IEEE East-West Design & Test Symposium (EWDTS’2016), Yerevan, October 14-17, 2016. P. 150-153. (indexed in Scopus).

DOI: 10.1109/EWDTS.2016.7807735

A speed-independent fused multiply-add-subtract unit is presented together with a test environment that fully verifies its performance and operability over the whole range of environmental conditions. It complies with the IEEE 754 Standard and performs double- and single-precision operations on three operands. The unit is implemented as two channels with a common input and output. Each channel is a pipeline with four stages. The multiplier is based on the modified Booth algorithm and uses a self-timed redundant code. The unit was designed for a standard CMOS process with 65-nm design rules and achieves 3.15 Gflops performance with less than 2 ns latency.
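To show what "fused" means numerically, the sketch below contrasts a multiply followed by an add (two roundings) with a software emulation of a fused operation (one rounding of the exact result). It only illustrates the arithmetic, not the unit described in the paper; the helper names are assumptions, exact rational arithmetic stands in for the hardware datapath, and infinities and NaNs are not handled.

```python
from fractions import Fraction


def mul_then_add(a: float, b: float, c: float) -> float:
    """Unfused: the product is rounded to double before the addition."""
    return a * b + c


def fma_emulated(a: float, b: float, c: float) -> float:
    """Fused (emulated): compute a*b + c exactly, then round once to double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # int/int true division underneath is correctly rounded


# a*b equals 1 - 2**-54 exactly, which rounds to 1.0 as a double.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

unfused = mul_then_add(a, b, c)  # 0.0: the low-order term is lost
fused = fma_emulated(a, b, c)    # -2**-54: the low-order term survives
```

With these operands the intermediate rounding of the product discards exactly the part of the result that the subsequent subtraction would have exposed.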

Self-timed multiplier for multiply-add unit

B. Stepanov, Y. Diachenko, Y. Rogdestvenski, D. Diachenko. Self-timed multiplier for multiply-add unit // 2016 IEEE NW Russia Young Researchers in Electrical and Electronic Engineering Conference (2016 ElConRusNW). February 02-03, 2016, St. Petersburg, Russia. P. 373-376. (indexed in Scopus).

The paper discusses the specifics of implementing a self-timed multiplier for a unit that multiplies two operands and then adds the product to a third operand without intermediate rounding, in accordance with the IEEE 754 Standard. The multiplier is a hardware implementation of the modified Booth algorithm based on a self-timed adder with a redundant signal code. An optimal self-timed redundant coding of the multiplier's internal and output signals is proposed. Circuit and layout problems of the self-timed multiplier implementation are solved. The Wallace tree, which is the main part of the multiplier, is optimized for a 65-nm CMOS process with six metal layers, taking into account that the multiplier circuit has more signals than its synchronous counterpart. An implementation of the self-timed multiplier in a CMOS process with 65-nm design rules is presented.

Hardware and Software Modelling and Testing of Non-Conventional Data-Flow Architecture

Yuri Shikunov, Dmitry Khilko, Yuri Stepchenkov. Hardware and Software Modelling and Testing of Non-Conventional Data-Flow Architecture // 2016 IEEE NW Russia Young Researchers in Electrical and Electronic Engineering Conference (2016 ElConRusNW). February 02-03, 2016, St. Petersburg, Russia. P. 360-364. (indexed in Scopus).

This paper presents a new recurrent data-flow computational model and an architecture that implements its principles and ideas. It describes how the model differs from existing ones and examines its key aspects, including its implementation as the Hybrid Recurrent Architecture of a Digital Signal Processor. An approach and methodology for hardware and software modelling and testing based on the new architecture are proposed. We introduce an implementation model of the proposed architecture, together with simulation tools for the recurrent data-flow architecture that implement this model. The functionality of the simulation model and its role in the software development suite for the new architecture are described. We introduce the target modelling platform, GAROS IDE. Results of testing the platform on several subtasks of the isolated-word recognition problem are presented.

Self-timed multiplier for distributed computing

Stepanov B.A., Diachenko Yu.G., Rogdestvenski Yu.V. Self-timed multiplier for distributed computing // Proceedings of the Second Youth Scientific Conference "Problems of Modern Informatics". Moscow: FRC CSC RAS, 2015. P. 220-227.

The paper addresses the specifics of implementing a self-timed multiplier for a multiply-add unit. The choice of an optimal self-timed coding of the multiplier signals, a redundant ternary self-timed code, is justified. A structure is proposed for the Wallace tree, the main part of the multiplier. Circuit and layout implementation problems of the multiplier are considered. An implementation of the self-timed multiplier in a CMOS process with 65-nm design rules is presented.
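For readers unfamiliar with the Wallace tree mentioned in the abstract, here is a minimal software sketch of the idea: partial products are reduced with 3:2 carry-save stages until two rows remain, and only then is a carry-propagating addition performed. The function names are assumptions for illustration, operands are taken to be unsigned, and neither the ternary redundant code nor the structure proposed in the paper is modelled.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor: reduce three rows to a sum row and a carry row."""
    sum_row = a ^ b ^ c                            # bitwise sum without carries
    carry_row = ((a & b) | (a & c) | (b & c)) << 1  # majority bits, shifted left
    return sum_row, carry_row


def wallace_reduce(rows: list[int]) -> list[int]:
    """Apply layers of 3:2 compressors until at most two rows remain."""
    rows = list(rows)
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        reduced = []
        for k in range(0, full, 3):
            reduced.extend(carry_save_add(*rows[k:k + 3]))
        reduced.extend(rows[full:])  # leftover rows pass through to the next layer
        rows = reduced
    return rows


def multiply_unsigned(a: int, b: int, width: int) -> int:
    """Multiply via shift-and-select partial products and a Wallace-style reduction."""
    partial_products = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    return sum(wallace_reduce(partial_products))  # final carry-propagating add
```

Each layer shrinks the row count by roughly a third without propagating carries, so the depth of the tree grows only logarithmically with the number of partial products.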

Functional cell library for self-timed gate-array VLSI

Stepchenkov Yu.A., Denisov A.N., Diachenko Yu.G., Gavrilov S.V., Morozov N.V., Stepchenkov D.Yu. Functional cell library for self-timed gate-array VLSI / Moscow: Tekhnosfera. International Conference "Microelectronics 2015": Integrated Circuits and Microelectronic Modules: Design, Manufacturing and Application. Book of abstracts. Alushta, Crimea, September 28 - October 3, 2015. 9 p. (accepted for publication).

This paper is devoted to the development of tools for designing and manufacturing self-timed (ST) VLSI circuits of all possible classes on the basis of gate arrays. The choice of gate arrays as the basis for designing ST LSI circuits is justified. The paper describes the contents of a functional cell library that supports efficient design of ST circuits of various classes on the Russian gate-array series 5503/5507 and 5521/5528.

Functional Approach in Self-timed Circuits Design

Plekhanov Leonid, Zakharov Victor and Stepchenkov Yuri. Functional Approach in Self-timed Circuits Design // Proceedings of IEEE East-West Design & Test Symposium (EWDTS’2015), Batumi, Georgia, September 26 — 29, 2015. P. 282-285.

Self-timed circuits have the unique property of being free of signal races. One of the main problems in designing such circuits is analysing them for self-timing (element switching) and constructing large circuits. In the traditional approach the computational complexity is so high that most practically important circuits cannot be analysed. In the functional approach we propose a hierarchical method: at the lower level the logic functions of the elements are analysed, and at the upper level the relationships between blocks. The computational complexity is then close to a linear function of circuit size. This approach solves one of the main problems of self-timed circuit design, the analysis of circuits of any size. The efficiency of the proposed methods was confirmed by the software developed.

Cell Library for Speed-Independent VLSI

Stepchenkov Yuri, Zakharov Victor, Diachenko Yuri, Morozov Nickolaj and Stepchenkov Dmitri. Cell Library for Speed-Independent VLSI // Proceedings of IEEE East-West Design & Test Symposium (EWDTS’2015), Batumi, Georgia, September 26 — 29, 2015. P. 137-140.

The paper describes the contents and implementation features of a cell library intended for digital self-timed (speed-independent) circuit design. The library contains more than 200 cells. Self-timed triggers with unary input and triggers with forced output are presented. The library was certified with a purpose-built characterization tool and was tested in practice in a set of digital signal processing units manufactured in differential CMOS processes.

Speed-Independent Floating Point Coprocessor

Stepchenkov Yuri, Zakharov Victor, Rogdestvenski Yuri, Diachenko Yuri, Morozov Nickolaj and Stepchenkov Dmitri. Speed-Independent Floating Point Coprocessor // Proceedings of IEEE East-West Design & Test Symposium (EWDTS’2015), Batumi, Georgia, September 26 — 29, 2015. P. 111-114.

A speed-independent fused multiply-add unit is presented as a coprocessor. It fully conforms to the IEEE 754 Standard. To minimize hardware cost and power consumption, the number of pipeline stages is reduced to two. The Wallace tree in the multiplier uses a redundant self-timed code. The unit is developed for a standard 65-nm bulk CMOS process. It provides performance of up to 0.54 Gflops and power consumption of about 450 mW/Gflops.