# 32 bit×32 bit Multiprecision Razor-Based Dynamic Voltage Scaling Multiplier with Operands Scheduler

Mr.M Basha<sup>1</sup>, Mr.V Leelashyam<sup>2</sup>

Asst.Prof. & ECE Department & Anurag Engineering College Ananthagiri(V&M), Suryapet(Dt)

#### Abstract

Multiplication is a basic arithmetic operation, and many digital signal processing (DSP) operations depend on it. The performance of 3D computer graphics, gaming, embedded systems, DSP, and similar applications depends strongly on the performance of their multiplication steps. Multipliers occupy a large area, have long latency, and consume a large amount of power, so the critical factors in multiplier design are chip area, multiplication speed, and hardware cost. Technology scaling also increases power density more than expected. This paper focuses on a multiprecision (MP) reconfigurable multiplier that combines variable precision, parallel processing (PP), razor-based dynamic voltage scaling (DVS), and MP operand scheduling to provide optimum performance under various operating conditions. Razor flip-flops combined with a voltage dithering unit let the multiplier adapt to the run-time workload of the target application and thereby achieve the lowest power consumption. The single-switch voltage dithering unit and the razor flip-flops help minimize the voltage safety margins and the overhead of DVS. The reconfigurable structure reduces both silicon area and power requirements.

**Keywords** —Multiplier, Razor Flip Flops, Operand Scheduler, Multiplicand, Multi precision, Dynamic Voltage Scaling

### I. INTRODUCTION

Consumer demand for increasingly portable yet high-performance multimedia and communication products imposes stringent constraints on the power consumption of individual internal components. Of these, multipliers perform one of the most frequently used arithmetic operations. Spurious switching activity in a multiplier can be mitigated by balancing internal paths through a combination of architectural and transistor-level optimization techniques. In addition to balancing internal path delays, dynamic power can be reduced by monitoring the effective dynamic range of the input operands so as to disable unused sections of the multiplier and/or truncate the output product, at the cost of reduced precision. This is possible because, in most sensor applications, the actual inputs do not always occupy the entire magnitude of their word length.

Section II presents the existing system. Section III presents the operation and architecture of the proposed MP multiplier. Section IV presents simulation results. Section V presents the proposed system. Section VI presents future work. Finally, a conclusion is given in Section VII.

### **II. EXISTING SYSTEM**

Today's full-custom DSPs and application-specific integrated circuits (ASICs) are designed for a fixed maximum word length so as to accommodate the worst-case scenario. Therefore, an 8-bit multiplication computed on a 32-bit Booth multiplier results in unnecessary switching activity and power loss. Several works have investigated this word-length optimization. In one approach, each pair of incoming operands is routed to the smallest multiplier that can compute the result, to take advantage of the lower energy consumption of the smaller circuit. This ensemble of point systems is reported to consume the least power, but at the cost of increased chip area because of the ensemble structure. Combining multiprecision (MP) with dynamic voltage scaling (DVS) can provide a dramatic reduction in power consumption by adjusting the supply voltage according to the circuit's run-time workload rather than fixing it to cater for the worst-case scenario. When adjusting the voltage, the actual performance of the multiplier running under the scaled voltage has to be characterized to guarantee fail-safe operation.
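The routing rule described above, sending each operand pair to the smallest multiplier that can compute the result, can be illustrated with a minimal Python sketch. The function names and the set of supported widths (8, 16, and 32 bits) are assumptions made for illustration, and the operands are treated as unsigned magnitudes; this is not the hardware detection logic of any cited design.

```python
def effective_width(x: int) -> int:
    """Number of bits needed to hold the unsigned magnitude of x (at least 1)."""
    return max(1, abs(x).bit_length())


def select_precision(a: int, b: int, widths=(8, 16, 32)) -> int:
    """Return the smallest supported multiplier width that fits both operands.

    `widths` is a hypothetical list of available precisions, sorted ascending.
    """
    need = max(effective_width(a), effective_width(b))
    for w in widths:
        if need <= w:
            return w
    raise ValueError("operand exceeds the largest supported width")


if __name__ == "__main__":
    for pair in [(200, 17), (300, 5), (70000, 1)]:
        print(pair, "->", select_precision(*pair), "bit multiplier")
```

Under these assumptions, an operand pair such as (200, 17) fits the 8-bit path, while (300, 5) already needs the 16-bit path because 300 requires nine bits.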

#### III. SYSTEM OVERVIEW AND OPERATION

The multiplier system consists of five basic blocks:

1) MP multiplier.

2) Input Operands Scheduler (IOS): reorders the input data in a buffer to reduce the required power supply voltage transitions.

3) Frequency Scaling Unit (FSU): implemented using a Voltage Controlled Oscillator (VCO); generates the required operating frequency of the multiplier.

4) Voltage Scaling Unit (VSU): implemented using voltage dithering; dynamically generates the supply voltage so as to minimize power consumption.

5) Dynamic Voltage/Frequency Management Unit (VFMU): receives the user requirements (e.g., throughput) and sends control signals to the VSU and FSU.
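The role of the operands scheduler can be sketched in Python as follows. This is only an illustration under stated assumptions: the buffer is a list of unsigned operand pairs, the supported precisions are 8, 16, and 32 bits, each precision is assumed to map to its own supply voltage, and the helper names are hypothetical. Grouping the buffered pairs by precision reduces how often the precision, and hence the supply voltage, has to change.

```python
def required_width(a: int, b: int, widths=(8, 16, 32)) -> int:
    """Smallest assumed precision that fits both unsigned operands."""
    need = max(1, a.bit_length(), b.bit_length())
    for w in widths:
        if need <= w:
            return w
    raise ValueError("operand too wide")


def schedule(buffer):
    """Reorder buffered operand pairs so that equal precisions are adjacent.

    sorted() is stable, so pairs of the same precision keep their arrival order.
    """
    return sorted(buffer, key=lambda pair: required_width(*pair))


def count_transitions(sequence) -> int:
    """Count precision changes, used here as a proxy for voltage transitions."""
    widths = [required_width(a, b) for a, b in sequence]
    return sum(1 for prev, cur in zip(widths, widths[1:]) if prev != cur)


if __name__ == "__main__":
    buf = [(3, 5), (70000, 2), (7, 9), (300, 4), (65536, 65536)]
    print("before:", count_transitions(buf))
    print("after: ", count_transitions(schedule(buf)))
```

Because the reordering changes the order in which products are produced, a real scheduler would also need to tag results so they can be matched to their inputs; that bookkeeping is omitted here.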



Fig 1: Overall multiplier system architecture.

The VFMU sends control signals to the VSU and FSU to generate the required power supply voltage and clock frequency for the MP multiplier. If the razor flip-flops of the multiplier do not report any errors, the supply voltage can be reduced: the VFMU sends control signals to the VSU to lower the supply voltage level. When the feedback provided by the razor flip-flops indicates timing errors, the scaling of the power supply is stopped. The proposed multiplier not only combines MP and PP but also combines DVS with an operand scheduling technique. PP can be used either to increase the throughput or to reduce the supply voltage level for low-power operation.
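The feedback loop between the razor flip-flops and the voltage scaling can be sketched as a simple control routine. The sketch below is a behavioral illustration only: `has_timing_error` is a hypothetical callback standing in for the razor error signal, voltages are integer millivolts, and the step size and limits are invented values. It also assumes that the controller settles at the last error-free level once an error is reported.

```python
def scale_voltage(v_start_mv, v_min_mv, step_mv, has_timing_error):
    """Lower the supply step by step until the razor feedback reports an error.

    has_timing_error(v_mv) -> bool is a hypothetical stand-in for the
    error flag collected from the razor flip-flops at supply v_mv.
    Returns the lowest tested voltage that produced no timing error.
    """
    v = v_start_mv
    while v - step_mv >= v_min_mv:
        candidate = v - step_mv
        if has_timing_error(candidate):
            break  # errors reported: stop scaling, keep the last safe level
        v = candidate
    return v


if __name__ == "__main__":
    # Toy error model (assumption): timing errors appear below 870 mV.
    errors_below_870 = lambda v_mv: v_mv < 870
    print(scale_voltage(1200, 600, 50, errors_below_870), "mV")
```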



Fig. 2. Possible configuration modes of proposed MP multiplier

A dynamic power supply and a VCO are employed to achieve real-time dynamic voltage and frequency scaling under various operating conditions. Near-optimal dynamic voltage scaling can be achieved with voltage dithering, which exhibits a faster response time than conventional voltage regulators. Voltage dithering uses power switches to connect different supply voltages to the load in different time slots.
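As a rough illustration of voltage dithering between two supply levels, the following Python sketch computes the fraction of time slots spent at the higher level to meet a target throughput, together with the resulting average power. It assumes a simple first-order model in which dynamic power scales as $C V^2 f$; the function names, the effective capacitance `c_eff`, and all numeric values are hypothetical and are not taken from the measured design.

```python
def dither_fraction(f_target, f_low, f_high):
    """Fraction of time slots spent at the high level to average f_target."""
    if not f_low <= f_target <= f_high:
        raise ValueError("target rate must lie between the two levels")
    if f_high == f_low:
        return 1.0
    return (f_target - f_low) / (f_high - f_low)


def average_power(alpha, v_low, f_low, v_high, f_high, c_eff=1.0):
    """Time-averaged dynamic power under the assumed C * V^2 * f model."""
    p_high = c_eff * v_high ** 2 * f_high
    p_low = c_eff * v_low ** 2 * f_low
    return alpha * p_high + (1.0 - alpha) * p_low


if __name__ == "__main__":
    # Hypothetical operating points: (0.6 V, 50 MHz) and (1.0 V, 100 MHz).
    alpha = dither_fraction(75, 50, 100)
    dithered = average_power(alpha, 0.6, 50, 1.0, 100)
    fixed = 1.0 ** 2 * 75  # same throughput at the high voltage only
    print(f"alpha={alpha:.2f}, dithered={dithered:.1f}, fixed={fixed:.1f}")
```

In this toy model, dithering meets the intermediate throughput at a lower average power than running at the high supply voltage throughout, which is the motivation for connecting different supplies in different time slots.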

Razor technology is a breakthrough that largely eliminates the safety margins by achieving variation tolerance through in-situ timing error detection and correction. The approach is based on the razor flip-flop, which detects and corrects delay errors by double sampling. A razor flip-flop operates as a standard positive-edge-triggered flip-flop coupled with a shadow latch, which samples at the negative edge. The input data is therefore given the duration of the positive clock phase to settle to its correct state before being sampled by the shadow latch. The minimum allowable supply voltage must be set so that the shadow latch always clocks the correct data, even under worst-case conditions.
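The double-sampling idea can be captured in a small behavioral model. The sketch below is a simplification written for illustration: time is measured from the launching clock edge, the main flip-flop samples at the next rising edge (one `period` later), the shadow latch samples one positive clock phase after that, and an error is flagged when the two samples differ. The function name and parameters are hypothetical.

```python
def razor_sample(old, new, arrival, period, high_phase=None):
    """Behavioral model of one razor flip-flop capture.

    old / new : data value before and after the transition
    arrival   : time at which the new value becomes stable at the input
    period    : clock period; the main flip-flop samples at t = period
    high_phase: length of the positive clock phase (default: period / 2);
                the shadow latch samples at t = period + high_phase
    Returns (main_q, shadow_q, error_flag).
    """
    if high_phase is None:
        high_phase = period / 2
    main_q = new if arrival <= period else old
    shadow_q = new if arrival <= period + high_phase else old
    return main_q, shadow_q, main_q != shadow_q


if __name__ == "__main__":
    print(razor_sample(0, 1, arrival=8, period=10))   # data on time
    print(razor_sample(0, 1, arrival=12, period=10))  # late, caught by shadow latch
    print(razor_sample(0, 1, arrival=16, period=10))  # too late even for shadow latch
```

The third case shows why a minimum allowable supply voltage is needed: if the data arrives after the shadow latch has sampled, both samples hold the stale value and the error goes undetected.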



Fig.3: 4 – Sub block Multiplier structure (Fixed Width)

To evaluate the proposed MP architecture, a conventional 32-bit fixed-width multiplier and four-sub-block MP multipliers were designed using a Booth radix-4 Wallace tree structure similar to that used for the building blocks of our three-sub-block MP multiplier. Our multiplier comprises $8 \times 8$ bit reconfigurable multipliers. These building blocks can either work as nine independent multipliers or work in parallel to perform one, two, or three $16 \times 16$ bit multiplications or a single $32 \times 32$ bit operation. Because of its larger size, however, the $32 \times 32$ bit fixed-width multiplier exhibits an irregular layout with complex interconnects. This limitation of tree multipliers is addressed by our MP $32 \times 32$ bit multiplier, which uses a more regular design to partition, regroup, and sum partial products.
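The principle of building a wide product from narrower sub-block products can be illustrated with the generic schoolbook decomposition below. This is a minimal sketch for unsigned operands that uses four $16 \times 16$ partial products; it is not the three-sub-block arrangement of the proposed multiplier, whose exact partitioning and regrouping are not reproduced here. The function names are hypothetical.

```python
MASK16 = 0xFFFF


def mul16(a: int, b: int) -> int:
    """Stand-in for one 16x16 bit unsigned sub-block multiplier."""
    assert 0 <= a <= MASK16 and 0 <= b <= MASK16
    return a * b


def mul32_from_16(a: int, b: int) -> int:
    """32x32 bit unsigned product assembled from four 16x16 bit products."""
    assert 0 <= a < 1 << 32 and 0 <= b < 1 << 32
    ah, al = a >> 16, a & MASK16   # partition the multiplicand
    bh, bl = b >> 16, b & MASK16   # partition the multiplier
    high = mul16(ah, bh)                 # weight 2^32
    mid = mul16(ah, bl) + mul16(al, bh)  # weight 2^16
    low = mul16(al, bl)                  # weight 2^0
    return (high << 32) + (mid << 16) + low   # regroup and sum


if __name__ == "__main__":
    a, b = 0x12345678, 0x9ABCDEF0
    print(hex(mul32_from_16(a, b)), mul32_from_16(a, b) == a * b)
```

When both operands fit in 16 bits, `ah` and `bh` are zero and only the `low` product contributes, which is the situation in which the unused sub-blocks could be disabled or assigned to other multiplications.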

### **IV. SIMULATION RESULTS**



Fig 4. Simulation output when rst=1



Fig 5. Simulation output when rst=0

### V. PROPOSED SYSTEM

Only the required number of bits need be active; the remaining bits can be kept in the OFF state. This yields optimum power consumption during the multiplication process.

A column-bypassing multiplier is an improvement on the normal array multiplier (AM). The AM is a fast parallel multiplier. The multiplier array consists of $(n-1)$ rows of carry-save adders (CSAs), in which each row contains $(n-1)$ full adder (FA) cells. Each FA in the CSA array has two outputs: 1) the sum bit, which goes down, and 2) the carry bit, which goes to the lower-left FA. The last row is a ripple adder for carry propagation.



Fig 6: Column-bypassing multiplier architecture

The FAs in the AM are always active regardless of input states. A low-power column-bypassing multiplier design is proposed in which the FA operations are disabled if the corresponding bit in the multiplicand is 0. Fig. 6 shows a $4 \times 4$ column-bypassing multiplier. Supposing the inputs are $1010_2 \times 1111_2$, it can be seen that for the FAs in the first and third diagonals, two of the three input bits are 0: the carry bit from the upper-right FA and the partial product $a_i b_j$. Therefore, the carry output of the adders in both diagonals is 0, and the output sum bit is simply equal to the third bit, which is the sum output of the upper FA.

Hence, the FA is modified by adding two tristate gates and one multiplexer. The multiplicand bit $a_i$ is used as the selector of the multiplexer that decides the output of the FA, and also as the enable of the tristate gates that turn off the input paths of the FA. If $a_i$ is 0, the inputs of the FA are disabled and the sum bit of the current FA equals the sum bit from its upper FA, which reduces the power consumption of the multiplier. If $a_i$ is 1, the normal sum result is selected.
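The bypassing rule can be checked with a bit-level Python model of an unsigned $n \times n$ carry-save array multiplier. The sketch below is a behavioral illustration under stated assumptions: the cell indexing, the function name, and the count of active FAs as a proxy for switching activity are choices made here, and the tristate gates and multiplexer are modeled simply as an `if` on the multiplicand bit.

```python
def column_bypass_multiply(a: int, b: int, n: int = 4):
    """Unsigned n x n array multiplication with column bypassing.

    Returns (product, active_fa_count). FA(i, j) in row j and column i adds
    the partial product a_i*b_j, the sum from the row above, and the carry
    from the FA above it in the same diagonal. When a_i == 0 the cell is
    bypassed: its sum is the upper sum and its carry is 0.
    """
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> j) & 1 for j in range(n)]

    s = [a_bits[i] & b_bits[0] for i in range(n)]  # row 0: partial products only
    c = [0] * (n - 1)
    product_bits = [s[0]]
    active = 0

    for j in range(1, n):                 # (n - 1) carry-save rows
        new_s, new_c = [0] * n, [0] * (n - 1)
        for i in range(n - 1):            # (n - 1) FA cells per row
            upper_sum = s[i + 1]
            if a_bits[i] == 0:
                new_s[i], new_c[i] = upper_sum, 0   # bypassed cell
            else:
                total = (a_bits[i] & b_bits[j]) + upper_sum + c[i]
                new_s[i], new_c[i] = total & 1, total >> 1
                active += 1
        new_s[n - 1] = a_bits[n - 1] & b_bits[j]    # leftmost partial product
        s, c = new_s, new_c
        product_bits.append(s[0])

    carry = 0                              # final ripple adder row
    for k in range(n - 1):
        total = s[k + 1] + c[k] + carry
        product_bits.append(total & 1)
        carry = total >> 1
    product_bits.append(carry)

    return sum(bit << idx for idx, bit in enumerate(product_bits)), active


if __name__ == "__main__":
    print(column_bypass_multiply(0b1010, 0b1111))  # (150, 3)
```

For the example inputs $1010_2 \times 1111_2$, the model returns the product 150 with only three of the nine array FAs active, because the cells in the diagonals for $a_0$ and $a_2$ are bypassed.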

## VI. FUTURE WORK

With the three-sub-block multiplier structure, the adder positions could be rearranged to reduce delay. Using 4:2 compressor techniques could further reduce power consumption and increase efficiency. The multiplier could also be used in an FIR filter.

## VII. CONCLUSION

This paper proposed a novel MP multiplier architecture featuring 28.2% and 15.8% reductions in silicon area and power consumption, respectively, compared with its $32 \times 32$ bit conventional fixed-width multiplier counterpart. When integrating this MP multiplier architecture with an error.
