**Original** Article

# Efficient Design of Rounding and Leading One-bit based Approximate Multipliers using Modified Static Segment Method for Error-Tolerance Applications

D. Tilak Raju<sup>1</sup>, Y. Srinivasa Rao<sup>2</sup>

<sup>1</sup> Department of Electronics and Communication Engineering, Vignan's Institute of Engineering for Women, Andhra Pradesh, India.
<sup>2</sup> Department of Instrument Technology, Andhra University, Andhra Pradesh, India.

<sup>1</sup>Corresponding Author : tilakraju55@gmail.com

Received: 02 October 2022 Revised: 12 December 2022 Accepted: 14 January 2023 Published: 24 January 2023

**Abstract** - Multiplication is one of the principal operations in error-tolerance applications. Approximate computing is currently applied to existing exact multipliers to manage the trade-off between area, delay, and the efficiency of the multiplier. Several multiplier designs in the literature aim to reduce the consumed energy and improve the accuracy of the digital circuit, but these designs fail to deliver efficient outcomes with adequate accuracy across applications. Hence, to maintain the trade-off between design and Error Metrics (EM), this paper proposes Leading-one-bit and Rounding-based Static Segment Approximate Multipliers (LSSAM, RSSAM) along with a Modified Estimator Circuit (MEC). These designs perform approximate multiplication using a leading-one unit, a rounding unit, and a barrel shifter. Furthermore, the MEC is used to discard the lower-order data of the input operands. The multipliers are synthesized and simulated using Vivado, MATLAB, and the Cadence RTL Compiler for input bit-widths from 8-bit to 32-bit. The simulation results show that the proposed designs reduce the Design Metrics (DM) of power, delay, area, and energy by an average of 68.2%, 35.4%, 60.1%, and 68.5%, respectively, and improve the EM of MRED, NED, WCE, and MED by 49.8%, 18.8%, 36.7%, and 47.2%, respectively, compared to prior designs. Moreover, when the proposed designs are included in error-tolerance applications, the PSNR (Peak Signal to Noise Ratio) and SSIM (Structural Similarity Index Metric) are considerably improved.

*Keywords* - *Leading-One-Bit approach, Rounding approach, Static segment method, Modified estimator circuit, Approximate computing.* 

## **1. Introduction**

With the continual development of electronic devices, energy reduction has become a critical design concern, particularly for portable devices such as tablets and smartphones [1]. Moreover, this reduction must be accomplished with the minimum possible loss of performance in terms of latency [1].

Generally, Digital Signal Processing (DSP) designs are central to error-tolerance applications. The arithmetic logic unit is the computing core of these designs; it performs multiplications and is employed for the major arithmetic operations [2]. For any DSP design, the two essential blocks are multipliers and adders. The performance of filter-related error-tolerance applications degrades if the multipliers and adders have excessive energy consumption, area overhead, or poor precision. Depending on the stochastic model, a filter design is used to reduce the faults mixed into the data. These error-tolerance computation models aim to maintain a trade-off between accuracy and consumed energy [3].

Approximate (faulty) multiplications and additions have been adopted in DSP architectures, trading accuracy for improvements in speed, power, and area. Many studies on multiplier design have improved energy efficiency by compromising the fidelity of VLSI designs through extremely short LSB lengths, voltage scaling, and confining the faulty building blocks [4]-[12]. Recent DSP designs focus on adder and multiplier circuits with balanced performance. The four essential design requirements for balanced performance are high energy efficiency, high area efficiency, high speed, and maximum computational accuracy. The stability of the design organization is crucial in narrowing the error and improving the performance of error-tolerance applications. All present Approximate Multipliers (AMs) have failed to enhance the DM and EM simultaneously.

Hence, this paper proposes two multipliers, Leading-one-Bit (LB) and Rounding-based Static Segment Approximate Multipliers (LSSAM, RSSAM), along with the MEC, to improve both the accuracy and the DM. Typically, high-accuracy algorithms are developed for the main arithmetic units before they are incorporated into processor structures. Therefore, balanced LSSAM and RSSAM architectures are designed and then applied to Smoothing Filter (SF), Gaussian Filter (GF), and Image Edge-Detection (IED) error-tolerance applications. The contributions of this paper are summarized as follows:

- 1) This work primarily focuses on replacing multiplication with a shifter and reducing the DM values of a multiplier.
- 2) A new scheme for LSSAM and RSSAM is offered by modifying the existing static segment AM.
- 3) From the design outcomes of the 32-bit, 16-bit, and 8-bit multipliers with both the LB and rounding approaches, it is observed that the proposed static segment method provides the best DM values.
- 4) The accuracy analysis of the 32-bit, 16-bit, and 8-bit multipliers with both the LB and rounding approaches is performed, and it is observed that LSSAM and RSSAM achieve the best EM values.
- 5) The GF, SF, and IED applications are evaluated and attain highly satisfactory values for the Quality Metrics (QM).

The paper is organized as follows: the state-of-the-art AMs are discussed in Section 2. The proposed AM designs are detailed in Section 3. The DM analysis, error behaviour, and quality assessment of the proposed AMs are presented in Section 4, and the study is concluded in Section 5.

## 2. Background

In this section, the latest AMs are reviewed. K. Y Khaing et al. introduced an AM in which the input operand width is divided into approximate and precise segments with a few MSBs and LSBs. However, accuracy is lowered because part of the LSB and MSB widths is discarded [4]. B. Garg et al. demonstrated an AM that computes approximate multiplication products for restricted LSBs using AND-OR circuits [5]; its power consumption and area increase as the input-operand widths increase. B. Garg et al. also created an accuracy-scalable AM that incorporates multiplier and estimator logic to produce approximate results with better EM and little implementation overhead [6].

Moreover, in the dynamic-range-unbiased AM, the LSB of the truncated input is fixed to one in order to shift the MRED caused by the truncation mechanism towards '0' [7]. The dynamic-range parameter influences the accuracy of this AM. Using a significance estimator circuit, Jothin et al. constructed a modified static-segment AM that increases accuracy by discarding the lower-order data of the input bit-width. However, increasing the input bit-width reduces the accuracy of this AM [8]. Bharat Garg et al. developed an LB-based AM that produces an approximate final output by selecting m bits from a k-bit input using the LB method [9]. The design metrics and EM of this AM depend on the 1-bits used; it improves EM by selecting the m bits according to the LB position. The rounding-based AM was established by B. Garg et al. The input is first rounded to the closest power of two before it is approximately multiplied using a limited number of shifters and adders [10]. These AMs reduce the implementation complexity and improve energy efficiency. Using the rounding approach, S. Vahdat et al. established an error-efficient AM that improves design metrics and EM, but its errors increase as the bit-size of the input operands grows [11]. From this review, it is concluded that the current AMs improve DM but not the QM and accuracy of the prior AMs. As a result, the EM and DM are improved by the proposed RSSAM and LSSAM, which are discussed in the subsequent section.

## 3. Proposed Static Segment Approximate Multipliers

The proposed architecture is designed to improve processing and computational accuracy. Moreover, image classification schemes and error-tolerant circuits should offer high computational accuracy and a low worst-case error to deliver good quality to all users. The proposed technique is developed using two design procedures, LSSAM and RSSAM. These two procedures and the resulting architectures are discussed in this section.

#### 3.1. Procedure of the proposed LSSAM and RSSAM

The procedure of the k-bit LSSAM and RSSAM is presented in this section. The LSSAM architecture improves the accuracy over all possible pairs of input operands while decreasing the circuit complexity by replacing the k-bit multiplication with a k/2-bit multiplication. In the proposed architecture, the k/2-bit segment of every input operand is chosen from two possible segments. By neglecting the lower-order segment and choosing the higher-order segment of the input operand, the required estimator logic circuit can be utilized and the efficiency can be enhanced. Moreover, in the input operand, the higher-order section contains a large number of zeros, and the lower-order sections contain a large number of ones. The flow chart of the proposed k-bit LSSAM is presented in Fig. 1.
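The segment selection described above can be illustrated with a minimal sketch. It shows only the plain two-segment static selection (upper half when it is non-zero, lower half otherwise) followed by a k/2-bit multiplication and a shift; it does not model the MEC, LB, or rounding blocks. The function names `select_segment` and `static_segment_multiply` are hypothetical and are not taken from the paper.

```python
def select_segment(x: int, k: int) -> tuple[int, int]:
    """Pick one k/2-bit segment of a k-bit operand.

    Returns (segment, shift): the upper half with shift k/2 when the upper
    half is non-zero, otherwise the lower half with shift 0.
    """
    half = k // 2
    upper = x >> half
    if upper != 0:
        return upper, half          # lower-order half is neglected
    return x & ((1 << half) - 1), 0  # operand fits in the lower half


def static_segment_multiply(a: int, b: int, k: int = 8) -> int:
    """Approximate a k-bit product with one k/2-bit multiplication and a shift."""
    seg_a, shift_a = select_segment(a, k)
    seg_b, shift_b = select_segment(b, k)
    return (seg_a * seg_b) << (shift_a + shift_b)


if __name__ == "__main__":
    for a, b in [(5, 3), (0xAB, 0x03), (0xF0, 0xF0)]:
        print(a, b, a * b, static_segment_multiply(a, b))
```

Operands that fit entirely in the lower segment are multiplied exactly, while larger operands lose their lower-order half, which is the error that an estimator circuit is meant to reduce.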



Fig. 2 The procedure of the proposed k-bit RSSAM



Fig. 3 Proposed LSSAM Architecture

Initially, the lower-order and higher-order LB k/2-bit segments are chosen for every k-bit input. The k/2-bit segment outputs are then examined and fed to the MEC block, and the main output of the MEC is fed to the LB block. Then, the barrel-shifter operation is completed by multiplying the k/2-bit MEC output with the LB outcome.

Ultimately, the final 2k-bit outcome is attained by extending the k-bit result. The series of steps of the proposed k-bit RSSAM is given in Fig. 2. From Fig. 2, the lower-order and higher-order LB k/2-bit segments are chosen for every k-bit input, and the k/2-bit segment outputs are then fed to the MEC block.

Further, the main output of the MEC is fed to the rounding unit. After that, the barrel-shifter operation is completed by multiplying the k/2-bit MEC output with the rounding block output. In the end, the final 2k-bit outcome is attained by extending the k-bit result.

#### 3.2. Proposed LSSAM and RSSAM Architectures

The LSSAM architecture improves the accuracy by grouping the input bit-widths into segments and reduces the circuit complexity by replacing the k-bit multiplier with a k/2-bit multiplier. The suggested structures are a combination of RSSAM and LSSAM. This structure was mainly developed to enhance the accuracy of the system.

Fig. 3 shows the proposed k-bit LSSAM architecture. The k/2-bit segment of every input is picked from the two available segments. The segment values are fed to the MEC block, and one of the outputs of the MEC block is fed to the LB block, which identifies the LB position. The LB mathematical expression is shown in equation 1 [9].

$$A_{LB}(i) = \left(\prod_{k=i+1}^{m-2} \overline{A(k)}\right) \cdot A(i) \text{ for } 0 \le i \le m-2 \tag{1}$$
Where  $A_{LB} = m$ -bit  $A$ .

The final k-bit multiplication uses a barrel shifter, the k/2-bit segment, and the LB value. The k-bit output produced by the segment multiplication is extended to a 2k-bit output by adding zeros. When the selected segments of all input operands are the lower segments, the Multiplexer (MUX) selects Z1. When the selected segments of all input operands are the upper ones, Z3 is chosen by the MUX. Therefore, the proposed LSSAM reduces the power, circuit complexity, and delay.
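As an illustration of the LB step, the following sketch detects the leading one of an m-bit value bit by bit (a bit is flagged only when it is 1 and every higher bit is 0) and then replaces the multiplication with a left shift by the LB position. It is a behavioural sketch under stated assumptions, not the paper's circuit: the loop covers all m bits, and the names `leading_one_onehot`, `leading_one_position`, and `lb_shift_multiply` are hypothetical.

```python
def leading_one_onehot(a: int, m: int) -> int:
    """Return a one-hot word marking the leading one of the m-bit value a."""
    result = 0
    higher_all_zero = True  # running AND of the complemented higher bits
    for i in range(m - 1, -1, -1):
        bit = (a >> i) & 1
        if bit and higher_all_zero:
            result |= 1 << i
        higher_all_zero = higher_all_zero and not bit
    return result


def leading_one_position(a: int, m: int) -> int:
    """Index of the leading one, or -1 when a is zero."""
    return leading_one_onehot(a, m).bit_length() - 1


def lb_shift_multiply(segment: int, other: int, m: int) -> int:
    """Approximate segment * other by shifting segment by the LB position of other."""
    pos = leading_one_position(other, m)
    if pos < 0:
        return 0
    return segment << pos  # barrel-shifter behaviour


if __name__ == "__main__":
    print(bin(leading_one_onehot(0b00101101, 8)))
    print(lb_shift_multiply(13, 6, 4), 13 * 6)
```

Because the second operand is reduced to a single power of two, the result always underestimates the exact product unless that operand is itself a power of two.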

The design of the proposed RSSAM is represented in Fig. 4. The RSSAM is similar to the LSSAM; the only difference is that the LB block is replaced with the rounding block. Equation (2) rounds one input to the nearest power of two [12]. The proposed RSSAM improves the accuracy and quality measures.

$$A_{rd}[k] = \left(\overline{A[k]} \cdot A[k-1] \cdot A[k-2] + A[k] \cdot \overline{A[k-1]}\right) \cdot \prod_{j=k+1}^{m-1} \overline{A[j]} \tag{2}$$

Where  $A_{rd}[k] = m$ -bit of A.
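The rounding step can be sketched behaviourally as follows. The sketch assumes that a value whose two leading bits are both 1 (at least 1.5 times its leading power of two) is rounded up and every other value is rounded down; this tie-handling convention and the names `round_to_power_of_two` and `rounded_shift_multiply` are assumptions made for illustration.

```python
def round_to_power_of_two(a: int) -> int:
    """Round a non-negative integer to a nearby power of two.

    Assumed convention: round up when the bit just below the leading one
    is set, otherwise round down to the leading power of two.
    """
    if a <= 0:
        return 0
    p = a.bit_length() - 1            # position of the leading one
    if p >= 1 and (a >> (p - 1)) & 1:  # second-highest bit set
        return 1 << (p + 1)
    return 1 << p


def rounded_shift_multiply(segment: int, other: int) -> int:
    """Approximate segment * other by shifting segment by log2 of the rounded operand."""
    rounded = round_to_power_of_two(other)
    if rounded == 0:
        return 0
    return segment << (rounded.bit_length() - 1)


if __name__ == "__main__":
    for value in (1, 5, 6, 11, 12):
        print(value, round_to_power_of_two(value))
    print(rounded_shift_multiply(13, 6), 13 * 6)
```

Unlike the leading-one approach, rounding can overestimate as well as underestimate the exact product, which tends to balance the sign of the error.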

The combination of the RSSAM and LSSAM processes is utilized to reduce the error by properly selecting the input bits which are to be ignored. Initially, the lower-order and higher-order LB k/2-bit segments are chosen for every k-bit input, and the outputs of the k/2-bit segments are then fed to the MEC block. Furthermore, the main output of the MEC is given to the LB unit. Then, the barrel-shifter operation is performed by multiplying the k/2-bit MEC output with the LB outcome. Ultimately, the final 2k-bit result is attained by extending the k-bit result.

#### 3.2.1. Process of the MEC

The MEC unit's primary goal is to increase the proposed RSSAM and LSSAM's accuracy compared to existing AMs. The MEC is the main component for designing the multiplier in the proposed structure. This multiplier process is explained in this section. The MEC architecture is pictured in Fig. 5, and its logic equation is given in Table 1. The methodology specifies that the output of MEC can be R1 or Binary-to-Excess-Four Code (BEFC). The outcome, R1 + 4, can be determined through the CA and R1 signals.



Fig. 4 Proposed RSSAM Architecture



Fig. 5 Architecture of MEC

The MEC output is determined by the CA signal and the carry output. When CA is 0, the output SI equals R1 whatever the carry value (X, don't care). When CA is 1 and the carry is 0, the output SI equals R1 + 4. When CA is 1 and the carry is 1, the output SI equals R1. The whole architecture of the MEC is illustrated in Fig. 5.
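The output selection of the MEC can be expressed as a small behavioural sketch. It assumes one reading of the conditions in the text: the pair of control values is (CA, carry), and R1 + 4 is selected only when CA is 1 and the carry is 0. The function name `mec_output` and its argument names are hypothetical, and the logic equations of Table 1 are not reproduced here.

```python
def mec_output(r1: int, ca: int, carry: int) -> int:
    """Behavioural model of the MEC output selection (assumed reading).

    (ca, carry) = (0, X) -> R1
    (ca, carry) = (1, 0) -> R1 + 4  (excess-four adjustment)
    (ca, carry) = (1, 1) -> R1
    """
    if ca == 1 and carry == 0:
        return r1 + 4
    return r1


if __name__ == "__main__":
    for ca in (0, 1):
        for carry in (0, 1):
            print(ca, carry, mec_output(3, ca, carry))
```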

## 4. Results and Discussion

This section starts with the DM assessment of the proposed AMs, which are compared with the existing AMs, and then examines accuracy and quality. The DM are validated using a CMOS technology library. Four DMs are considered to validate the proposed approach: energy, delay, power, and area. Input bit-widths of 8-bit, 16-bit, and 32-bit are considered to evaluate the DM.

#### 4.1. Analysis of Design Metrics

In this section, the DM of the proposed LSSAM and RSSAM are analyzed along with the conventional AMs [7]-[10] for input widths from 8-bit to 32-bit. All AMs are designed in Verilog HDL and synthesized using the Cadence RTL Compiler with a 90nm CMOS library for DM analysis. The structures and features of the k-bit state-of-the-art AMs are described in Table 2, where their truncation length (TL) and rounding length (RL) are indicated. For all existing AMs, the TL and RL values are constant for bit-widths ranging from 8-bit to 32-bit.

Table 3 below provides the DM of the proposed 8-bit multipliers and the state-of-the-art AMs in terms of area, delay, power, and energy. The area of LSSAM and RSSAM is 214 $\mu m^2$ and 244 $\mu m^2$, whereas the area of the conventional AM1, AM2, AM3, and AM4 is 1823 $\mu m^2$, 575 $\mu m^2$, 1250 $\mu m^2$, and 1459 $\mu m^2$, respectively; the proposed designs attain the lowest area. The power of LSSAM and RSSAM is 0.091 mW and 0.012 mW, whereas the power of AM1, AM2, AM3, and AM4 is 0.081 mW, 0.052 mW, 0.025 mW, and 0.091 mW, respectively; RSSAM attains the lowest power. The delay of LSSAM and RSSAM is 2.2 ns and 2.0 ns, whereas the delay of AM1, AM2, AM3, and AM4 is 4.3 ns, 3.2 ns, 5.3 ns, and 6.6 ns, respectively; the proposed designs attain the lowest delay. The energy of LSSAM and RSSAM is 194 fJ and 94 fJ, whereas the energy of AM1, AM2, AM3, and AM4 is 381 fJ, 156 fJ, 154 fJ, and 589 fJ, respectively; RSSAM attains the lowest energy.

Table 2. Structures and Features of state-of-the-art AMs

| AM Design | Structures                        | Features: TL | Features: RL |
|-----------|-----------------------------------|--------------|--------------|
| AM1       | Dynamic Segment AM [7]            | m/2          |              |
| AM2       | Modified Static Segment AM [8]    | m/2          |              |
| AM3       | Low-Energy Truncation type AM [9] | 3            |              |
| AM4       | Rounding-based AM [10]            |              | m            |

Table 3. DM of 8-bit state-of-the-art and proposed AMs

| Metrics                        | AM <sub>1</sub><br>[7] | AM <sub>2</sub><br>[8] | AM3<br>[9] | AM4<br>[10] | LSSAM | RSSAM |
|--------------------------------|------------------------|------------------------|------------|-------------|-------|-------|
| <b>Area</b> (μm <sup>2</sup> ) | 1823                   | 575                    | 1250       | 1459        | 214   | 244   |
| Power (mW)                     | 0.081                  | 0.052                  | 0.025      | 0.091       | 0.091 | 0.012 |
| Delay (ns)                     | 4.3                    | 3.2                    | 5.3        | 6.6         | 2.2   | 2.0   |
| Energy (fJ)                    | 381                    | 156                    | 154        | 589         | 194   | 94    |

Table 4. DM of 16-bit state-of-the-art and proposed AMs

| Metrics                        | AM <sub>1</sub><br>[7] | AM <sub>2</sub><br>[8] | AM3<br>[9] | AM4<br>[10] | LSSAM | RSSAM |
|--------------------------------|------------------------|------------------------|------------|-------------|-------|-------|
| <b>Area</b> (μm <sup>2</sup> ) | 3013                   | 1940                   | 3418       | 6740        | 856   | 909   |
| Power (mW)                     | 31.2                   | 180.2                  | 190.5      | 30.2        | 46.9  | 57.2  |
| Delay (ns)                     | 5.2                    | 7.2                    | 4.7        | 8.6         | 4.6   | 5.6   |
| Energy (fJ)                    | 402                    | 1093                   | 2891       | 1283        | 215   | 320   |

Table 5. DM of 32-bit state-of-the-art and proposed AMs

| Metrics                 | AM <sub>1</sub><br>[7] | AM2<br>[8] | AM3<br>[9] | AM4<br>[10] | LSSAM | RSSAM |
|-------------------------|------------------------|------------|------------|-------------|-------|-------|
| Area (µm <sup>2</sup> ) | 5139                   | 5729       | 8929       | 9859        | 1401  | 1609  |
| Power (mW)              | 82.2                   | 794.4      | 631.2      | 82.5        | 72.3  | 56.5  |
| Delay (ns)              | 5.6                    | 14.0       | 7.0        | 12.5        | 5.8   | 8.0   |
| Energy (fJ)             | 460                    | 1116       | 4418       | 1031        | 419   | 452   |

From Table 3, it can be seen that,

- 1. The proposed 8-bit LSSAM and RSSAM use reduced energy with respect to existing AMs.
- 2. The proposed 8-bit LSSAM and RSSAM reduce area, delay, and power by a mean of 82.4%, 65.6%, and 82.7% over the other AMs.

In Table 4, the proposed approach is analyzed with 16-bit operands. The area of LSSAM and RSSAM is 856 $\mu m^2$ and 909 $\mu m^2$, whereas the area of the conventional AM1, AM2, AM3, and AM4 is 3013 $\mu m^2$, 1940 $\mu m^2$, 3418 $\mu m^2$, and 6740 $\mu m^2$, respectively; the proposed designs attain the lowest area. The power of LSSAM and RSSAM is 46.9 mW and 57.2 mW, whereas the power of AM1, AM2, AM3, and AM4 is 31.2 mW, 180.2 mW, 190.5 mW, and 30.2 mW, respectively; the proposed designs consume less power than AM2 and AM3. The delay of LSSAM and RSSAM is 4.6 ns and 5.6 ns, whereas the delay of AM1, AM2, AM3, and AM4 is 5.2 ns, 7.2 ns, 4.7 ns, and 8.6 ns, respectively; LSSAM attains the lowest delay. The energy of LSSAM and RSSAM is 215 fJ and 320 fJ, whereas the energy of AM1, AM2, AM3, and AM4 is 402 fJ, 1093 fJ, 2891 fJ, and 1283 fJ, respectively; the proposed designs attain the lowest energy.

Table 4 shows the DM of the proposed 16-bit multipliers together with the conventional AMs in terms of delay, area, energy, and power. From Table 4, it can be observed that,

- 1. The energy of the proposed 16-bit LSSAM and RSSAM is reduced compared to the state-of-the-art AMs.
- 2. Furthermore, compared to conventional AMs, RSSAM improves by an average of 84.3%, 45.7%, and 74.2%.

In Table 5, the proposed approach is analyzed with 32-bit operands. The area of LSSAM and RSSAM is 1401 $\mu m^2$ and 1609 $\mu m^2$, whereas the area of the conventional AM1, AM2, AM3, and AM4 is 5139 $\mu m^2$, 5729 $\mu m^2$, 8929 $\mu m^2$, and 9859 $\mu m^2$, respectively; the proposed designs attain the lowest area. The power of LSSAM and RSSAM is 72.3 mW and 56.5 mW, whereas the power of AM1, AM2, AM3, and AM4 is 82.2 mW, 794.4 mW, 631.2 mW, and 82.5 mW, respectively; the proposed designs attain the lowest power. The delay of LSSAM and RSSAM is 5.8 ns and 8.0 ns, whereas the delay of AM1, AM2, AM3, and AM4 is 5.6 ns, 14.0 ns, 7.0 ns, and 12.5 ns, respectively; the proposed designs have a lower delay than AM2 and AM4. The energy of LSSAM and RSSAM is 419 fJ and 452 fJ, whereas the energy of AM1, AM2, AM3, and AM4 is 460 fJ, 1116 fJ, 4418 fJ, and 1031 fJ, respectively; the proposed designs attain the lowest energy.

Table 5 reports the DM of the proposed 32-bit multipliers together with the state-of-the-art AMs in terms of delay, area, energy, and power. From Table 5, it can be observed that,

- 1. The energy of the proposed 32-bit LSSAM and RSSAM is reduced compared to the conventional AMs.
- 2. Compared to the conventional AMs, the proposed 32-bit LSSAM and RSSAM improve area, delay, and power by approximately 83.9%, 42.6%, and 72.8%, respectively.

#### 4.2. Accuracy/Quality Analysis of GF/SF/IED

This section first presents the accuracy analysis of the proposed LSSAM and RSSAM in terms of the error measures ED, MED, MRED, WCE, and NED.

The EM used to evaluate the proposed LSSAM and RSSAM and the state-of-the-art AMs are discussed in [21, 22] and are defined as

$$ED = \left| B_{Exact} - B_{Appr} \right| \tag{3}$$

where $B_{Exact}$ = the exact multiplier output and $B_{Appr}$ = output of the AM.

$$MED = \frac{1}{2^{2m}} \sum_{k=1}^{2^{2m}} |ED_k| \tag{4}$$

$$NED = \frac{MED}{B_{max}} \tag{5}$$

where $B_{max}$ = maximum output of the AM.

$$MRED = \frac{1}{2^{2m}} \sum_{k=1}^{2^{2m}} |RED_k| \tag{6}$$

where $RED = \frac{ED}{B_{Exact}}$ and $B_{Exact}$ = exact output of the multiplier.

WCE is defined as the maximum error distance observed when one lakh (100,000) samples are applied to the AM. This helps in the multiplication of more significant values.
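The error measures defined above can be computed with a short sketch. It enumerates every operand pair of a small m-bit multiplier instead of drawing random samples, takes $B_{max}$ as the largest exact product, and averages RED only over pairs with a non-zero exact product to avoid division by zero; all three choices are assumptions of this sketch. The toy `truncating_multiplier` and the function `error_metrics` are hypothetical and do not correspond to any of the AMs compared in this paper.

```python
from itertools import product


def truncating_multiplier(a: int, b: int, drop: int = 2) -> int:
    """Toy approximate multiplier: clear the lowest `drop` bits of each operand."""
    mask = ~((1 << drop) - 1)
    return (a & mask) * (b & mask)


def error_metrics(approx_mul, m: int) -> dict:
    """Compute MED, NED, MRED, and WCE over all pairs of m-bit operands."""
    n = 1 << m
    eds, reds = [], []
    for a, b in product(range(n), repeat=2):
        exact = a * b
        ed = abs(exact - approx_mul(a, b))  # error distance
        eds.append(ed)
        if exact:                            # skip pairs with a zero product
            reds.append(ed / exact)
    med = sum(eds) / len(eds)
    b_max = (n - 1) ** 2                     # assumed normalisation constant
    return {
        "MED": med,
        "NED": med / b_max,
        "MRED": sum(reds) / len(reds),
        "WCE": max(eds),
    }


if __name__ == "__main__":
    print(error_metrics(truncating_multiplier, 4))
```

For wider operands, exhaustive enumeration becomes impractical, and the same function body can be applied to a list of randomly drawn operand pairs instead.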

In the end, the quality measures are validated by measuring SSIM and PSNR after integrating the proposed LSSAM, RSSAM, and state-of-the-art AMs into the GF, SF, and IED applications.

#### 4.2.1. Error Metrics Analysis

The accuracy of the proposed LSSAM and RSSAM is evaluated using error measures and compared with state-of-the-art AMs. The AMs are described in Verilog HDL and simulated with random input patterns (nearly 1 lakh). In addition, the accuracy measures are computed in MATLAB. The validation methodology for producing the accuracy metrics is detailed in [11].

The EM are tabulated in Table 6 for the proposed LSSAM, RSSAM, and previous AMs. The simulation outcomes show that the proposed AMs reduce ED in the range of 44.2–56.1% compared to the AMs in the literature. Moreover, LSSAM and RSSAM improve MRED, NED, WCE, and MED in the ranges of 74.2–25.3%, 23.4–14.2%, 52.1–21.2%, and 82.1–12.3%, respectively, compared with past AMs.

Table 6. 8-bit state-of-the-art and proposed AMs EM

| Metrics | AM <sub>1</sub><br>[7] | AM2<br>[8] | AM3<br>[9] | AM4<br>[10] | LSSAM     | RSSAM     |
|---------|------------------------|------------|------------|-------------|-----------|-----------|
| MED     | 1.60E+04               | 1089.8     | 1.30E+04   | 4.99E+03    | 6.13E+06  | 1.13E+04  |
| MRED    | 6.10E-05               | 1.05E-06   | 2.40E-05   | 5.20E-06    | 5.75E-06  | 1.99E-05  |
| NED     | 0.25                   | 0.23       | 0.21       | 0.08        | 0.2487    | 0.2452    |
| WCE     | 65378                  | 48248      | 62158      | 62950       | 24658     | 46162     |
| ED      | 1.03E+09               | 70798444   | 823636708  | 325889758   | 398837194 | 736028796 |

Table 7. 16-bit state-of-the-art and proposed AMs EM

| Metrics | AM <sub>1</sub><br>[7] | AM <sub>2</sub><br>[8] | AM3<br>[9] | AM4<br>[10] | LSSAM     | RSSAM    |
|---------|------------------------|------------------------|------------|-------------|-----------|----------|
| MED     | 5.10E+02               | 2.80E+03               | 5.01E+05   | 6.60E+06    | 5274.2    | 2.04E+09 |
| MRED    | 1.19E-07               | 7.90E-07               | 3.69E-06   | 3.79E-06    | 5.20E-06  | 6.40E-04 |
| NED     | 0.16                   | 0.14                   | 1.20E-04   | 0.20        | 0.3477    | 4.91E-01 |
| WCE     | 53499                  | 22421                  | 4.30E+09   | 3.80E+09    | 15181     | 41630    |
| ED      | 130116334              | 718644560              | 1.29E+11   | 1.69E+12    | 339954162 | 1.33E+09 |

Table 8. 32-bit state-of-the-art and proposed AMs EM

| Metrics | AM <sub>1</sub><br>[7] | AM <sub>2</sub><br>[8] | AM3<br>[9] | AM4<br>[10] | LSSAM     | RSSAM     |
|---------|------------------------|------------------------|------------|-------------|-----------|-----------|
| MED     | 1.10E+04               | 2.70E+06               | 6.79E+05   | 1.30E+07    | 3.46E+03  | 9.33E+03  |
| MRED    | 2.80E-05               | 3.80E-06               | 3.79E-06   | 3.79E-05    | 4.77E-06  | 2.28E-05  |
| NED     | 0.19                   | 6.19E-04               | 1.60E-04   | 0.30        | 0.2283    | 0.2221    |
| WCE     | 47950                  | 4.29E+09               | 4.39E+09   | 4.40E+08    | 15138     | 42036     |
| ED      | 2.59E+09               | 6.99E+11               | 1.80E+11   | 3.29E+12    | 224707961 | 606801018 |

#### 4.2.2. Quality Metrics Analysis of GF/SF/IED

In this section, the GF, SF, and IED are simulated with the proposed and existing 8-bit AMs and compared against standard images for quality analysis in terms of PSNR and SSIM [23]. Each output pixel of the GF, SF, and IED is generated by convolving a sub-matrix of the input image [24] with the standard masks (Z1, Z2, and Z3) given in equations (7), (8), and (9) [25, 26].

$$Z_{1} = \begin{bmatrix} 1 & 3 & 6 & 3 & 1 \\ 3 & 15 & 25 & 15 & 3 \\ 6 & 25 & 41 & 25 & 6 \\ 3 & 15 & 25 & 15 & 3 \\ 1 & 3 & 6 & 3 & 1 \end{bmatrix}$$
(7)

$$Z_2 = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 4 & 4 & 4 & 1 \\ 1 & 4 & 12 & 4 & 1 \\ 1 & 4 & 4 & 4 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix}$$
(8)

$$Z_{3} = \begin{bmatrix} 1 & 2 & 0 & -2 & -1 \\ 4 & 8 & 0 & -8 & -4 \\ 6 & 12 & 0 & -12 & -6 \\ 4 & 8 & 0 & -8 & -4 \\ 1 & 2 & 0 & -2 & -1 \end{bmatrix}$$
(9)

The efficiency of the GF, SF, and IED combined with LSSAM, RSSAM, and state-of-the-art AMs is estimated using QM. These AMs, GF, SF, and IED are synthesized and simulated using standard images. The experimental method for the quality measures is detailed in [11].
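The quality evaluation can be illustrated with a minimal sketch that filters one pixel with the Gaussian mask $Z_1$ using a pluggable multiplier and then computes PSNR between two pixel sequences. The normalisation by the sum of the mask coefficients, the peak value of 255, and the names `filter_pixel` and `psnr` are assumptions of this sketch; SSIM is not covered, and the normalisation applies only to masks with a non-zero coefficient sum.

```python
import math

# Gaussian mask Z1 from equation (7)
Z1 = [
    [1, 3, 6, 3, 1],
    [3, 15, 25, 15, 3],
    [6, 25, 41, 25, 6],
    [3, 15, 25, 15, 3],
    [1, 3, 6, 3, 1],
]


def filter_pixel(window, mask, multiply):
    """Convolve one 5x5 image window with a 5x5 mask using `multiply`.

    The accumulated sum is divided by the sum of the mask coefficients
    (assumed normalisation, valid for smoothing masks such as Z1).
    """
    acc = sum(multiply(window[r][c], mask[r][c]) for r in range(5) for c in range(5))
    norm = sum(sum(row) for row in mask)
    return acc // norm


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    flat_window = [[100] * 5 for _ in range(5)]
    exact = filter_pixel(flat_window, Z1, lambda a, b: a * b)
    # Toy approximate multiplier: clear the lowest bit of the pixel operand
    approx = filter_pixel(flat_window, Z1, lambda a, b: (a & ~1) * b)
    print(exact, approx, psnr([exact], [approx]))
```

Passing a different `multiply` callable is enough to compare an exact filter against one built on an approximate multiplier, which mirrors how the filters are paired with each AM in this evaluation.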

Table 9. GF QM of proposed and state-of-the-art AMs

| Metrics | AM <sub>1</sub><br>[7] | AM <sub>2</sub><br>[8] | AM <sub>3</sub><br>[9] | AM <sub>4</sub><br>[10] | LSSAM | RSSAM |
|---------|------------------------|------------------------|------------------------|-------------------------|-------|-------|
| PSNR    | 27.8                   | 27.5                   | 26.9                   | 27.6                    | 30.1  | 30.2  |
| SSIM    | 0.72                   | 0.73                   | 0.72                   | 0.71                    | 0.80  | 0.81  |

Table 9 shows the QM extracted from the GF with the proposed and state-of-the-art AMs. The computed results demonstrate that, when the GF is integrated with LSSAM and RSSAM, SSIM and PSNR improve by 8.1-12.1% and 4.8-10.8%, respectively, over the GF with state-of-the-art AMs. The Cameraman images of the GF with all AMs are depicted in Fig. 7. The GF with the proposed LSSAM and RSSAM provides better image quality than the GF with prior AMs.

Table 10. SF QM of proposed and state-of-the-art AMs

| Metrics | AM <sub>1</sub><br>[7] | AM <sub>2</sub><br>[8] | AM <sub>3</sub><br>[9] | AM <sub>4</sub><br>[10] | LSSAM | RSSAM |
|---------|------------------------|------------------------|------------------------|-------------------------|-------|-------|
| PSNR    | 28.9                   | 28.8                   | 27.9                   | 27.8                    | 29.2  | 29.3  |
| SSIM    | 0.73                   | 0.72                   | 0.72                   | 0.73                    | 0.81  | 0.82  |

Table 10 shows the QM extracted from the SF combined with the proposed and conventional AMs. The simulation results show that the SF with the proposed LSSAM and RSSAM improves SSIM and PSNR by 10.1–12.3% and 4.4–7.1%, respectively, over the SF with state-of-the-art AMs. The House images of the SF with all AMs are illustrated in Fig. 8. The image quality of the SF with the proposed LSSAM and RSSAM is better than that of the SF with traditional AMs.










Fig. 7 Cameraman image by GF with: (a) AM1 [7], (b) AM2 [8], (c) AM3 [9], (d) AM4 [10], (e) LSSAM, and (f) RSSAM






Fig. 8 House image by SF with: (a) AM1 [7], (b) AM2 [8], (c) AM3 [9], (d) AM4 [10], (e) LSSAM, and (f) RSSAM

Table 11. IED QM of proposed and state-of-the-art AMs

| Metrics | AM<sub>1</sub><br>[7] | AM<sub>2</sub><br>[8] | AM<sub>3</sub><br>[9] | AM<sub>4</sub><br>[10] | LSSAM | RSSAM |
|---------|------------------------|------------------------|------------------------|-------------------------|-------|-------|
| PSNR    | 21.0                   | 20.5                   | 21.3                   | 20.2                    | 24.2  | 24.1  |
| SSIM    | 0.42                   | 0.43                   | 0.43                   | 0.41                    | 0.52  | 0.51  |

Table 11 shows the QM obtained for IED with the proposed and conventional AMs. The simulation results show that IED combined with the proposed LSSAM and RSSAM improves SSIM and PSNR by 17.5–18.7% and 17.1–18.3%, respectively, over IED with state-of-the-art AMs. The Leena images produced by IED with all AMs are shown in Fig. 9. IED with the proposed LSSAM and RSSAM yields better image quality than IED with state-of-the-art AMs.



Fig. 9 Leena image by IED with: (a) AM<sub>1</sub> [7], (b) AM<sub>2</sub> [8], (c) AM<sub>3</sub> [9], (d) AM<sub>4</sub> [10], (e) LSSAM, and (f) RSSAM

## 5. Conclusion

This paper presents leading one-bit and rounding-based static segment AMs (LSSAM and RSSAM). In LSSAM, the k-bit static segment is selected according to the LB position. One k-bit static output is passed to the MEC, and the MEC output is fed to the LB block and the barrel shifter to obtain the segmented multiplication output. Finally, the multiplexer selects the approximate product from the segmented multiplication output. RSSAM operates in the same way, except that a rounding block replaces the LB block. LSSAM and RSSAM improve the area, delay, power, and accuracy metrics compared to existing AMs. In addition, GF, SF, and IED with the proposed LSSAM and RSSAM were evaluated for QM. The results show that SF, GF, and IED with the proposed LSSAM and RSSAM achieve higher QM than the same applications combined with existing AMs.
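The segment-then-shift idea summarized above can be illustrated in software. The following is a minimal behavioural sketch and not the authors' LSSAM or RSSAM datapath: it assumes unsigned operands, keeps the k bits starting at the leading one (truncating or rounding the remainder), multiplies the two short segments, and shifts the product back. The MEC and the output multiplexer are omitted, and the names `leading_segment` and `approx_multiply`, as well as the default `k = 4`, are hypothetical.

```python
def leading_segment(value, k, rounding=False):
    """Return (segment, shift) for an unsigned integer.

    The segment holds the k most significant bits starting at the leading
    one; shift is the number of low-order bits that were dropped.
    """
    if value.bit_length() <= k:
        return value, 0  # operand already fits in k bits: no approximation
    shift = value.bit_length() - k
    if rounding:
        # Round to nearest by adding half of the dropped weight first.
        return (value + (1 << (shift - 1))) >> shift, shift
    return value >> shift, shift  # plain truncation of the low-order bits


def approx_multiply(a, b, k=4, rounding=False):
    """Approximate a * b using short segments and a final left shift."""
    seg_a, shift_a = leading_segment(a, k, rounding)
    seg_b, shift_b = leading_segment(b, k, rounding)
    # The short multiplier is small; the shift restores the magnitude.
    return (seg_a * seg_b) << (shift_a + shift_b)


print(200 * 100)                                      # exact: 20000
print(approx_multiply(200, 100, k=4))                 # truncation: 18432
print(approx_multiply(200, 100, k=4, rounding=True))  # rounding: 21632
print(approx_multiply(7, 9, k=4))                     # small operands stay exact: 63
```

In this sketch the truncating variant never exceeds the exact product, while the rounding variant can land on either side of it. A larger `k` reduces the error at the cost of a wider internal multiplier.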

## References

- [1] Massimo Alioto, "Ultra-Low Power VLSI Circuit Design Demystified and Explained: A Tutorial," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 59, no. 1, pp. 3–29, 2012. Crossref, http://doi.org/10.1109/TCSI.2011.2177004
- [2] Vaibhav Gupta et al., "Low-Power Digital Signal Processing Using Approximate Adders," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 32, no. 1, pp. 124–137, 2013. Crossref, http://doi.org/10.1109/TCAD.2012.2217962
- [3] Madhu Vasudevan, and Chaitali Chakrabarti, "Image Processing using Approximate Data Path Units," *IEEE International Symposium* on Circuits and Systems, pp. 1544–1547, 2014. Crossref, http://doi.org/10.1109/ISCAS.2014.6865442
- [4] Khaing Yin Kyaw, Wang Ling Goh, and Kiat Seng Yeo, "Low-Power High-Speed Multiplier for Error-Tolerant Application," 2010 IEEE International Conference of Electron Devices and Solid-State Circuits (EDSSC), Hong Kong, China, pp. 1–4, 2010. Crossref, http://doi.org/10.1109/EDSSC.2010.5713751
- [5] Bharat Garg, and G. K. Sharma, "Low Power Signal Processing via Approximate Multiplier for Error-Resilient Applications," 2016 11<sup>th</sup> International Conference on Industrial and Information Systems (ICIIS), IEEE, Roorkee, India, pp. 546–551, 2016. Crossref, http://doi.org/10.1109/ICIINFS.2016.8263000
- [6] Parag Kulkarni, Puneet Gupta, and Milos Ercegovac, "Trading Accuracy for Power with an Under Designed Multiplier Architecture," Proceedings of 2011 24<sup>th</sup> International Conference on VLSI Design, Chennai, India, pp. 346–351, 2011. Crossref, http://doi.org/10.1109/VLSID.2011.51
- [7] Bharat Garg, and G. K. Sharma, "ACM: An Energy-Efficient Accuracy Configurable Multiplier for Error-Resilient Applications," *Journal of Electronic Testing*, vol. 33, no. 4, pp. 479–489, 2017. Crossref, https://doi.org/10.1007/s10836-017-5667-8
- [8] R. Jothin, and C. Vasanthanayaki, "High-Performance Modified Static Segment Approximate Multiplier based on Significance Probability," *Journal of Electronic Testing*, vol. 34, pp. 607-614, 2018. *Crossref*, https://doi.org/10.1007/s10836-018-5748-3
- [9] Bharat Garg, Sujit Kumar Patel, and Sunil Dutt, "LoBA: A Leading One Bit Based Imprecise Multiplier for Efficient Image Processing," *Journal of Electronic Testing*, vol. 36, pp. 429–437, 2020. *Crossref*, https://doi.org/10.1007/s10836-020-05883-4
- [10] Bharat Garg, and Sujit Patel, "Reconfigurable Rounding Based Approximate Multiplier for Energy-Efficient Multimedia Applications," Wireless Personal Communications, vol. 118, no. 4, pp. 919-931, 2021. Crossref, https://doi.org/10.1007/s11277-020-08051-1
- [11] Shaghayegh Vahdat et al., "TOSAM: An Energy-Efficient Truncation and Rounding-Based Scalable Approximate Multiplier," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 27, no. 5, pp. 1161–1173, 2019. Crossref, https://doi.org/10.1109/TVLSI.2018.2890712
- [12] Reza Zendegani et al., "RoBa Multiplier: A Rounding-Based Approximate Multiplier for High-Speed Yet Energy-Efficient Digital Signal Processing," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 25, no. 2, pp. 393–401, 2017. Crossref, https://doi.org/10.1109/TVLSI.2016.2587696
- [13] Shaghayegh Vahdat et al., "LETAM: A Low Energy Truncation-Based Approximate Multiplier," *Computer Electrical Engineering*, vol. 63, pp. 1–17, 2017. *Crossref*, https://doi.org/10.1016/j.compeleceng.2017.08.019
- [14] Ferdos Salmanpour, Mohammad Hossein Moaiyeri, and Farnaz Sabetzadeh, "Ultra-Compact Imprecise 4:2 Compressor and Multiplier Circuits for Approximate Computing in Deep Nanoscale," *Circuits Systems Signal Processing*, vol. 40, pp. 4633–4650, 2021. *Crossref*, https://doi.org/10.1007/s00034-021-01688-8
- [15] Shravani Chandaka, and Balaji Narayanam, "Hardware Efficient Approximate Multiplier Architecture for Image Processing Applications," *Journal of Electronic Testing*, vol. 38, pp. 217-230, 2022. *Crossref*, https://doi.org/10.1007/s10836-022-06000-3
- [16] Kyung-Ju Cho et al., "Design of Low Error Fixed-Width Modified Booth Multiplier," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 12, no. 5, pp. 522–531, 2004. Crossref, https://doi.org/10.1109/TVLSI.2004.825853
- [17] Davide De Caro et al., "Fixed-Width Multipliers and Multipliers-Accumulators with Min-Max Approximation Error," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 60, no. 9, pp. 2375–2388, 2013. Crossref, https://doi.org/10.1109/TCSI.2013.2245252

- [18] Kartikeya Bhardwaj, Pravin S. Mane, and Jörg Henkel, "Power- and Area-Efficient Approximate Wallace Tree Multiplier for Error-Resilient Systems," *Proceedings of International Symposium Quality Electronic Design*, Santa Clara, CA, USA, pp. 263–269, 2014. *Crossref*, https://doi.org/10.1109/ISQED.2014.6783335
- [19] Abdoreza Pishvaie, Ghassem Jaberipur, and Ali Jahanian, "Improved CMOS (4:2) Compressor Designs for Parallel Multipliers," *Computer and Electrical Engineering*, vol. 38, no. 6, pp. 1703–1716, 2012. Crossref, https://doi.org/10.1016/j.compeleceng.2012.07.015
- [20] Mr.M Basha, and Mr.V Leelashyam, "32 bit×32 bit Multiprecision Razor-Based Dynamic Voltage Scaling Multiplier with Operands Scheduler," *International Journal of Engineering Trends and Technology*, vol. 50, no. 4, pp. 234-237, 2017. Crossref, https://doi.org/10.14445/22315381/IJETT-V50P238
- [21] Jinghang Liang, Jie Han, and Fabrizio Lombardi, "New Metrics for the Reliability of Approximate and Probabilistic Adders," *IEEE Transactions on Computers*, vol. 62, no. 9, pp. 1760–1771, 2013. Crossref, https://doi.org/10.1109/TC.2012.146
- [22] Omid Akbari et al., "RAP-CLA: A Reconfigurable Approximate Carry Look-Ahead Adder," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 65, no. 8, pp. 1089-1093, 2018. Crossref, https://doi.org/10.1109/TCSII.2016.2633307
- [23] Zhou Wang et al., "Image Quality Assessment: From Error Visibility to Structural Similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004. Crossref, https://doi.org/10.1109/TIP.2003.819861
- [24] Harley R. Myler, and Arthur R. Weeks, *The Pocket Handbook of Image Processing Algorithms in C*, Englewood Cliffs, NJ, USA: Prentice-Hall, 2009.
- [25] Bharat Garg, and G.K.Sharma, "A Quality-Aware Energy-Scalable Gaussian Smoothing Filter for Image Processing Applications," *Microprocessors Microsystems*, vol. 45, pp. 1–9, 2016. Crossref, https://doi.org/10.1016/j.micpro.2016.02.012
- [26] Antonio Giuseppe Maria Strollo et al., "Comparison and Extension of Approximate 4-2 Compressors for Low-Power Approximate Multipliers," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 67, no. 9, pp. 3021-3034, 2020. Crossref, https://doi.org/10.1109/TCSI.2020.2988353