Optimization of Power and Area Using VLSI Implementation of MAC Unit Based on Additive Multiply Module

The most crucial component of a DSP is the MAC unit. The system's overall speed and efficiency are determined by the MAC unit. The MAC unit functions like a co-processor for the primary CPU to lighten its load. Therefore, creating a high-performance MAC unit is vital for microprocessors and DSP operations. Additionally, there is growing demand for high-efficiency electronic devices, such as computers, smartphones, microcontrollers, and microprocessors. The most significant component of a MAC unit is the multiplier, as shown in figure 1, which determines its efficiency. The multiplier accounts for a significant share of the delay, area, and power.
Multipliers are the most crucial component in the design. A MAC unit is made up of a multiplier unit and an accumulation unit, which are responsible for the arithmetic operations; its working is shown in figure 2. There are numerous multiplication algorithms, and one can be selected based on how well it performs. The Modified Booth multiplier, Booth multiplier, Dadda tree multiplier, Array multiplier, AMM multiplier, and Wallace tree multiplier are some examples of different multiplication algorithms. The AMM, Booth, and Dadda multipliers are used in place of the multiplier block in this MAC unit. The adder block consists of basic full adders. To compare the three parameters, namely power, area, and delay, synthesis is carried out for both 8-bit and 16-bit MAC units.

░ 2. LITERATURE SURVEY
This study offers a thorough and unbiased analysis of the MAC unit implementation utilizing various multipliers. The Array multiplier and Booth multiplier are among the multipliers that have been studied in Table 1, and the data flow diagram of the Booth multiplier is shown in figure 3.

Power Utilization: High / Medium
Implementation: Easy / Difficult

The MAC unit's simulation result in Xilinx with EDA play area system is shown in figure 4.
When two 128-bit numbers, x and y, are assumed, the addition generates partial sums S and carries C. The ripple carry adder (RCA) takes as much time as n full adders, since each stage has to wait for the previous carry bit before it can generate its sum bit. In contrast, the carry save adder (CSaA) generates all of its output values simultaneously, using less time overall than ripple-carry adders to complete the computation. As a result, the last stage uses a Parallel-In Parallel-Out (PIPO) register as an accumulator.
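The difference between carry-save and ripple-carry addition can be illustrated with a minimal Python sketch. It is a bit-level model and not the paper's hardware description; the function name `carry_save_add` and the use of three operands are assumptions made for illustration.

```python
def carry_save_add(x, y, z):
    """Carry-save step (illustrative model): reduce three operands to a
    sum word S and a carry word C without propagating carries."""
    s = x ^ y ^ z                             # per-bit sum, no carry chain
    c = ((x & y) | (x & z) | (y & z)) << 1    # per-bit carry, shifted to its weight
    return s, c


if __name__ == "__main__":
    s, c = carry_save_add(13, 7, 9)
    # A single conventional addition resolves S and C into the final value.
    print(s, c, s + c)
```

Every bit of S and C depends only on the three input bits at that position, which is why all outputs are available at the same time; only the final S + C addition needs a carry-propagating adder.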

AMM Multiplier
In this paper, 8-bit and 16-bit MAC units are implemented with various multipliers, namely the AMM, Booth, and Dadda multipliers. The Additive Multiply Module (AMM) takes additional inputs and adds them to the product of the input operands. A 4x2 AMM performs the arithmetic operation p = ax + y + z. In this case, a is the 4-bit multiplicand and x is the 2-bit multiplier. As 4 + 2 = 6, the result p consists of 6 bits. The input z is four bits wide.
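A minimal Python sketch of the 4x2 AMM operation p = ax + y + z follows. The text gives the widths of a, x, z, and p but not of y; the sketch assumes y is 2 bits wide, which is the width for which the largest result (15·3 + 3 + 15 = 63) still fits in 6 bits. The function names and the way two modules are chained into a 4x4 multiplier are hypothetical illustrations, not the paper's circuit.

```python
def amm_4x2(a, x, y, z):
    """4x2 additive multiply module: p = a*x + y + z.
    Assumed widths: a 4 bits, x 2 bits, y 2 bits (assumption), z 4 bits."""
    assert 0 <= a < 16 and 0 <= x < 4
    assert 0 <= y < 4 and 0 <= z < 16
    p = a * x + y + z
    assert p < 64  # the result always fits in 4 + 2 = 6 bits
    return p


def multiply_4x4(a, b):
    """Hypothetical composition: a 4x4 multiply built from two 4x2 AMMs."""
    b_lo, b_hi = b & 0b11, (b >> 2) & 0b11
    p_lo = amm_4x2(a, b_lo, 0, 0)             # a * (low two multiplier bits)
    # The upper four bits of p_lo are fed in as the additive input z.
    p_hi = amm_4x2(a, b_hi, 0, p_lo >> 2)
    return (p_hi << 2) | (p_lo & 0b11)


if __name__ == "__main__":
    print(amm_4x2(15, 3, 3, 15))   # largest possible result
    print(multiply_4x4(13, 11))
```

The additive inputs are what let one module absorb the partial result of another, so wider multipliers can be assembled without separate adder stages between modules.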

DADDA Multiplier
The Dadda multiplier builds up the sum of the partial products using full and half adders. While conceptually similar to a Wallace tree multiplier, it benefits from fewer gates and marginally improved performance due to a revised reduction tree. Figure 8 shows the stages a Dadda multiplier goes through to produce the desired output.
The Dadda multiplier uses the following steps to generate its output:
• Multiply (logical AND) every bit of a1 by every bit of a2, producing the partial-product results (V1).
• Using full and half adders, reduce the number of partial products in every stage until at most two bits remain for each weight.
• Add the final two rows using a conventional adder.
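The three steps above can be sketched as a bit-level Python model. This is an illustrative simulation under stated assumptions, not the paper's implementation: the function name `dadda_multiply` is hypothetical, operands are taken as unsigned n-bit integers, and the stage height limits follow the standard Dadda sequence 2, 3, 4, 6, 9, ...

```python
def dadda_multiply(a, b, n):
    """Unsigned n-bit multiply using a Dadda-style column reduction (model)."""
    # Step 1: AND every bit of a with every bit of b; cols[w] holds bits of weight 2**w.
    cols = [[] for _ in range(2 * n + 1)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))

    # Stage height limits: 2, 3, 4, 6, 9, ... (each is floor(1.5 * previous)).
    limits = [2]
    while limits[-1] < n:
        limits.append(limits[-1] * 3 // 2)

    # Step 2: reduce column heights stage by stage with half and full adders.
    for limit in reversed([d for d in limits if d < n]):
        for w in range(len(cols) - 1):
            col = cols[w]
            while len(col) > limit:
                if len(col) == limit + 1:          # half adder: 2 bits -> sum + carry
                    x, y = col.pop(), col.pop()
                    s, c = x ^ y, x & y
                else:                               # full adder: 3 bits -> sum + carry
                    x, y, z = col.pop(), col.pop(), col.pop()
                    s, c = x ^ y ^ z, (x & y) | (x & z) | (y & z)
                col.append(s)
                cols[w + 1].append(c)               # carry moves to the next weight

    # Step 3: at most two bits per weight remain; add the two rows conventionally.
    row0 = sum(col[0] << w for w, col in enumerate(cols) if len(col) > 0)
    row1 = sum(col[1] << w for w, col in enumerate(cols) if len(col) > 1)
    return row0 + row1


if __name__ == "__main__":
    print(dadda_multiply(200, 123, 8))
```

Each half or full adder preserves the weighted sum of the bits it consumes, so the reduction only changes how the value is distributed across columns, never the value itself.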

Booth Multiplier
The general block diagram of the Booth multiplier is shown in Figure 9. Booth's multiplication algorithm multiplies two signed binary numbers in two's complement notation, as shown in Figure 10. Although the multiplicand and product are usually also represented in two's complement, this is not a requirement: any number system that supports addition and subtraction will work as well. There are numerous modifications and improvements to specific details of the algorithm. Its operation is frequently explained as transforming each string of 1's in the multiplier into a high-order +1 and a low-order -1 at the endpoints of the string. When a string passes through the MSB there is no high-order +1, and the net effect is that the string is interpreted as the negative of the corresponding value.
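The "+1 / -1 at the endpoints of a string of 1's" description can be made concrete with a short Python sketch. It models the Booth recoding arithmetically rather than the shift-and-add register datapath of Figures 9 and 10; the function name `booth_multiply` and its parameters are illustrative assumptions.

```python
def booth_multiply(m, r, bits):
    """Multiply signed integers m and r, where r fits in `bits`-bit
    two's complement, by scanning adjacent multiplier bit pairs."""
    r_bits = r & ((1 << bits) - 1)   # two's complement bit pattern of r
    product = 0
    prev = 0                         # implicit bit to the right of the LSB
    for i in range(bits):
        cur = (r_bits >> i) & 1
        if cur == 1 and prev == 0:
            product -= m << i        # low-order end of a string of 1's: -1
        elif cur == 0 and prev == 1:
            product += m << i        # just past the high-order end: +1
        prev = cur
    # A string that runs through the MSB never gets its +1, which is
    # exactly what makes the multiplier value negative.
    return product


if __name__ == "__main__":
    print(booth_multiply(3, -4, 4))
    print(booth_multiply(-7, 6, 4))
```

Only the transitions between 0 and 1 in the multiplier trigger an addition or subtraction, so long runs of identical bits cost nothing beyond the shift.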

Full Adder with XOR
A full adder adds three 1-bit inputs and generates two outputs. A and B are the first two inputs and Cin is the third input. The sum output is denoted S, while the carry output is designated Cout, as shown in figure 11. Eight full adders can be combined to form a single-byte adder, with the carry bit cascaded from one adder to the next. A full adder is employed whenever a carry-in bit is available, since a 1-bit half adder has no carry-in input. In short, a 1-bit full adder adds three 1-bit operands and produces a 2-bit result.
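As a minimal sketch of this behaviour, the following Python model builds a full adder from XOR, AND, and OR operations and cascades eight of them into a byte-wide adder. The function names are illustrative, and the gate-level form shown (two XORs for the sum) is one common realization assumed here, not necessarily the exact circuit of figure 11.

```python
def full_adder(a, b, cin):
    """1-bit full adder: three 1-bit inputs, returns (S, Cout)."""
    p = a ^ b                     # first XOR: propagate signal
    s = p ^ cin                   # second XOR: sum bit
    cout = (a & b) | (p & cin)    # carry out
    return s, cout


def ripple_carry_add(x, y, width=8):
    """Cascade `width` full adders; the carry ripples from bit to bit."""
    result, carry = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry          # (sum truncated to width bits, final carry out)


if __name__ == "__main__":
    print(full_adder(1, 1, 1))
    print(ripple_carry_add(200, 100))
```

Because each stage consumes the carry produced by the stage before it, the delay of this cascade grows with the word width.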

Accumulator Unit
The accumulator is a register in which the running total of the products is kept. It is commonly employed in MAC units and ALUs (Arithmetic Logic Units). Keeping values in the accumulator can make additional summing operations unnecessary. The delay of the accumulator should be short enough to keep up with fast adders. An accumulator is often used as a register to hold interim logical or numerical data in multistep calculations, acting as a temporary repository for them. The block diagram of a register unit that serves as an accumulator is shown in Figure 12.

Figure 12: Accumulator Unit
Each time one of these operations is carried out, the value in the accumulator is overwritten with the interim result. For example, in an operation that requires adding many numbers, the accumulator initially stores the result of adding the first two numbers.
After the next number is added, the new result replaces the previous result in the accumulator. This procedure is repeated until all the numbers have been added and the total is available. The total is then written to main memory or to another register.

The 16-bit MAC unit synthesis findings are reported in Table 7, and it is clear that the 16-bit MAC unit with the AMM multiplier provides a better design in terms of time delay and area. From the results obtained, it is concluded that a MAC unit with an AMM multiplier gives a faster execution speed. It can also be noticed that as the number of bits increases, the area and delay of the AMM-based design are significantly reduced relative to the other designs, giving a high-speed MAC unit.
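The running-total behaviour described in the Accumulator Unit section can be sketched in Python as a multiply-accumulate loop. This is a behavioural model only; the function name `mac_accumulate`, the `width` parameter, and the choice to wrap the accumulator at `width` bits are assumptions for illustration and are not taken from the paper's design.

```python
def mac_accumulate(pairs, width=16):
    """Multiply each (a, b) pair and add the product to a running total.
    The accumulator is modelled as a `width`-bit register (assumption)."""
    mask = (1 << width) - 1
    acc = 0
    history = []
    for a, b in pairs:
        acc = (acc + a * b) & mask   # new result replaces the previous one
        history.append(acc)          # interim value after this step
    return acc, history


if __name__ == "__main__":
    total, steps = mac_accumulate([(2, 3), (4, 5), (1, 7)])
    print(steps, total)
```

Only the final value needs to leave the unit; the interim totals live in the accumulator register and are overwritten on every step.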

░ 5. CONCLUSION
The proposed work designs a modified MAC unit using the AMM multiplier and compares it with MAC units based on the Dadda and Booth multipliers for 8-bit and 16-bit operands. For the 8-bit MAC unit, the AMM-based design reduces the area by 3.4% compared to the Booth multiplier and by 5.2% compared to the Dadda multiplier, and it reduces the delay by 37.2% and 33.3%, respectively. Similarly, for the 16-bit MAC unit, the area is reduced by 71% in comparison to both the Booth multiplier and the Dadda multiplier, and the delay is reduced by 77.9% and 76.3%, respectively. These findings demonstrate that as the bit count increases, the delay and area of the AMM-based design decrease relative to conventional multipliers, making it well suited to high-speed DSP applications.

Figure 1: MAC unit basic block diagram

Figure 2: Working of MAC unit

Figure 3: The data flow of a 64-bit Booth multiplier

Figure 9: Block diagram of Booth Multiplier

Figure 13: Simulation results of 8-bit MAC unit using AMM Multiplier

The following are the synthesis reports generated for an 8-bit MAC unit implemented with an AMM multiplier. The results of a simulation of the 8-bit MAC unit with an AMM multiplier are shown in figure 13. The MAC unit's power, delay, and area results are depicted in figures 14, 15, and 16.

Figure 24: Area synthesis for 16-Bit MAC unit with Booth Multiplier

Figure 33: Power synthesis for 16-bit MAC unit with Dadda multiplier

Table 2 shows the device utilization of the MAC unit design using various multipliers. The number of occupied slices is the lowest, the number of slice LUTs is the highest, and the bonded IOBs are shown as 86%.

Table 3: Delay and Power Results (column headings: Size, 32-bit, 64-bit, Power (mW))
The size and delay values of an 8-bit MAC unit implemented with a variety of adders and multipliers are shown in Table IV. The Dadda multiplier, Wallace multiplier, and Modified Booth algorithm are the multipliers that were used in the comparative analysis. The study employed three different adders: (i) Carry Look Ahead adder, (ii) Carry Select Adder (CSeA), and (iii) Carry Save Adder (CSaA). The graphs are aligned with various 8-bit MAC unit types.
Website: www.ijeer.forexjournal.co.in

Table 6 summarizes the synthesis results for the 8-bit MAC unit; the MAC unit with the 8-bit AMM multiplier has the better design in terms of area and time delay.