# Design of Efficient FSM Based 3D Network on Chip Architecture

Krutthika H.K<sup>#1</sup>, A R Aswatha<sup>\*2</sup>

<sup>#</sup>Assistant Professor, Department of Electronics and Communication Engineering, Dayananda Sagar College of Engineering, Bengaluru- 560078, Karnataka, INDIA

\*Professor & Head, Department of Electronics and Telecommunication Engineering, Dayananda Sagar College of Engineering, Bengaluru-560078, Karnataka, INDIA

<sup>1</sup> krutthika-ece@dayanandasagar.edu

<sup>2</sup> hod-tc@dayanandasagar.edu

Abstract - The 3D NoC architecture is used in System on Chip (SoC) designs to establish bidirectional communication between processing elements stacked in three-dimensional arrays. In real-time implementation scenarios, network congestion depends on the time a node takes to route and process a given task. Each router must be able to detect such conditions and temporarily store the data for further processing. In this paper, a 3D NoC router architecture is proposed that is capable of detecting congestion and processing data efficiently. The routers are stacked vertically to obtain a 3D NoC structure, which in turn reduces area requirements and increases throughput compared to traditional NoC architectures. The data format used in the proposed architecture provides an option for acknowledgement at each stage of data transfer, which is modeled using a novel simplified FSM technique. The switching network for the 3D NoC is designed to efficiently accommodate the routing algorithm. The entire architecture is modeled using the FSM technique, coded in VHDL and implemented on the Digilent Zybo Z7-10 FPGA board. Comparison results show that the proposed architecture is better than existing methods in terms of hardware parameters.

**Keywords:** FPGA Architecture, Network on Chip, Router, System on Chip, XY Routing.

## **I. INTRODUCTION**

The technology used to design Integrated Circuits (ICs) has advanced considerably over the past few years, allowing designers to integrate more functionality into a single chip without increasing its size, power or operating frequency. At the same time, this integration introduces various design issues, which are mitigated by different design techniques. One technique used in SoC chips, commonly known as the stack-based approach, reduces the overall chip dimensions through a layered architecture: each layer performs a different function, and the chip carries out its intended operation by interconnecting these layers through proper channels. The controller block in the NoC architecture plays a significant role in transmitting data efficiently and with correct timing. If data is not propagated efficiently, power consumption rises, throughput falls and manufacturing cost increases. The Network on Chip (NoC) concept is adopted to avoid these problems. The trend is now shifting towards 3D NoC architectures, in which a greater number of routers are stacked vertically in a minimal area, considerably increasing the throughput of data transfer applications.

The migration from physical connections to packet-based switching networks [1, 2] was introduced to overcome on-chip communication problems, which led to the development of a new embedded technology called Network-on-Chip (NoC). This technology offers greater flexibility, scalability and parallelism than other solutions [3]. To make full use of the NoC design space, SoC designers must consider several design constraints to find the optimal NoC configuration. Development tools such as analytical models [4, 5], simulators [6] and emulators [7] are used for this purpose. Cost and performance metrics [8, 9] are generally computed using analytical models based on mathematical frameworks. However, modeling the behaviour of complex SoC designs is challenging, and the accuracy of the results is low compared to other solutions. In [10], the authors present simulation software for NoC architectures; simulation at higher levels of abstraction is faster but less accurate. Lower-level abstractions such as cycle-accurate bit-accurate (CABA) models give accurate results; however, simulating a whole network of NoC routers and IP cores can take weeks [11].

FPGAs provide one of the best platforms for evaluating the performance of NoC architectures, considerably decreasing evaluation time while keeping accuracy high [7]. Omprakash Ghorse et al. [12] reviewed various router architectures and compared them in terms of power and area consumption. Neila Moussa et al. [13] compared NoC router networks using the NS2 network simulator to evaluate routing efficiency. In this paper, we propose an efficient FSM-based FPGA architecture for a 3D Network on Chip that transfers data efficiently between the processing nodes of an SoC chip.

The rest of this paper is organised as follows. Section II describes the proposed 3D NoC architecture and methodology in detail. Section III presents the FPGA implementation and simulation results, Section IV compares the proposed router with existing techniques, and Section V concludes the paper.

## **II. PROPOSED 3D NoC ARCHITECTURE**

The proposed three-dimensional Network on Chip (NoC) architecture, a 3x3x3 mesh topology, is shown in Fig. 1. Each box represents a router whose local address is stored in its local register when the router is inserted into the network. Each router uses an FSM controller within its buffer architecture to improve communication efficiency, and each router is labelled with its position in three-dimensional Cartesian coordinates.
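To make the topology concrete, the Python sketch below (a behavioural model, not the authors' VHDL) enumerates the 27 routers of a 3x3x3 mesh and lists each router's neighbours by port. The mapping of East/West, North/South and Up/Down onto the +/-X, +/-Y and +/-Z axes is an assumption; the text does not fix it.

```python
from itertools import product

N = 3  # routers per dimension (3x3x3 mesh, as in Fig. 1)

# ASSUMED axis convention: the paper does not state which axis each port faces.
DIRECTIONS = {
    "East": (1, 0, 0), "West": (-1, 0, 0),
    "North": (0, 1, 0), "South": (0, -1, 0),
    "Up": (0, 0, 1), "Down": (0, 0, -1),
}


def build_mesh(n: int = N) -> dict:
    """Map each router address (x, y, z) to {port: neighbour address}."""
    mesh = {}
    for x, y, z in product(range(n), repeat=3):
        links = {}
        for port, (dx, dy, dz) in DIRECTIONS.items():
            nb = (x + dx, y + dy, z + dz)
            if all(0 <= c < n for c in nb):  # drop links that would leave the mesh
                links[port] = nb
        mesh[(x, y, z)] = links
    return mesh


mesh = build_mesh()
print(len(mesh))                # 27 routers
print(len(mesh[(1, 1, 1)]))     # 6: the centre router uses every inter-router port
print(sorted(mesh[(0, 0, 0)]))  # ['East', 'North', 'Up'] for a corner router
```

The Local port is not a mesh link, so it does not appear in the neighbour map. With 3-bit X, Y and Z address fields, each coordinate can take up to 8 values, which covers the 3 positions per axis used here.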



Fig. 1 Three Dimensional NoC Network Architecture

The block diagram of the standalone router, a 7-port router with three-dimensional (X, Y and Z) addressing, is shown in Fig. 2. For simplicity, Fig. 2 shows a two-dimensional view in which five ports (East, West, North, South and Local) are drawn and the remaining two ports (Up and Down) are hidden. Of these seven ports, the "Local" port is used when the data is destined for the current router, and the remaining ports forward the data according to the routing algorithm implemented in the "Routing Logic" block, together with the "Input Buffer" and "Switch Control" blocks. The router can accept data from all ports and routes each packet to the correct port based on its routing address and the priority assigned to it. The priorities embedded in the data format allow the router to accept simultaneous data from all ports if and only if the packets present at a given time have different priorities. The acknowledgement (ACK) and request (REQ) signals are used for synchronization.
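The acceptance rule above (simultaneous packets are accepted only when their priorities differ) can be sketched in Python as follows. The priority encoding (a smaller value means a higher priority) is an assumption, and the text does not say what the router does when priorities collide, so the sketch simply reports the conflict.

```python
def can_accept_all(priorities: dict) -> bool:
    """True iff every packet present in this cycle carries a distinct priority."""
    values = list(priorities.values())
    return len(values) == len(set(values))


def service_order(priorities: dict) -> list:
    """Input ports in the order they are served.

    ASSUMPTION: a smaller number means a higher priority.
    """
    if not can_accept_all(priorities):
        raise ValueError("duplicate priorities: packets cannot all be accepted this cycle")
    return sorted(priorities, key=priorities.get)


# Hypothetical example: packets waiting at three input ports in the same cycle.
print(service_order({"East": 2, "Up": 0, "Local": 1}))  # ['Up', 'Local', 'East']
print(can_accept_all({"East": 1, "West": 1}))           # False
```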

Fig. 2 Proposed 3D NoC Router Architecture



### A. Data Format

The data format used in this implementation is shown in Fig. 3. The "rst" and "req" fields, which are 3 bits wide, are used to activate or deactivate the path required to route the data. The X, Y and Z fields, 3 bits each, represent the address as positions in the corresponding three-dimensional coordinates, and the data field is 16 bits wide.
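As an illustration of how such a packet could be assembled and split, the Python sketch below packs the fields into a single word and unpacks them again. The widths of X, Y, Z and the data field follow the text; treating "rst" and "req" as two separate 3-bit fields, and the field order, are assumptions, since the exact layout of Fig. 3 is not given in the text.

```python
# Field widths in bits, most-significant field first.
# X, Y, Z (3 bits each) and the 16-bit data field come from the text.
# ASSUMPTIONS: rst and req are separate 3-bit fields, and this field order.
FIELDS = [("rst", 3), ("req", 3), ("x", 3), ("y", 3), ("z", 3), ("data", 16)]
TOTAL_BITS = sum(width for _, width in FIELDS)  # 31 under these assumptions


def pack(**values: int) -> int:
    """Concatenate the named fields into a single integer word."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word: int) -> dict:
    """Split a packed word back into its named fields."""
    fields = {}
    for name, width in reversed(FIELDS):  # peel fields off from the LSB end
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


packet = pack(rst=0, req=1, x=2, y=1, z=0, data=0xBEEF)
assert unpack(packet) == {"rst": 0, "req": 1, "x": 2, "y": 1, "z": 0, "data": 0xBEEF}
```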



### B. Input Buffer

The internal architecture of the "Input Buffer", shown in Fig. 4, consists of a "MUX", a "Register" and a "Control Unit". Data packets are stored in the "Register" block, which is built using a shift-register architecture [14]. The full and empty status of the "Register" block is used to generate the required interface signals.
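A minimal behavioural sketch of the "Register" block and its full/empty status flags, written in Python rather than VHDL, is given below. The buffer depth and the convention that a refused write returns False are assumptions made for illustration only.

```python
from typing import List, Optional


class ShiftRegisterBuffer:
    """Behavioural model of the input buffer's "Register" block.

    Packets shift in at one end and out at the other (first in, first out).
    The depth is a HYPOTHETICAL parameter; the paper does not state it.
    """

    def __init__(self, depth: int = 4) -> None:
        self.depth = depth
        self.slots: List[int] = []

    @property
    def full(self) -> bool:
        return len(self.slots) == self.depth

    @property
    def empty(self) -> bool:
        return len(self.slots) == 0

    def shift_in(self, packet: int) -> bool:
        """Store a packet; return False (packet refused) when the buffer is full."""
        if self.full:
            return False
        self.slots.append(packet)
        return True

    def shift_out(self) -> Optional[int]:
        """Return the oldest packet, or None when the buffer is empty."""
        return self.slots.pop(0) if self.slots else None


buf = ShiftRegisterBuffer(depth=2)
buf.shift_in(0x11)
buf.shift_in(0x22)
print(buf.full, buf.shift_in(0x33))  # True False
print(buf.shift_out(), buf.empty)    # 17 False
```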



Fig. 4 Input buffer Architecture for NoC Router

### C. FSM Based Control Unit

The simplified Finite State Machine (FSM) model used to design the "Control Unit" is shown in Fig. 5. When "rst = 0", the machine enters the "idle" state, which indicates that the node is ready to accept a data packet. When "req = 1", the destination address is decoded using a "Concatenation" operation.

Depending on the Acknowledgement (ACK) signal, the data packet is either stored in the "Shift Register" or sent to the corresponding port by the routing algorithm; once the acknowledgement is received, the machine returns to its idle state. Acknowledging each stage in this way ensures that the data is transmitted reliably.
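The control flow described above can be sketched as a small state machine in Python. The state names (RESET, IDLE, DECODE, STORE, SEND) are hypothetical labels, not taken from Fig. 5, and "rst" is read as an active-high reset that releases the machine to the idle state when rst = 0. The decode step models the "Concatenation" operation as joining the 3-bit X, Y and Z fields into one 9-bit address.

```python
from enum import Enum, auto


class State(Enum):  # HYPOTHETICAL state names
    RESET = auto()
    IDLE = auto()
    DECODE = auto()
    STORE = auto()
    SEND = auto()


def next_state(state: State, rst: int, req: int, ack: int) -> State:
    if rst == 1:                    # ASSUMED: rst = 1 holds the FSM in reset
        return State.RESET
    if state is State.RESET:        # rst = 0: node becomes ready for a packet
        return State.IDLE
    if state is State.IDLE:         # a request starts address decoding
        return State.DECODE if req == 1 else State.IDLE
    if state is State.DECODE:       # ACK decides between forwarding and buffering
        return State.SEND if ack == 1 else State.STORE
    if state is State.STORE:        # wait in the shift register until ACK arrives
        return State.SEND if ack == 1 else State.STORE
    if state is State.SEND:         # back to idle once the transfer is acknowledged
        return State.IDLE if ack == 1 else State.SEND
    raise ValueError(f"unknown state {state}")


def decode_address(x: int, y: int, z: int) -> int:
    """'Concatenation' of the 3-bit X, Y, Z fields into a 9-bit address (X & Y & Z)."""
    return (x << 6) | (y << 3) | z


def run(inputs, state=State.RESET):
    """Apply a sequence of (rst, req, ack) samples and record the visited states."""
    trace = []
    for rst, req, ack in inputs:
        state = next_state(state, rst, req, ack)
        trace.append(state.name)
    return trace


print(run([(1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 0), (0, 0, 1), (0, 0, 1)]))
# ['RESET', 'IDLE', 'DECODE', 'STORE', 'SEND', 'IDLE']
print(decode_address(2, 1, 0))  # 136
```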



Fig. 5 Proposed Controller Unit Architecture

### D. Switching Network

The Switching Network block switches the data to the correct port according to the switching algorithm used. A MUX-DEMUX based switching network is used in the proposed NoC architecture to reduce hardware utilization and design complexity. The existing switching network architecture [15] targets two-dimensional NoCs; it is modified for the three-dimensional NoC architecture as shown in Fig. 6.




In Fig. 6, the destination addresses (destination\_X, destination\_Y and destination\_Z) are extracted from the data format shown in Fig. 3. The source addresses (source\_X, source\_Y and source\_Z) are the Cartesian coordinates of the current router. The "Comparator" blocks compare the destination address with the current address and, based on their outputs, the "Encoder" block generates the request signal that routes the data packet to the correct port.
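The comparator-and-encoder stage can be sketched in Python as follows. The sketch assumes dimension-ordered routing (resolve X first, then Y, then Z, i.e. XY routing extended with a Z step) and a hypothetical mapping of +/-X to East/West, +/-Y to North/South and +/-Z to Up/Down; neither is confirmed by the text.

```python
# Comparator + encoder stage of the switching network (behavioural sketch).
# ASSUMPTIONS (not stated in the text): dimension-ordered routing that resolves
# X, then Y, then Z, and the axis-to-port mapping below.
PORT = {
    ("x", 1): "East", ("x", -1): "West",
    ("y", 1): "North", ("y", -1): "South",
    ("z", 1): "Up", ("z", -1): "Down",
}


def compare(dest: int, src: int) -> int:
    """Comparator: +1 if dest > src, -1 if dest < src, 0 if equal."""
    return (dest > src) - (dest < src)


def encode(dest: tuple, src: tuple) -> str:
    """Encoder: choose the output port from the three comparator results."""
    for axis, d, s in zip("xyz", dest, src):
        result = compare(d, s)
        if result != 0:
            return PORT[(axis, result)]
    return "Local"  # every comparator reports equality: the packet has arrived


# Router at (1, 1, 1); one destination per routing case of Section III-B.
here = (1, 1, 1)
for dest in [(2, 1, 1), (1, 1, 2), (1, 1, 1), (1, 1, 0), (0, 1, 1), (1, 2, 1), (1, 0, 1)]:
    print(dest, "->", encode(dest, here))
```

Resolving one dimension at a time keeps the encoder a simple priority chain over three comparator outputs, which suits a MUX-DEMUX switch; other orderings or adaptive schemes would need different encoder logic.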

## **III. RESULTS AND DISCUSSION**

### A. FPGA Implementation

The hardware utilization of the proposed 3x3x3 NoC architecture is shown in Table 1, which gives the utilization of the standalone router and of the complete network on the Digilent Zybo Z7-10 FPGA board in terms of slice registers, slice LUTs, LUT-FF pairs, maximum frequency and timing parameters.

Table 1 Hardware Utilizations of the Proposed 3D NoC Architecture

| Parameters                                    | Router                       | Network                      |
|-----------------------------------------------|------------------------------|------------------------------|
| FPGA                                          | Zybo Z7-10 (XC7Z010-1CLG400) | Zybo Z7-10 (XC7Z010-1CLG400) |
| Slice Registers                               | 155                          | 3107                         |
| Slice LUTs                                    | 954                          | 17085                        |
| LUT-FF Pairs                                  | 155                          | 3103                         |
| Maximum Frequency (MHz)                       | 190.752                      | 165.627                      |
| Minimum Period (ns)                           | 5.242                        | 6.038                        |
| Minimum Input Arrival Time before Clock (ns)  | 2.030                        | 2.226                        |
| Maximum Output Required Time after Clock (ns) | 3.958                        | 4.331                        |
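The minimum period and maximum frequency in Table 1 are reciprocals of each other, which gives a quick consistency check on the reported values:

$$T_{min} = \frac{1}{f_{max}}: \qquad \frac{1}{190.752\ \text{MHz}} \approx 5.242\ \text{ns (router)}, \qquad \frac{1}{165.627\ \text{MHz}} \approx 6.038\ \text{ns (network)}$$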

The top-level RTL schematic of the proposed 3D NoC router, generated after synthesizing the architecture with the Xilinx ISE 14.5 tool, is shown in Fig. 7. Fig. 8 shows the elaborated RTL schematic view.



Fig. 7 Generated Top Level RTL Schematic view of NoC Router



Fig. 8 Elaborated RTL Schematic view

### B. Simulation Timings

The simulation timing of the standalone 3D NoC router is shown in Fig. 9. Each process takes a single clock cycle, and a total of 6 clock cycles is required to perform the routing operation. A data packet arrives at one of the source ports and, depending on the availability of the destination port, is either stored or sent to the destination port. The propagation of the data packet is governed by the logic implemented in the FSM model and by the routing algorithm.



Fig. 9 Simulation timings of NoC router

The standalone router takes 6 clock cycles to process a data packet. The path a packet takes through the neighbouring routers differs with network conditions, so calculating the exact time required to transfer a data packet along a path is complex. Equations (1) and (2) are therefore used to estimate the delay. The delay of a single router is given as

$$t_{NoC} = \sum_{i=1}^{n} t_{Buffer} + t_{Controller} + t_{Switching} + t_{Wire} \quad (1)$$

where $t_{Buffer}$ is the buffer delay, $t_{Controller}$ is the controller delay, $t_{Switching}$ is the switching delay, $t_{Wire}$ is the wire delay and $n$ is the number of stages on the critical path.

The total delay between two nodes is calculated using Eq. (2):

$$path\_delay_{NoC} = \sum_{k=1}^{n} p \cdot t_{NoC} + t_{delay} \quad (2)$$

where $p$ is the number of NoC routers in the path and $t_{delay}$ is the total wire delay between the routers.
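A worked example of Eqs. (1) and (2) is sketched below in Python, reading both sums as a single term (n = 1). Because the individual component delays of Eq. (1) are not reported, the example instead approximates $t_{NoC}$ by the 6 clock cycles per router stated above multiplied by the 6.038 ns network minimum period from Table 1, ignores the inter-router wire delay ($t_{delay} = 0$) and assumes a minimal path through the mesh. All of these are illustrative assumptions, not measured results.

```python
def router_delay(t_buffer: float, t_controller: float,
                 t_switching: float, t_wire: float) -> float:
    """Eq. (1) with a single stage on the critical path (n = 1)."""
    return t_buffer + t_controller + t_switching + t_wire


def path_delay(p: int, t_noc: float, t_delay: float) -> float:
    """Eq. (2) with a single summation term: p routers plus inter-router wire delay."""
    return p * t_noc + t_delay


def routers_on_path(src: tuple, dst: tuple) -> int:
    """Routers traversed (source and destination included) on a minimal mesh path."""
    return sum(abs(a - b) for a, b in zip(src, dst)) + 1


# Eq. (1) with HYPOTHETICAL component delays in ns.
print(router_delay(1.0, 1.5, 2.0, 0.5))  # 5.0

# Corner-to-corner transfer in the 3x3x3 mesh, (0, 0, 0) -> (2, 2, 2).
p = routers_on_path((0, 0, 0), (2, 2, 2))  # 7 routers (6 hops)
t_noc = 6 * 6.038                          # 6 cycles x 6.038 ns network period
print(p, round(path_delay(p, t_noc, t_delay=0.0), 3))  # 7 253.596
```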

The simulation results of the proposed 3D architecture for different operations, using data packets with different destination addresses, are as follows:

Case 1:

In this case, a router receives data whose destination address corresponds to the East port and routes the data to the East port. The simulation snapshot of this scenario is shown in Fig. 10.



Fig. 10 Simulation result of NoC router for East port routing operation

Case 2:

In this case, a router receives data whose destination address corresponds to the Up port and routes the data to the Up port. The simulation snapshot of this scenario is shown in Fig. 11.

*(Waveform: clk, rst, data\_in and control signals; time axis approx. 25 to 45 ns)*

Fig. 11 Simulation result of NoC router for Up port routing operation

Case 3:

In this case, a router receives data whose destination address corresponds to the Local port and routes the data to the Local port. The simulation snapshot of this scenario is shown in Fig. 12.



Fig. 12 Simulation result of NoC router for Local port routing operation

Case 4:

In this case, a router receives data whose destination address corresponds to the Down port and routes the data to the Down port. The simulation snapshot of this scenario is shown in Fig. 13.

*(Waveform: clk, rst, data\_in and control signals; time axis approx. 180 to 200 ns)*

Fig. 13 Simulation result of NoC router for Down port routing operation

Case 5:

In this case, a router receives data whose destination address corresponds to the West port and routes the data to the West port. The simulation snapshot of this scenario is shown in Fig. 14.



Fig. 14 Simulation result of NoC router for West port routing operation

Case 6:

In this case, a router receives data whose destination address corresponds to the North port and routes the data to the North port. The simulation snapshot of this scenario is shown in Fig. 15.

Fig. 15 Simulation result of NoC router for North port routing operation



Case 7:

In this case, a router receives data whose destination address corresponds to the South port and routes the data to the South port. The simulation snapshot of this scenario is shown in Fig. 16.

*(Waveform: clk, rst, data\_in and control signals; time axis approx. 250 to 275 ns)*

Fig. 16 Simulation result of NoC Router for South port routing operation

## **IV. COMPARISON WITH EXISTING TECHNIQUES**

Table 2 compares the proposed NoC router with the existing 3D NoC router architectures presented by Raaed and Riyam [16] and Adesh Kumar et al. [17]. The NoC architecture of Raaed and Riyam [16] is implemented on a Spartan-3 FPGA, requires 1861 slice registers and operates at 82.734 MHz.

Its router uses a complex round-robin arbitration algorithm, which results in high hardware utilization and a lower operating frequency. Adesh Kumar et al. [17] presented a multilayer NoC architecture implemented on a Virtex-5 FPGA, which requires 180 slice registers and 315 LUT-FF pairs. In that mesh architecture, handling real-time conditions such as deadlock and livelock increases the hardware requirements.

| Parameters              | Raaed and Riyam [16] | Adesh Kumar et al. [17] | Proposed  |
|-------------------------|----------------------|-------------------------|-----------|
| FPGA                    | Spartan-3            | Virtex-5                | Zynq-7000 |
| Slice Registers         | 1861                 | 180                     | 155       |
| Slice LUTs              | -                    | -                       | 954       |
| LUT-FF Pairs            | -                    | 315                     | 155       |
| Maximum Frequency (MHz) | 82.734               | -                       | 190.752   |

Table 2 Hardware comparison for different parameters
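The relative differences in Table 2 can be computed directly, as in the Python sketch below. Because the three designs target different FPGA families (Spartan-3 uses 4-input LUTs, whereas Virtex-5 and Zynq-7000 use 6-input LUTs), these ratios are indicative rather than a like-for-like comparison.

```python
proposed = {"slice_registers": 155, "lut_ff_pairs": 155, "fmax_mhz": 190.752}
raaed_riyam_16 = {"slice_registers": 1861, "fmax_mhz": 82.734}  # Spartan-3
adesh_kumar_17 = {"slice_registers": 180, "lut_ff_pairs": 315}  # Virtex-5


def reduction_pct(existing: float, new: float) -> float:
    """Percentage reduction of `new` relative to `existing`."""
    return 100.0 * (existing - new) / existing


print(round(reduction_pct(raaed_riyam_16["slice_registers"], proposed["slice_registers"]), 1))  # 91.7
print(round(reduction_pct(adesh_kumar_17["slice_registers"], proposed["slice_registers"]), 1))  # 13.9
print(round(reduction_pct(adesh_kumar_17["lut_ff_pairs"], proposed["lut_ff_pairs"]), 1))        # 50.8
print(round(proposed["fmax_mhz"] / raaed_riyam_16["fmax_mhz"], 2))                              # 2.31
```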

## **V. CONCLUSION**

In this paper, an efficient 3D NoC architecture is proposed and implemented on the Digilent Zybo Z7-10 FPGA board using the Xilinx ISE tool. Testing the functionality and hardware optimization parameters of the design requires several devices and CAD tools. The Zybo Z7-10 board, manufactured by Digilent around a Xilinx Zynq-7000 (XC7Z010) device, is used for the implementation, and a JTAG cable is used to download the bitfile to the board. Finally, the architecture is tested again using the Xilinx System Generator CAD tool to validate the results obtained.

The control unit in the input buffer is implemented using an FSM model to optimize the buffer architecture and, in turn, the router architecture. This approach also keeps the design simple when nodes are added or removed. The priority scheme, implemented through a priority encoder, helps in processing simultaneous data. Finally, the proposed 3D router is compared with existing 3D router designs to validate the results.

## **REFERENCES**

- [1] P. P. Pande, C. Grecu, "Performance evaluation and design tradeoffs for network-on-chip interconnect architectures," IEEE Transactions on Computers, vol. 54, no. 8, pp. 1025–1040, 2005.
- [2] A. Salaheldin, K. Abdallah, N. Gamal, and H. Mostafa, "Review of NoC-based FPGAs architectures," Energy Aware Computing Systems & Applications (ICEAC), IEEE International Conference, 2015, pp. 1–4.
- [3] S. Y. Jiang, Y. Liu et al., "Study of fault-tolerant routing algorithm of NoC based on 2D-mesh topology," Applied Superconductivity and Electromagnetic Devices (ASEMD), IEEE International Conference, 2013, pp. 189–193.
- [4] V. A. Palaniveloo, J. A. Ambrose, and A. Sowmya, "Improving GA-based NoC mapping algorithms using a formal model," VLSI (ISVLSI), IEEE Computer Society Annual Symposium, 2014, pp. 344–349.
- [5] D. Borrione, A. Helmy et al., "A generic model for formally verifying NoC communication architectures: A case study," First International Symposium on Networks-on-Chip (NOCS), IEEE, 2007, pp. 127–136.
- [6] D. Ghosh, P. Ghosal, and S. P. Mohanty, "A highly parameterizable simulator for performance analysis of NoC architectures," International Conference on Information Technology (ICIT), 2014, pp. 311–315.
- [7] N. Genko, D. Atienza, and G. De Micheli, "NoC emulation on FPGA: HW/SW synergy for NoC features exploration," in Proceedings of the International Conference on Parallel Computing (ParCo 2005), no. EPFL-CONF-91160, 2005, pp. 753–760.
- [8] S. Foroutan, Y. Thonnart et al., "Analytical computation of packet latency in a 2D-mesh NoC," in Joint IEEE North-East Workshop on Circuits and Systems and TAISA Conference NEWCAS-TAISA, Jun. 2009, pp. 1–4.
- [9] Z. Qian, Da-Cheng et al., "SVR-NoC: a performance analysis tool for network-on-chips using learning-based support vector regression model," in Design, Automation & Test in Europe Conference & Exhibition, Mar. 2013, pp. 354–357.
- [10] H. Hossain, M. Ahmed et al., "Gpnocsim - a general purpose simulator for network-on-chip," in International Conference on Information and Communication Technology ICICT '07, 2007, pp. 254–257.
- [11] J. Hestness, B. Grot, and S. W. Keckler, "Netrace: Dependency-driven trace-based network-on-chip simulation," in Proc. of the Third International Conference on Network on Chip Architectures, series NoCArc '10. New York, NY, USA: ACM, 2010, pp. 31–36.
- [12] Omprakash Ghorse, Nitin Meena and Shweta Singh, "Review on Different Types of Router Architecture and Flow Control," International Journal of Engineering Trends and Technology, pp. 4609-4613, Vol. 4, Issue. 10, October 2013.
- [13] Neila Moussa, Farah Nasri and Rached Tourki, "NoC Architecture Comparison With Network Simulator NS2," International Journal of Engineering Trends and Technology, pp. 340-346, Vol. 2, No. 13, July 2014.
- [14] Charles Roth (Jr.), "Fundamentals of Logic Design", Cengage Learning, 1975.
- [15] Santrupti M. Sobarad, Sayantam Sarkar and Shubhangi Lagali, "FPGA Implementation of High Speed and Low Area Four Port Network-On-Chip (NoC) Router", IOSR Journal of VLSI and Signal Processing, Vol. 6, pp. 52-57, 2016.
- [16] Raaed Faleh Hassan and Riyam Layth Khaleel, "Hardware Implementation of NoC based MPSoC Prototype using FPGA", International Journal of Applied Engineering Research, Vol. 13, No. 7, pp. 5443-5451, 2018.
- [17] Adesh Kumar, Gaurav Verma, Mukul Kumar Gupta, Mohammad Salauddin, B. Khaleelu Rehman and Deepak Kumar, "3D Multilayer Mesh NoC Communication and FPGA Synthesis", International Journal of Wireless Personal Communications, Springer, Vol. 106, pp. 1855-1873, 2018.