

# ACTUAL-INTERVAL MULTI-CHANNEL VISION PROCESSING BASED ON FPGA

### P.TRIVENI<sup>1</sup> BACHU SUVARCHALA DEVI<sup>2</sup>

<sup>1</sup> PG. Scholar, Department of ECE, *CHEBROLU ENGINEERING COLLEGE*, Chebrolu, Andhra Pradesh, India.

<sup>2</sup> Assistant Professor, Department of ECE, *CHEBROLU ENGINEERING COLLEGE*, Chebrolu, Andhra Pradesh, India.

## **ABSTRACT:**

To process digital image sequences from multiple vision channels simultaneously and in real time, a hardware system based on an FPGA and DSPs is designed. In the system, two ZBT SRAM chips are used as the input and output caches for high-speed data transfer. An FPGA chip is responsible for the core control logic and for multi-channel video synchronization. Digital video is sent to the processing module over the Camlink bus. Data are exchanged between the FPGA and the DSPs through EMIF and McBSP. EDMA is used to transfer data between the SRAM in the FPGA and the ZBT SRAM, and QDMA is used to transfer 2D data as 1D data into the DSP cache. Tasks are assigned to the chips by C/OS running on the master DSP. Together, these components realize real-time data sampling and processing for multi-channel vision.

Keywords: FPGA, QDMA, EDMA, multi-channel, SRAM.

## **1. INTRODUCTION:**

Video is becoming the mainstream information carrier. Cameras, tablet computers and even mobile phones can produce video information [1]-[3]. Displaying video from all of these different sources requires too many displays, and it is difficult to analyze the video data of all channels at the same time. The implementation aims to resolve the difficulties of real-time multi-channel video data processing. Moreover, with the continuous development of hardware and video processing capability, the number of 4K (3840x2160) resolution screens in daily life is increasing. To maximize the efficiency of a 4K screen, multi-channel video processing is in demand in some specific application scenarios, such as modern surgery, patient monitoring and diagnostics in modern medicine.

Volume XIII, Issue IV, 2021 December

Most current multi-channel display devices are Personal Computer (PC) software-based designs, but these devices struggle with ever-increasing data volumes, high frame rates and high-resolution multi-source scenes [8]-[10]. Therefore, an FPGA-based hardware design with multi-thread and real-time processing capabilities can be a good solution to these problems and to the video time delay problem [11]. A parallel video processing design is much more suitable for high-speed and multi-task applications.

Nowadays, many researchers focus on video processing with FPGAs. Their flexibility, high efficiency and low power consumption make FPGAs increasingly popular. Weiguo Zhou and Yunhui Liu [12] proposed a method for processing camera video with an image mosaic algorithm based on FPGA. Peng Sun [13] presented video fusion with a Central Processing Unit (CPU) + FPGA architecture, which can efficiently accelerate processing. Luis Araneda and Miguel Figueroa [14] presented a hardware architecture for digital video stabilization based on FPGA.

**ISSN: 2057-5688**

However, the works mentioned above handle a limited number of video channels at low resolution. To process FHD or UHD videos, a new architecture should be designed. An FPGA-based design can efficiently meet the demands of real-time processing, low power consumption and stable performance. To meet the real-time requirement of video processing, the proposed design is implemented in a hardware architecture. The architecture includes an FPGA core processor, a SiI9616 video processor, an Advanced RISC Machines (ARM) coprocessor and DDR3 memory. The SiI9616 video processor can decode, encode and enhance many video formats using its hardware core. The FPGA implements the parallel video processing algorithm and drives the DDR3 storage in hardware. The ARM coprocessor controls the whole system by analyzing its states. As a result, the implementation can handle 4K resolution.

Video processing consists of many parts. From the very beginning, some processing steps based on the basic FPGA hardware architecture needed to be settled, such as video scaling, gamma conversion, on-screen display (OSD) and so on. All of this previous work is based on an FPGA architecture with the capacity to process real-time FHD video sequences at more than 60 frames per second (60fps), which contributes to the multi-channel video merge implementation. To satisfy the demand of 4K real-time video processing, the speed of the DDR3 storage is also key to the implementation. Therefore, we used Verilog code to construct the DDR3 driver and to control the video storage process efficiently. Based on this memory process, all the arithmetic can be implemented in the hardware description language Verilog. The previous work focused on processing one video channel. However, the goal here is to process multi-channel videos at the same time and to display the merged real-time video, which requires reconsidering the video management and the DDR3 SDRAM storage strategy. Four to nine video channels are considered in the design, so the data quantity is much larger than in the previous work. Channel balancing, switching between the data of different channels, timing design and display stability all need to be considered seriously. With the help of the previous research work, a brand-new hardware architecture was constructed for high-performance multi-channel video processing.

## **2. LITERATURE SURVEY:**

D. Shin and S. K. Gupta, "A re-design technique for datapath modules in error tolerant applications," in Proc. 17th Asian Test Symp. (ATS), 2008, pp. 431–437.

As CMOS VLSI processes move into the nanoscale, chip yields are decreasing [1]. However, for applications such as images, video, audio, graphics, games, and error-correcting codes for wireless communication, some errors induced by manufacturing defects are tolerable, provided they are of certain types and have severities within certain limits [5, 6]. Researchers in our group have analyzed several of these applications, such as the motion estimation (ME) block of an MPEG encoder [5] and the discrete cosine transform (DCT) block of a JPEG encoder [6]. For these applications, many faults in the block under analysis either cause no distortion or cause a small amount of distortion in the output, namely the decoded image/video. If the output images/videos produced by the chip are guaranteed to be acceptable (despite these distortions), we can consider the defective chip as acceptable. In this manner, unlike the perfect/imperfect classification carried out in 'classical' test approaches, this observation enables us to enhance yield by developing new test techniques that allow us to also use acceptable chips, i.e., chips that can only cause tolerable errors.

M. Elgamel, A. M. Shams, and M. A. Bayoumi, "A comparative analysis of low power motion estimation VLSI architectures," in Proc. IEEE Workshop Signal Process. Syst. (SiPS), Oct. 2000, pp. 149–158.


The field of approximate computing has received significant attention in image and video compression algorithms such as MPEG and JPEG. However, when an existing approximate architecture with a static approximate hardware configuration (i.e., a fixed level of approximation) is used for an MPEG encoder, the output quality varies greatly for different input videos. This project addresses the issue by proposing reconfigurable approximate arithmetic units for MPEG encoders that improve power consumption. A dual-mode full adder (DMFA) is designed and then implemented in reconfigurable adder/subtractor blocks (RABs), which can moderate their degree of approximation; these blocks are then incorporated in the motion estimation and discrete cosine transform components of the MPEG encoder. To test motion estimation in a video coding system, an Error Detection and Data Recovery (EDDR) architecture is designed based on the Residue-and-Quotient (RQ) code. The RQ code is used in an error detection architecture that has 16 Processing Elements (PEs) and 16 Test Code Generation (TCG) blocks for computing the test codes for each pixel value in the macroblock; an architecture with a single PE and a single TCG for computing the test codes for the pixel values is also used. An error in the processing elements (PEs), key components of an ME, can be detected and recovered effectively by using the proposed EDDR architecture.
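
As a rough illustration of the residue-and-quotient idea described above, the sketch below encodes a pixel value as a residue and a quotient, flags a mismatch when a corrupted value no longer matches the stored code, and rebuilds the original value from the code. The modulus `M = 63`, the function names and the single-bit fault are assumptions chosen for illustration; they are not taken from the surveyed work.

```python
M = 63  # assumed modulus, a hypothetical choice for this illustration


def rq_encode(x, m=M):
    """Return (residue, quotient) such that x == quotient * m + residue."""
    return x % m, x // m


def rq_check(observed, expected_code, m=M):
    """Return True when the observed value still matches the stored RQ code."""
    return rq_encode(observed, m) == expected_code


def rq_recover(code, m=M):
    """Rebuild the original value from its (residue, quotient) pair."""
    residue, quotient = code
    return quotient * m + residue


pixel = 200
code = rq_encode(pixel)        # 200 = 3 * 63 + 11
faulty = pixel ^ 0b100         # hypothetical single-bit fault in the datapath
error_detected = not rq_check(faulty, code)
recovered = rq_recover(code)
```

In this toy model the stored code acts as the reference, so a value that disagrees with it is detected and then replaced by the value recomputed from the code.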

S. Chong and A. Ortega, "Power efficient motion estimation using multiple imprecise metric computations," in Proc. IEEE Int. Conf. Multimedia Expo, Jul. 2007, pp. 2046–2049.

Low power is a crucial requirement for portable multimedia devices that use various signal processing algorithms and architectures. Human beings are unable to notice the slightly erroneous outputs that most multimedia applications produce. Consequently, precisely correct numerical outputs are not required. In this paper, we recommend logic complexity reduction at the transistor level. We demonstrate this concept by proposing various approximate adders, whose complexity is reduced because the number of transistors is reduced. Compared to existing implementations using accurate adders, simulation results show power savings with the proposed approximate adders. Using the approximate adders, a 16-bit CLA is implemented consisting of four different types of basic blocks, depending on the presence of carry propagation (P), carry generation (G), sum (S) and Cout at different levels. The proposed approximate adders can be used effectively to trade off power and quality for error-resilient DSP systems. The approach aims to simplify a conventional full adder cell by decreasing the number of transistors and also the load capacitances. When the errors introduced by these approximations were reflected at a high level in a typical DSP algorithm, the impact on output quality was very small. A decrease in the number of series-connected transistors helps to reduce the effective switched capacitance and to achieve voltage scaling. Our experimental results show that the proposed architecture results in power savings.
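
The power/quality trade-off of an approximate adder can be seen in a small behavioral model. The sketch below is not the transistor-level cell of the surveyed work; it swaps in a simplified lower-part-OR scheme, in which the low `approx_bits` bits are combined with a bitwise OR (no carry chain) and only the upper bits are added exactly. The function name and default parameters are assumptions for illustration.

```python
def approx_add(a, b, width=16, approx_bits=4):
    """Add two unsigned integers with an approximate lower part.

    The low `approx_bits` bits are ORed instead of added, so no carry
    propagates out of them; the upper bits are added exactly.
    """
    low_mask = (1 << approx_bits) - 1
    low = (a | b) & low_mask
    high = ((a >> approx_bits) + (b >> approx_bits)) << approx_bits
    return (high | low) & ((1 << width) - 1)


exact = 3 + 5                 # exact sum
approx = approx_add(3, 5)     # OR of the low bits drops the carry
error = exact - approx        # equals (3 & 5) for this scheme
```

For this particular scheme the error is always the bitwise AND of the two low parts, so it stays below `2 ** approx_bits`, which is the kind of bounded, small error that error-resilient DSP workloads can absorb.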

## **3. PROPOSED SYSTEM:**

To process higher-resolution videos, it is necessary to design a higher-performance hardware board system. As shown in Fig. 1, the proposed implementation is suitable for real-time processing of nine video data paths. To process multi-channel videos, the previously designed video processing boards can also seamlessly merge two High Definition (HD) video channels and send the merged video to the newly designed higher-performance hardware system. The new system is powerful enough to process the higher-resolution videos and to construct 4K real-time video, which is the core of the implementation.
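
One way to picture how several channels can share a single 4K frame is a square tile grid. The sketch below is only an illustration under assumptions: the grid policy, the function names and the row-major frame buffer are hypothetical and are not stated in the paper. It computes where each channel would land on a 3840x2160 canvas and the linear pixel offset of a tile position in the merged frame.

```python
CANVAS_W, CANVAS_H = 3840, 2160  # 4K UHD output resolution


def tile_layout(num_channels):
    """Return one (x, y, width, height) tile per channel on a square grid."""
    grid = 1
    while grid * grid < num_channels:
        grid += 1
    tile_w, tile_h = CANVAS_W // grid, CANVAS_H // grid
    return [((i % grid) * tile_w, (i // grid) * tile_h, tile_w, tile_h)
            for i in range(num_channels)]


def linear_offset(tile, px, py):
    """Pixel index in a row-major frame buffer for position (px, py) in a tile."""
    x, y, _, _ = tile
    return (y + py) * CANVAS_W + (x + px)


four = tile_layout(4)   # 2 x 2 grid of 1920 x 1080 tiles
nine = tile_layout(9)   # 3 x 3 grid of 1280 x 720 tiles
```

With four channels each tile is exactly one FHD frame, and with nine channels each tile is 1280x720, so both ends of the four-to-nine channel range fill the 4K canvas without gaps.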

The storage data throughput of the new video processing board is up to 25.6Gbits/s, which is fast enough for 4K video processing. The core video processor, a Xilinx FPGA XC6SLX150, contains about 147K logic cells, which is enough for the video processing. Moreover, the new system supports hardware-core-based video processing, such as High Definition Multimedia Interface (HDMI) signal decoding and encoding, noise reduction, video smoothing and picture enhancement. The ARM Cortex A9 can support a powerful operating system to analyze the video states and manage the whole implementation. The new board system is described in two parts: the hardware implementation architecture and the Hardware Description Language (HDL) implementation architecture.



Fig. 2. The processing board architecture

The FPGA is the core processor of the whole system; it drives the DDR3 memory at 25.6Gbits/s and implements the video data processing algorithm. The ARM 9 is the core controller of the whole system, so it configures the SiI9616 and communicates with the FPGA processor. The Ethernet port is used to communicate with the system and to achieve online control. The structure of the processing board is shown in Fig. 2.

The HDMI video input module receives a group of differential signals and passes them to the SiI9616 processor. HDMI signals have good anti-interference ability, and the HDMI interface is smaller than the traditional DVI digital interface. The HDMI video data is organized in a standard format. Therefore, to simplify the FPGA processing work and to make sure the HDMI videos can be decoded with low time delay, the HDMI-format data is converted to parallel data. The SiI9616 processor converts the parallel data to 12-bit true color. The hard processing core performs this format conversion efficiently.
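
To make the parallel pixel format concrete, the sketch below packs three 12-bit color components into one 36-bit word and unpacks it again. The component order (red in the top bits), the function names and the reading of "12-bit true color" as 12 bits per component are assumptions for illustration; the paper does not specify the bus layout.

```python
BITS_PER_COMPONENT = 12
COMPONENT_MASK = (1 << BITS_PER_COMPONENT) - 1  # 0xFFF


def pack_pixel(r, g, b):
    """Pack three 12-bit components into a 36-bit word (assumed R:G:B order)."""
    for value in (r, g, b):
        if not 0 <= value <= COMPONENT_MASK:
            raise ValueError("component out of 12-bit range")
    return (r << 24) | (g << 12) | b


def unpack_pixel(word):
    """Split a 36-bit word back into its three 12-bit components."""
    return ((word >> 24) & COMPONENT_MASK,
            (word >> 12) & COMPONENT_MASK,
            word & COMPONENT_MASK)


word = pack_pixel(0xFFF, 0x000, 0x001)
components = unpack_pixel(word)
```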

The HDMI video output module is similar to the input module. The FPGA video processor directly drives the SiI9616 processor by transmitting 36-bit parallel video output. Because the implementation must support 4K resolution video, the parallel data rate is about 268MHz, whereas for a normal 1080p FHD video it is only 148.5MHz. Therefore, the timing of 4K video requires more attention. Finally, the encoded HDMI 4K resolution video is transmitted to the screen. The Xilinx FPGA XC6SLX150 is the core part of the whole system; it implements the video processing algorithm and manages the multi-channel video data streams.
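
The clock figures quoted above can be turned into raw bus bit rates and compared with the 25.6Gbits/s memory throughput. The sketch below is back-of-the-envelope arithmetic only: it assumes 36 bits are transferred on every pixel clock for both streams and ignores blanking, DDR3 refresh and bus turnaround. The actual memory traffic also depends on how many channels are stored and in which pixel format, which is not modeled here.

```python
DDR3_GBITS_PER_S = 25.6   # memory throughput quoted in the text
BUS_WIDTH_BITS = 36       # parallel video bus width quoted in the text


def stream_gbits_per_s(pixel_clock_mhz, bits_per_clock=BUS_WIDTH_BITS):
    """Raw bit rate of a parallel video bus in Gbit/s."""
    return pixel_clock_mhz * 1e6 * bits_per_clock / 1e9


uhd_rate = stream_gbits_per_s(268.0)      # 4K output bus
fhd_rate = stream_gbits_per_s(148.5)      # 1080p bus
headroom = DDR3_GBITS_PER_S / uhd_rate    # memory throughput per 4K stream
```

Under these assumptions the 4K bus carries roughly 9.6 Gbit/s and a 1080p bus roughly 5.3 Gbit/s, so the quoted memory throughput is between two and three times the raw rate of one 4K stream.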

SiI9616 video processors feature a digital processing core that performs real-time video format conversion and image improvement. Conversion is supported from any input format to 4K resolutions. The video processor supports multi-color space conversion, mosquito noise reduction, video smoothing, detail enhancement and so on. All of these processes run on the dedicated hard core, so the video delay is greatly reduced. The ARM 9 coprocessor is the controller of the whole system. To make the implementation more convenient to control, an image analysis and top-level operating platform should be constructed. The coprocessor configures the SiI9616 processor, writes video information to the FPGA, and analyzes the key image and the FPGA state.

Ethernet ports can be used to receive control commands, so the multi-channel video display can be controlled through Wireless Fidelity (WIFI). DDR3 memory is used to store the large amount of video data. In the design, 4Gbits of space are available for video data storage. External video data enters the system on independent clocks whose speed is much lower than the DDR3 operating speed. Therefore, First-In First-Out buffers (FIFOs) are introduced into the system to match the DDR3 operating speed. The DDR3 memory is also driven by the FPGA core processor.
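
The rate-matching role of the FIFOs can be sketched with a small software model: pixels arrive one at a time from the slow video side and are handed to the fast memory side in bursts. This is a single-threaded simplification under assumptions; the class name, depth and burst length are hypothetical, and a real design would use a dual-clock FIFO in the FPGA fabric rather than this immediate drain.

```python
from collections import deque


class BurstFifo:
    """Collect words from a slow producer and release them in fixed bursts."""

    def __init__(self, depth, burst):
        self.depth = depth
        self.burst = burst
        self.words = deque()
        self.bursts = []  # bursts handed to the simulated memory writer

    def push(self, word):
        """Accept one word; emit a burst once enough words are buffered."""
        if len(self.words) >= self.depth:
            raise OverflowError("FIFO full: memory side is not draining fast enough")
        self.words.append(word)
        if len(self.words) >= self.burst:
            self.bursts.append([self.words.popleft() for _ in range(self.burst)])


fifo = BurstFifo(depth=16, burst=8)
for pixel in range(20):
    fifo.push(pixel)
```

After 20 pixels the model has issued two full bursts and still holds the four most recent pixels, which is the buffering behavior that lets a slow, continuous video stream feed a fast, burst-oriented memory.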




## **4. RESULTS**

## **EXPLANATION**



Fig.4.1. Schematic diagram.



Fig.4.2. RTL diagram.



Fig.4.3. Schematic diagram.




Fig.4.4. Simulation results.


Fig.4.5. Output results.

Device utilization summary:

Selected Device : 6vcx75tff484-2

| Resource                            | Used | Available | Utilization |
|-------------------------------------|------|-----------|-------------|
| **Slice Logic Utilization**         |      |           |             |
| Number of Slice LUTs                | 1752 | 46560     | 3%          |
| Number used as Logic                | 1752 | 46560     | 3%          |
| **Slice Logic Distribution**        |      |           |             |
| Number of LUT Flip Flop pairs used  | 1752 |           |             |
| Number with an unused Flip Flop     | 1752 | 1752      | 100%        |
| Number with an unused LUT           | 0    | 1752      | 0%          |
| Number of fully used LUT-FF pairs   | 0    | 1752      | 0%          |
| Number of unique control sets       | 0    |           |             |
| **IO Utilization**                  |      |           |             |
| Number of IOs                       | 104  |           |             |
| Number of bonded IOBs               | 100  | 240       | 41%         |
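
The percentages in the utilization summary can be reproduced from the used and available counts. The sketch below assumes that the report truncates to a whole percent rather than rounding; that assumption is inferred from the numbers in the table, and the function name is hypothetical.

```python
def utilization_percent(used, available):
    """Whole-number utilization, truncated toward zero (assumed report behavior)."""
    return used * 100 // available


slice_luts = utilization_percent(1752, 46560)   # exact ratio is about 3.76%
bonded_iobs = utilization_percent(100, 240)     # exact ratio is about 41.67%
unused_ff = utilization_percent(1752, 1752)
```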

Fig.4.6. Time delay.

Timing details (all values displayed in nanoseconds):

| Item                                      | Value                           |
|-------------------------------------------|---------------------------------|
| Timing constraint                         | Default path analysis           |
| Total number of paths / destination ports | 206266531285 / 32               |
| Delay                                     | 16.660ns (Levels of Logic = 43) |
| Source                                    | data_in2<1> (PAD)               |
| Destination                               | data_out<31> (PAD)              |
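
For orientation, the reported path delay can be converted into an equivalent rate and an average delay per logic level. The sketch below is plain arithmetic on the two reported numbers; because the path runs from pad to pad under default path analysis, the resulting rate is only the reciprocal of the delay and should not be read as a constrained clock frequency. The function names are hypothetical.

```python
def reciprocal_rate_mhz(delay_ns):
    """Reciprocal of a path delay, expressed in MHz."""
    return 1e3 / delay_ns


def average_level_delay_ns(delay_ns, levels):
    """Average delay contributed by each level of logic."""
    return delay_ns / levels


rate_mhz = reciprocal_rate_mhz(16.660)              # about 60 MHz
per_level_ns = average_level_delay_ns(16.660, 43)   # about 0.39 ns per level
```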

Volume XIII, Issue IV, 2021 December http://ijte.uk/



Fig.4.6. Delay.

[Power analysis report for the Virtex-6 device: temperature grade Commercial, process Typical, ambient temperature 50.0 °C.]

Fig.4.7. Power report.

## **5. CONCLUSION:**

This paper introduced an implementation that merges and processes multi-channel independent videos and reconstructs 4K real-time video. The system is able to process full-HD (1920x1080 pixels) videos and display the customized result on a 4K UHD (3840 x 2160 pixels) screen. Specifically, it is able to process input video at a frame rate of 60fps, meeting the real-time requirement. The hardware architecture with HDL implementation can efficiently reduce the video delay and power consumption and improve the robustness of the video system. Compared to the traditional CPU + Graphics Processing Unit (GPU) architecture, the FPGA solution can keep up with fast-changing processing requirements by customizing the processing structure, and it improves the performance of multi-channel video processing and real-time display at lower cost.

## **REFERENCES:**

[1] P. Tang, W. Jin and J. Liu, "Railway inspection oriented foreground objects detection and occlusion reasoning for locomotive-mounted camera video," 2016 35th Chinese Control Conference (CCC), Chengdu, 2016, pp. 10144-10149.

[2] C. H. Chen, T. Y. Chen, D. Y. Huang and K. W. Feng, "Front Vehicle Detection and Distance Estimation Using Single-Lens Video Camera," 2015 Third International Conference on Robot, Vision and Signal Processing (RVSP), Kaohsiung, 2015, pp. 14-17.

[3] V. Chandrasekaran, S. Dantu, P. Kadiyala, R. Dantu and S. Phithakkitnukoon, "Socio-technical aspects of video phones," 2010 Second International Conference on Communication Systems and Networks (COMSNETS 2010), Bangalore, 2010, pp. 1-7.

[4] D. R. Marković, A. M. Gavrovska and I. S. Reljin, "4K video traffic analysis using seasonal autoregressive model for traffic prediction," 2016 24th Telecommunications Forum (TELFOR), Belgrade, 2016, pp. 1-4.

[5] B. C. Sunny, Ramesh R, A. Varghese and V. Vazhayil, "Map-Reduce based framework for instrument detection in largescale surgical videos," 2015 International Conference on Control Communication & Computing India (ICCC), Trivandrum, 2015, pp. 606-611.

[6] R. Peng, R. J. Sclabassi, Q. Liu, G. Justin and M. Sun, "Synthesizing Multi-View Video Frames for Coding Patient Monitoring Video," 2006 International Conference of the IEEE Engineering in Medicine and Biology Society, New York, NY, 2006, pp. 5157-5160.

[7] M. Tahir, Z. Ul-Abdin and M. A. Qadir, "Enhancing the HEVC video analyzer for medical diagnostic videos," 2015 12th International Conference on High-capacity Optical Networks and Enabling/Emerging Technologies (HONET), Islamabad, 2015, pp. 1-5.

[8] P. P. Shete, D. M. Sarode and S. K. Bose, "Real-time panorama composition for video surveillance using GPU," 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Jaipur, 2016, pp. 137-143.

[9] J. C. T. Hai, O. C. Pun and T. W. Haw, "Accelerating video and image processing design for FPGA using HDL coder and simulink," 2015 IEEE Conference on Sustainable Utilization And Development In Engineering and Technology (CSUDET), Selangor, 2015, pp. 1-5.

[10] S. Chen, S. Yu, J. Lu, G. Chen and J. He, "Design and FPGA-Based Realization of a Chaotic Secure Video Communication System," in IEEE Transactions on Circuits and Systems for Video Technology, vol. PP, no. 99, pp. 1-1.
