# Implementation of FPGA hardware architecture for optimized SPIHT coder using 2DWT-A

C. THIRUMARAI SELVI<sup>a</sup>, R.S. SANKARA SUBRAMANIAN<sup>b</sup>, J. AMUDHA<sup>c</sup>, M. MUTHUKRISHNAN<sup>d</sup>

<sup>a</sup>Professor, Department of ECE, Sri Krishna College of Engineering and Technology, Coimbatore, Tamil Nadu, India

<sup>b</sup>Professor, PSG Institute of Technology and Applied Research, Coimbatore, Tamil Nadu, India

<sup>c</sup>Associate Professor, Department of EEE, Dr. Mahalingam College of Engineering and Technology, Pollachi, Tamil Nadu, India

<sup>d</sup>Professor, Department of Mechanical Engineering, Kalaignar Karunanidhi Institute of Technology, Coimbatore, Tamil Nadu, India

#### ABSTRACT

In multimedia communication and health care IoT, medical images and other information must be transmitted quickly to health care experts, caretakers and relatives. Remote monitoring gathers data periodically, and because of its massive volume this data is transported to the cloud. Such large volumes of data consume considerable bandwidth and reduce the speed of operation. Hence, image compression is crucial for multimedia data communication. The proposed work develops an integrated architecture for both the image transform and the SPIHT based encoding process. We selected the SPIHT algorithm for image compression because of its several expedient features. A parallel pipelined poly-phase two-dimensional discrete wavelet transform (2DWT-A) based image transform architecture has been designed and implemented with reduced hardware and latency. The SPIHT encoder receives the low frequency components from the 2DWT-A image transformer. We removed the high frequency components to reduce the number of coefficients for transmission. In the SPIHT encoder architecture, a hardware-efficient binary arithmetic encoder with a position control unit avoids prefix zeros during arithmetic computation and replaces the hardware-expensive arithmetic coders. This increases the operating speed of the image codec and makes hardware utilization more effective. A Xilinx FPGA device with a MicroBlaze core processor and a SPARTAN 3EDK kit is used to realize the hardware architecture. We use the image codec architecture enactment principle for examination. We compare the performance of the proposed architecture with existing schemes. The hardware requirements, power and delay of the proposed method are considerably optimized compared with existing techniques. It reduces the area of the designed system while increasing the operating frequency by 2.1 MHz. We use 180 nm CMOS technology in the Cadence tool to estimate performance at the cell level.

Key words: 2DWT-A, FPGA, Image Coder, Multimedia Communication, SPIHT, VLSI

## **1. INTRODUCTION**

With rapid growth in technologies such as satellite communication, high definition television, multimedia and internet teleconferencing, the need for digital image compression has increased [1]. An enormous amount of storage space is essential for storing and transmitting multimedia data. To reduce the transmission time and storage space, it is essential to compress this information. Image compression algorithms achieve this by reducing the redundant and irrelevant information in images. Several image compression algorithms use arithmetic coding schemes because of their ability to use fractional bits when generating codes, which yields optimal performance [2]. When compared to image compression techniques such as JPEG, vector quantization and vector quantization with wavelets, Set Partitioning in Hierarchical Trees (SPIHT) coding provides superior performance in terms of peak signal-to-noise ratio (PSNR), speed and various other factors [3]. The wavelet based SPIHT algorithm can continuously generate a scalable bit stream for encoding still images. It does not compromise the compression; however, images are created with the required quality and at many bit rates. On reaching the required image quality or target rate, the decoder can stop the decoding process.

The SPIHT algorithm was developed from the Embedded Zerotree Wavelet (EZW) scheme. Distortion scalability is a significant feature of SPIHT coding [4] and can be embedded during progressive transmission, in which the most significant information is transmitted first, followed by the less important data. When compared to the EZW encoder, SPIHT can compress the bit rate further without the use of an entropy encoder [5]. With the SPIHT encoder, the encoding or decoding process can be stopped at any operational stage. Table 1 describes the various parameters involved in image compression using the SPIHT encoder.

| Parameters                  | Quantity                                               |
|-----------------------------|--------------------------------------------------------|
| Memory requirements         | Moderate                                               |
| Computational load          | Low                                                    |
| Complexity                  | Moderate                                               |
| Compression ratio           | High                                                   |
| Power consumption           | Low                                                    |
| Reconstructed image quality | High                                                   |
| Processing Speed            | High                                                   |
| Advantages                  | Faster encoding time                                   |
|                             | It permits encoding and decoding process at all stages |
| Disadvantages               | Intricacy due to memory and sorting procedures         |

 Table 1: SPIHT Encoder Parameters

When the SPIHT algorithm is implemented, a wavelet transform decomposes the image into several hierarchical sub-band images. Spatial orientation trees consisting of sets of transformed sub-band coefficients are then formed [6], which strongly exploits the correlation between the sub-bands. The coefficient sets are encoded bit plane by bit plane, from the highest to the lowest magnitude. After the wavelet transform of the image is computed, SPIHT partitions the wavelet coefficients into spatial orientation trees [7]. The trees are arranged as multiple nodes, and each node corresponds to one pixel. The significance of every individual pixel is tested by comparing it with a threshold value. A pixel and its offspring are considered insignificant and are not transmitted if they are below the threshold value. After every pass, the threshold value is reduced by 50%.
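The tree-based significance test can be illustrated with a minimal Python sketch. It assumes a square array of integer coefficients and the common parent-child rule in which node (r, c) has offspring at (2r, 2c), (2r, 2c+1), (2r+1, 2c) and (2r+1, 2c+1). The function names and the sample values are hypothetical, and the special handling of the lowest-frequency band used by full SPIHT is deliberately left out.

```python
def offspring(r, c, size):
    """Four children of node (r, c) in a size x size coefficient array."""
    if (r, c) == (0, 0) or 2 * r >= size or 2 * c >= size:
        return []  # simplified rule: the root and the finest-level nodes have no children
    return [(2 * r, 2 * c), (2 * r, 2 * c + 1),
            (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]


def descendants(r, c, size):
    """All descendants of (r, c), gathered level by level."""
    found, frontier = [], offspring(r, c, size)
    while frontier:
        found.extend(frontier)
        frontier = [kid for node in frontier for kid in offspring(*node, size)]
    return found


def is_significant(coeffs, nodes, threshold):
    """A set is significant if any member reaches the threshold in magnitude."""
    return any(abs(coeffs[r][c]) >= threshold for r, c in nodes)


# Illustrative coefficients only, not data from the paper.
coeffs = [[34, -20, 5, 3],
          [12, 9, -2, 1],
          [4, -3, 1, 0],
          [2, 1, 0, -1]]

threshold = 32
while threshold >= 1:
    tree = [(0, 1)] + descendants(0, 1, 4)
    print(threshold, is_significant(coeffs, tree, threshold))
    threshold //= 2  # halve the threshold after every pass
```

While the whole tree rooted at a node is insignificant, a single decision covers the node and all of its descendants, which is where the savings of the tree structure come from.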

The SPIHT coder transmits the wavelet coefficients methodically, as represented in the simplified SPIHT image encoder shown in Fig.1. The basic SPIHT image codec shown in Fig.2 performs a similar function. The coder sorts the coefficients in decreasing order and then transmits them bit plane by bit plane. A sorting pass and a refinement pass support this process. During the sorting pass, the coefficients are read or scanned in a predetermined order [8]. In the current plane, the most significant bit is represented by '1'. This step is termed the significance pass. The position information of the significant coefficients and the significance map are transmitted implicitly to the decoder. The refinement pass transmits the actual bit-plane values of the coefficients previously identified as significant. SPIHT recognizes the significant coefficients from the magnitude of the wavelet coefficients and encodes their sign into the bit stream.



Figure 1. Simplified SPIHT image encoder block diagram



Figure 2. Basic SPIHT image codec block diagram

The arithmetic coder can also perform compression in the SPIHT coder [9]. When the number of decomposition levels increases, the rate of compression also increases. However, if the number of levels exceeds 5, the additional compression is negligible. The image dimension directly influences the number of levels. The image is further decomposed into equal rows and columns. For a 5-level decomposition model, we can see slight variations when arithmetic coding is employed.

Ravi et al. [10] developed an algorithm using the 2DWT-A technique for computation of multi-level sub bands. A lifting scheme algorithm performs the DWT computation. A hardware description language (HDL) model is designed and implemented on an application specific integrated circuit (ASIC) platform for fast and secure image data transmission. Advanced encryption standard (AES) encryption is performed, whose computation time depends on the selection of appropriate sub bands, data recovery and compression ratio. High latency and throughput are obtained by the DWT architecture. They use 65 nm CMOS technology for implementing the ASIC model. Power performance, timing and area of data compression and encryption are identified using this model. Kefalas et al. [11] propose a high-throughput architecture for the Consultative Committee for Space Data Systems (CCSDS) 122.0-B-1 image compression standard. They reduce the total memory operations using a memory organization architecture that eliminates the need for external memories and further reduces the number of individual memories. Implemented on a commercial space grade FPGA device, this model achieves 136 MSamples/sec with a sample precision of 16 bits.

Chen et al. [12] propose a three-level parallel set partitioning in hierarchical trees (TP-SPIHT) scheme, comprising byte-level, bit-plane-level and tree-level parallelism, to accelerate the coding of large volumes of remote sensing images (RSI) in remote sensing systems. They optimize the dynamic linked-list processing by parallelizing SPIHT on a collaborative central processing unit (CPU) and graphics processing unit (GPU) platform. The data dependency of the traditional SPIHT is removed using three static marker matrices, namely bit-stream organization, tree-level parallel coding and pre-processing for basic parallel SPIHT coding. TP-SPIHT performs the coding within 292.03 ms, which is 6.27 times faster than an optimized CPU implementation. Hsieh et al. [13] propose wavelet-based electrocardiography (ECG) data compression using the SPIHT algorithm. The SPIHT algorithm is modified to offer fast lossless coding, overcoming the low storage efficiency and time-consuming nature of traditional SPIHT schemes. The quantized wavelet coefficients and their corresponding bit-plane representation are used as the base for modification. It uses two types of primitive trees encompassed in a tree data structure for representing the bit plane. Compared with the traditional SPIHT algorithm, this algorithm reduces the coding time by 64.35%; however, the bit rate increases by 0.28%.

Kim et al. [14] reduce the external data bandwidth required during video processing in mobile multimedia devices using an embedded compression based frame memory compression (FMC) technique. This model provides power saving benefits as well. It combines SPIHT and DWT schemes to achieve low computational complexity and high compression efficiency. Conventionally, a similar compression ratio is used for all blocks regardless of the correlation between the SPIHT algorithm and the DWT coefficients. The authors address this gap by employing the SPIHT algorithm with an adaptive compression ratio based on the DWT coefficients while keeping the bit-stream size constant. This technique improves the peak signal-to-noise ratio by 2.23 dB, which improves video quality.

The bottleneck in throughput because of voluminous memory access bandwidth in high definition (HD) video coders is overcome by the algorithm proposed by Li et al. in [15]. It uses lossless image compression algorithm for alleviating the bandwidth burden and providing line random access flexibility and supporting block for several hardware video codec architectures. They employ adaptive mode decision for utilization of the image spatial correlation for pixel-level adaptive prediction. Further, the prediction residue is described using variable length coding (VLC). Statistical redundancy is used completely, and syntax elements are controlled by the VLC model. In hard-wired video coding, around 55.2% of memory access bandwidth is reduced by this model.

Mulani, A.O. and Mane, P.B. [16] introduced a novel, efficient and fast DWT operating in an FPGA. This work used Xilinx ISE design suite 14.2 for synthesis and MATLAB for simulation. The design was downloaded to an Artix-7 FPGA on a Nexys4 DDR board. The authors observed that their proposed work used 120 slices operating at a frequency of 1102.536 MHz.

Saoungoumi-Sourpele et al. [17] observed that JPEG2000, the successor of JPEG for image compression, is still slow because of the complexity of its COder-DECoder (CODEC). This process consumes considerable energy, memory, computation time and CPU resources. The authors have proposed a novel model for decoding JPEG2000 on Android OS, implemented in compact devices. Decoding time is reduced dramatically by decreasing memory and CPU usage, leading to energy-efficient and fast image decoding. Their results showed a memory utilization rate of 18.56%, a CPU utilization rate of 9.8% and an execution time of 23.41%.

Chetan, H. et al. [18] analysed an Advanced Lifting scheme architecture and a DA-DWT1 architecture for image compression. They considered signals that carry video and image information, where data is compressed and sent through a predefined limited bandwidth. This resulted in a comparison between image quality and compression. Power optimisation, accuracy and increased speed of operation are some of the characteristics observed in this methodology. They used VLSI CAD tools for simulation and analysed power consumption, area and clock speed for both architectures. The study showed that digital systems performed image compression more efficiently, whereas the DWT achieved higher speed.

A combination of compression and cryptographic techniques has been introduced to protect images, as the need for a secure transmission of data has become a crucial aspect of communication. In [19], Setyaningsih, E. et al have studied three categories of such techniques that focus on reduction in size of data, data safety with reduced complexity and compression using lossless and/or lossy methodologies. We can categorize the differentiation of cryptographic and compression methods into either of the above methods, depending on their process sequences.

In [20] Divakara, S.S. et al have illustrated dual-tree complex wavelet transform computation using a systolic array-based novel architecture. This methodology is incorporated on an FPGA and the wavelet filter coefficients undergo quantization process. It then rounds these values with a limitation set as 0.5 dB. Both row and column elements are simultaneously calculated using the pipelined architecture and parallel architecture. Xilinx and Verilog programming are used to model this work. The authors have observed that their work consumes less than 3W of power, operating at a maximum frequency of 156 MHz.

Song, Y. et al. [21] examined the flaws of existing CS-compression-encryption schemes to determine their performance in terms of compression. They found that although these schemes exhibited excellent security and high efficiency, they still lack a compression process. Based on these observations, Song, Y. et al. proposed a novel CS-enhanced compression architecture. They perform encryption and compression using bit-level lossless CS and entropy coding such that the output holds enhanced security and quality. They accomplish a reasonable reconstruction performance using both lossy and lossless coding. To ensure high security and attack resistance, the scheme combines the initial keys with SHA-256. The authors have shown a joint scheme that outperforms previously existing compression-encryption schemes in terms of encryption.

Our previous works [22, 23] show an efficient system that uses DAC in a 2D DWT-Distributed Arithmetic image compression technique. The carry select adder is modified using a position control unit and BEC-1 and is implemented in the 2D DWT. Based on this implementation, we concluded that the works in [22, 23] improve the speed of performance and significantly reduce area utilization and arithmetic computation.

An image compression method comprises an image transform and an encoding process. The image transform converts the time domain function into the frequency domain and does not by itself achieve compression. The actual compression mainly depends on the encoding process. Among the various encoding processes, the SPIHT encoder is the most widely used. In the conventional VLSI hardware architecture, the SPIHT encoder includes eight arithmetic encoders.

The major contributions of the proposed FPGA hardware architecture for an optimized SPIHT coder using 2DWT-A are as follows:

1. This work substitutes a faster binary arithmetic coder, based on a position control unit combined with a BEC unit, for the conventional SPIHT arithmetic coder. The novel position control unit increases the computation speed of the arithmetic encoding process, which in turn increases the speed of the conventional SPIHT coder.

2. Computing the image transform is time-consuming because the separable 1D DWT must be applied to both the rows and the columns of an image. This requires additional transpose memory, which increases the hardware structure and power consumption. A two-dimensional discrete wavelet transform (2DWT-A) is designed using parallel DA for the row-wise convolution DWT and the column-wise lifting DWT. This increases the speed of computation and reduces the hardware.

3. The implemented 2DWT-A provides a multiplier-less, look-up table based distributed arithmetic DWT computation. Removing the multipliers increases the speed of operation compared with the conventional convolution used for DWT.

4. Separate architectures are available for DWT and SPIHT. The major contribution of this work is the development of a complete image compression codec structure. A faster image compression coder with low hardware cost was developed by integrating the 2D DWT hardware with the SPIHT encoder.
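The multiplier-less, look-up table based computation named in the third contribution can be sketched in Python as follows. This is only an illustration of the general distributed arithmetic idea, assuming unsigned 8-bit samples and integer (for example, fixed-point scaled) filter coefficients; the function names and the numbers are hypothetical and do not describe the proposed hardware.

```python
def build_lut(coeffs):
    """Precompute every partial sum of the fixed filter coefficients."""
    lut = []
    for address in range(1 << len(coeffs)):
        lut.append(sum(c for k, c in enumerate(coeffs) if (address >> k) & 1))
    return lut


def da_inner_product(samples, lut, bits=8):
    """Multiplier-less inner product: one LUT read and one shift-add per bit plane."""
    acc = 0
    for b in range(bits):
        # Gather bit b of every sample into a LUT address.
        address = 0
        for k, x in enumerate(samples):
            address |= ((x >> b) & 1) << k
        acc += lut[address] << b
    return acc


# Illustrative values only: four integer taps and four unsigned 8-bit samples.
coeffs = [1, 3, -2, 5]
samples = [12, 200, 7, 255]
lut = build_lut(coeffs)
direct = sum(c * x for c, x in zip(coeffs, samples))
print(da_inner_product(samples, lut), direct)
```

The table grows as 2^N for N taps, which is why long filters are usually split into several smaller tables in practice.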

The outline of this article is arranged as follows. In Section 2, the proposed FPGA hardware architecture for optimized SPIHT coder using 2DWT-A is described in detail with various processes. Experimental results and performance evaluations are provided in Section 3, and a brief conclusion is drawn in Section 4.

## **2. METHODOLOGY**

The original image is initially partitioned into many sub-bands with a three-level DWT. A five-step sorting process codes the bit stream until no elements remain in the list of insignificant sets (LIS). Based on the wavelet coefficients, the encoder determines the maximum coefficient. The threshold value is evaluated from the logarithm of this maximum coefficient. It is also used to sort the coefficients in the sorting pass and the refinement pass. In the decoding stage, pixel locations and bit streams are compared. Using a quantizer, the values are reconstructed and finally, the IDWT [24] is performed.
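As a minimal sketch of the threshold evaluation described here, the following Python assumes integer coefficients and the usual SPIHT convention that the first bit plane is n = floor(log2(max |c|)), giving an initial threshold of 2^n. The function names and sample values are hypothetical.

```python
def initial_bitplane(coeffs):
    """n = floor(log2(max |c|)); the first threshold is 2**n."""
    peak = max(abs(c) for row in coeffs for c in row)
    return peak.bit_length() - 1  # exact floor(log2(peak)) for integers >= 1


def thresholds(coeffs):
    """Thresholds visited from the most to the least significant bit plane."""
    n = initial_bitplane(coeffs)
    return [1 << k for k in range(n, -1, -1)]


# Illustrative coefficients only.
coeffs = [[63, -34],
          [10, 7]]
print(initial_bitplane(coeffs), thresholds(coeffs))
```

Using a power-of-two threshold means each significance test reduces to inspecting a single bit plane, which suits a hardware comparator.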

Fig.3 shows the proposed structure of the input image where 2DWT-A based DA is used to divide the image into sub bands of three categories. A scanner that scans in zig-zag order is used to extract wavelet coefficients, which are then transferred to the SPIHT coder. This novel design comprises a position control-based rapid arithmetic coder using specific adders. This will increase the speed of computation, making it suitable to process images quickly. Hence these arithmetic coders are suitable for real-time image compression.
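The zig-zag extraction of coefficients can be sketched as below. The exact scan pattern of the proposed scanner is not specified in the text, so this Python example assumes the familiar anti-diagonal zig-zag order used in JPEG; `zigzag_order` and `zigzag_scan` are hypothetical names.

```python
def zigzag_order(rows, cols):
    """Visit anti-diagonals in turn, alternating direction on each one."""
    order = []
    for s in range(rows + cols - 1):
        diagonal = [(r, s - r) for r in range(rows) if 0 <= s - r < cols]
        # Even diagonals run bottom-left to top-right, odd ones the reverse.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order


def zigzag_scan(block):
    """Flatten a 2D block of coefficients in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block), len(block[0]))]


# Illustrative 3 x 3 block whose entries are numbered in row-major order.
block = [[0, 1, 2],
         [3, 4, 5],
         [6, 7, 8]]
print(zigzag_scan(block))
```

Scanning this way visits low-frequency positions near the top-left corner first, so the coefficients that tend to be largest reach the coder early.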

## **2.1 SPIHT Algorithm**

In the SPIHT algorithm, the ordering information is kept in three lists: the list of significant pixels (LSP), the list of insignificant pixels (LIP) and the list of insignificant sets (LIS) [25]. Fig. 3 depicts the hardware configuration of the proposed work where lower sub bands of pixels are identified.



Figure 3. Hardware structure of the overall system

The 2DWT-A encourages efficient encoding by de-correlating the image pixels. On repeated application of the proposed method, the image is iteratively decomposed into low pass and high pass sub bands in the vertical and horizontal directions. Multiple decompositions produce a pyramidal structure in which the lower frequency sub bands store the energy of the image. In the SPIHT encoder, a simple threshold, set to a power of two, is fixed to make computation simpler. The next step involves choosing the crucial pixels in the sub trees. Based on these pixels, it saves their subsequent ancestors and siblings with their positions as a list, as depicted in Fig.4. The data unit reads the image pixels, which are also stored in the memory. Random Access Memory (RAM) is used for saving the SPIHT lists [26].
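The row and column decomposition into sub bands can be illustrated with a short Python sketch. It swaps in the unnormalised Haar filter purely for brevity; the filter bank and the distributed arithmetic structure of the proposed 2DWT-A are not reproduced here, and the function names and the test image are hypothetical. Sub-band naming conventions also vary; here `ll` denotes the low pass result in both directions.

```python
def haar_1d(signal):
    """One level of the (unnormalised) Haar transform: averages then differences."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    low = [(a + b) / 2 for a, b in pairs]
    high = [(a - b) / 2 for a, b in pairs]
    return low, high


def haar_2d(image):
    """Row transform followed by column transform; returns LL, LH, HL, HH."""
    rows_low, rows_high = zip(*(haar_1d(row) for row in image))

    def columns(block):
        # Transform each column, then transpose back to row-major lists.
        lows, highs = zip(*(haar_1d(col) for col in zip(*block)))
        return [list(r) for r in zip(*lows)], [list(r) for r in zip(*highs)]

    ll, lh = columns(rows_low)
    hl, hh = columns(rows_high)
    return ll, lh, hl, hh


# Illustrative 4 x 4 image made of four flat 2 x 2 patches.
image = [[10, 10, 20, 20],
         [10, 10, 20, 20],
         [30, 30, 40, 40],
         [30, 30, 40, 40]]
ll, lh, hl, hh = haar_2d(image)
ll2, *_ = haar_2d(ll)  # a second level applied to the low pass band only
print(ll, ll2)
```

Re-applying the transform to the `ll` band alone is what builds the pyramidal structure, and for smooth images the detail bands stay close to zero, so most of the energy ends up in the low pass band.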





Figure 4. Searching of Significant pixels

Fig.5 represents the internal composition and working of the designed chip. It feeds the input to the memory controller from two basic components:

- Data Units: It feeds the image pixels as input to some registers in this block.
- 2DWT-A: This algorithm enforces faster computation that can produce a sub-band of low-frequency.

The memory controller provides the input to the SPIHT unit and is used to switch the low frequency band to the memory. Based on this input, a list of significant pixels is created and these pixels are represented as small sized coded bits. Table 2 shows the list of development and simulation tools. Xilinx software has been used to implement the hardware and Cadence is used for power analysis of the different internal working units. Table 3 shows the various algorithmic steps involved in transforming and encoding to compress an image.





Table 2: Simulation and development tools used

| Design Step          | Tool      |
|----------------------|-----------|
| Hardware description | VHDL      |
| Hardware simulation  | ModelSim  |
| Hardware synthesis   | Xilinx    |
| Power analysis       | Cadence   |

Table 3: Algorithm for proposed Image compression

| Algorithm: Image compression |                                                                         |
|------------------------------|-------------------------------------------------------------------------|
| Input:                       | Image pixels, filter coefficients                                       |
| Output:                      | Compressed bit stream                                                   |
| Begin                        |                                                                         |
| 1.                           | Init()                                                                  |
| 2.                           | Read image pixels: Image pixels text ()                                 |
| 3.                           | Read filter coefficients                                                |
| 4.                           | Get high pass and low pass sub-bands                                    |
| 5.                           | Get $\leftarrow$ 2DWT-A // (parallel poly-phase distributed arithmetic) |
| 6.                           | Low pass sub-band coefficients $\rightarrow$ SPIHT coder // (modified with leading zero detection with a novel position control unit for faster arithmetic coding) // $\rightarrow$ bit stream transmission |
| 7.                           | Reconstructed image text () $\leftarrow$ IDWT $\leftarrow$ Decoding $\leftarrow$ Received bit stream |
| 8.                           | Measure performance parameters (area, execution time, PSNR)             |
| End                          |                                                                         |

In SPIHT, the read context and symbol examination stage compares the internal register with the context label to identify the tag to which it belongs. Based on the tag, the correct label is sent to the next level, where the lower and higher values of the blocks are updated. The cumulative values and symbol probability are stored in the cumulative register and symbol register respectively. Fig.6 represents the finite state machine diagram of the proposed 2DWT-A based SPIHT architecture.

The arithmetic coder comprises the cumulative probability renewal and register, the updating of the lower and higher values of the input, and the symbol probability renewal and register. It produces the output code-word as a stream of code, as shown in Fig.7.
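The interplay of cumulative probabilities and the lower and higher interval values can be shown with a small Python sketch. It is a simplification under stated assumptions: it uses a fixed (non-adaptive) symbol model and exact fractions, whereas a hardware coder works with finite-width registers, renormalisation and renewed probabilities. The function names and symbol counts are hypothetical.

```python
from fractions import Fraction


def cumulative_table(counts):
    """Map each symbol to its (cum_low, cum_high) share of the unit interval."""
    total = sum(counts.values())
    table, running = {}, 0
    for symbol, count in counts.items():
        table[symbol] = (Fraction(running, total), Fraction(running + count, total))
        running += count
    return table


def encode_interval(symbols, table):
    """Narrow [low, high) once per symbol using the cumulative probabilities."""
    low, high = Fraction(0), Fraction(1)
    for s in symbols:
        span = high - low
        cum_low, cum_high = table[s]
        low, high = low + span * cum_low, low + span * cum_high
    return low, high


# Illustrative binary model: '0' is three times as likely as '1'.
table = cumulative_table({"0": 3, "1": 1})
low, high = encode_interval("001", table)
print(low, high, high - low)
```

Any number inside the final interval identifies the whole symbol sequence, and the interval width equals the product of the symbol probabilities, so likely sequences need fewer bits.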



Figure 6. Finite state machine diagram of the 2DWT-A based SPIHT



Figure 7. Internal component of Arithmetic Coder

## **3. EXPERIMENTAL SETTINGS**

The experimental results show the operation of the proposed work in hardware and software models. The image under consideration is the 512x512 Lena image. Fig.8 shows the impact of the bit rate on PSNR as it increases from a lower value to a higher value. The proposed DA-DWT+SPIHT is compared in detail with SPIHT without lists, SPIHT and EZW regarding the PSNR values, and the observed readings are tabulated in Table 4. Fig.9 shows the graphical representation, from which we infer the proposed method to be optimal for usage. Table 4 also records the PSNR loss for the existing methodologies, which ranges between 13.5 dB and 17 dB. The proposed method offers a PSNR value of 48.922 dB at 0.55 bpp. This characteristic qualifies the proposed DA-DWT+SPIHT to produce quality images, as required by many applications. From Fig.8 we can see that as the bit rate increases, there is a significant improvement in rate-distortion performance.
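For reference, the PSNR figure of merit used in this comparison can be computed as in the following Python sketch, which applies the standard definition PSNR = 10 log10(peak^2 / MSE). The default peak of 255 assumes 8-bit grayscale pixels, and the function name and the tiny test images are hypothetical, not the evaluation code behind the reported results.

```python
import math


def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two equally sized grayscale images (lists of rows)."""
    squared_error = sum((a - b) ** 2
                        for row_a, row_b in zip(original, reconstructed)
                        for a, b in zip(row_a, row_b))
    mse = squared_error / (len(original) * len(original[0]))
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


# Illustrative 2 x 2 images that differ by one grey level in two pixels.
original = [[100, 100], [100, 100]]
reconstructed = [[101, 99], [100, 100]]
print(round(psnr(original, reconstructed), 2))
```

Because the scale is logarithmic, each halving of the mean squared error raises the PSNR by roughly 3 dB.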



Figure 8. Comparison of Rate distortion using different standard coders

Table 4: Comparison of PSNR values for a Lena Image (512x512) at different rates

| Bit rate (bpp) | EZW              | SPIHT            | SPECK            | Proposed (DA-DWT+SPIHT) |
|----------------|------------------|------------------|------------------|-------------------------|
| 0.2            | 31.112 (~16.9)   | 31.98 (~16.032)  | 32.80 (~15.212)  | 45.012                  |
| 0.3            | 32.720 (~15.755) | 34.12 (~14.625)  | 34.80 (~13.945)  | 46.745                  |
| 0.4            | 34.060 (~15.927) | 35.61 (~14.377)  | 35.95 (~14.037)  | 47.987                  |

Table 5 summarizes the performance of the proposed SPIHT coder compared to a 1-level traditional SPIHT coder, based on simulation in Xilinx software. When compared with existing SPIHT coders, the proposed method is found to consume fewer resources, as recorded in Table 5. This results in reduced area consumption (62% reduction) and increased operation speed (20% increase). It also increases the number of memory units while reducing the overall complexity of the hardware when implemented in an FPGA. The performance of the Context Adaptive Binary Arithmetic Coder based SPIHT (CABAC-SPIHT) and the Arithmetic Coder based SPIHT (AC-SPIHT) is analyzed and their operating speed is recorded.

Table 5: Comparison of CABAC and AC in SPIHT using SPIHT coder

| Parameters      | AC in SPIHT | CABAC in SPIHT | Proposed |
|-----------------|-------------|----------------|----------|
| Slices          | 1446        | 1215           | 711      |
| 4 input LUT     | 2712        | 1904           | 1374     |
| IOB             | 56          | 46             | 0        |
| Frequency (MHz) | 56.404      | 31.395         | 81.820   |
| Delay (ns)      | 42.561      | 37.873         | 12.222   |

AC: Arithmetic Coder; CABAC: Context Adaptive Binary Arithmetic Coder

Based on the architectural composition, we compared No-list SPIHT (NLS), DWT and SPIHT. Table 6 tabulates the performance of the proposed and traditional methodologies with respect to PSNR, memory, encoding time and decoding time for a 512X512 image. Based on the results, we have identified that the proposed image coder is less complex than the other methodologies. Moreover, the amount of memory used is also relatively lower than that of the traditional methodologies observed in Table 6.

| Performance metrics | DWT      | SPIHT  | NLS    | Proposed |
|---------------------|----------|--------|--------|----------|
| Encoding Time (sec) | 0.843    | 3.681  | 0.481  | 7.11 ns  |
| Decoding Time (sec) | 0.069    | 1.451  | 0.385  | 4.72 ns  |
| Memory (KB)         | 2793.472 | 62.567 | 65.537 | 256      |
| PSNR (dB)           | -        | 32.61  | 32.76  | 48.223   |

Table 6: Comparison of PSNR values, memory, decoding time period and encoding time period for different codecs at 0.25 bpp for 512X512 image

Using a hardware description language (HDL), the architectures of the Arithmetic Coder (AC) and Discrete Wavelet Transform (DWT) are synthesized and simulated. We record the simulation output readings in Table 7 and Table 8. For hardware implementation, an image of size 1024x1024 with 16-bit pixel precision is taken. The operating speed of the device is 81.82 MHz, and it follows a compression system with a maximum rate of 1375 Mbps. The arithmetic coder can consume four symbols simultaneously at one instant of time. Table 7 defines the simulated results of arithmetic coders, offering a throughput of 1200 MSPs (4 symbols per clock x 300 MHz).
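The arithmetic behind two of these figures can be checked with a short Python sketch: symbol throughput as clock frequency times symbols consumed per clock, and clock period as the reciprocal of the operating frequency. The helper names are hypothetical; the input numbers are the ones quoted in the text and tables.

```python
def symbol_throughput(clock_mhz, symbols_per_clock):
    """Throughput in mega-symbols per second (MSPs)."""
    return clock_mhz * symbols_per_clock


def clock_period_ns(clock_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000 / clock_mhz


# 4 symbols per clock at 300 MHz, and the period of the 81.82 MHz device clock.
print(symbol_throughput(300, 4))
print(round(clock_period_ns(81.82), 3))
```

The second value agrees with the 12.222 ns delay listed for the proposed design.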

| Coders     | FPGA device | Slices | LUTs  | Clock frequency (MHz) | Throughput (MBits/Sec) |
|------------|-------------|--------|-------|-----------------------|------------------------|
| Ritter     | XC4000      | -      | -     | 40                    | 8.48                   |
| Fry        | XCV2000E    | -      | -     | 75                    | 800.00                 |
| Huang      | APEX        |        |       |                       | 36.8                   |
|            | EP20K1000E  | -      | -     | -                     | 50.0                   |
| Li         | X3S1500L    | 2366   | 3416  | -                     | -                      |
| Jotheswar  | XC4VLX25    | 7021   | 13356 | 35                    | 280.00                 |
| Corsonello | XC2V1000    | 1637   | -     | 100                   | 92.96                  |
| Kai        | XC2V3000    | 10317  | 14742 | 56.40                 | 902.46                 |
| Proposed   |             | 711    | 1374  | 81.82                 | 1372.71                |

Table 8: Performance analysis of various AC coders

| AC coders | Clock Freq (MHz) | TP (MSPs) | Symbols per clock | Technology                |
|-----------|------------------|-----------|-------------------|---------------------------|
| Marks     | 75.00            | 64.00     | 0.80              | CMOS 5S 0.35 µm IBM       |
| Kuang     | 25.00            | 3.00      | 0.12              | 0.8 µm SPDM               |
| Stefo     | 32.00            | 256.00    | 8.00              | ProASIC A500K FPGA        |
| Osorio    | 40.00            | 340.00    | 4.00              | 0.18 µm UMC Europractice  |
| Vanhoof   | 32.00            | 10.67     | 0.33              | 0.5 µm CMOS 3 layer metal |
| Printz    | 33.33            | 16.67     | 0.50              | 8 Xilinx XC 3090 FPGAs    |
| Dyer      | 211.86           | 423.72    | 2.00              | 0.8 µm TMSC               |
| Kai       | 71.05            | 284.2     | 4.00              | Xilinx XC2V3000 FPGA      |
| Proposed  | 300              | 1200      | 4.00              | Xilinx XC 3090 FPGA       |

Table 9 lists the area and power consumed by the SPIHT coder and the 2DWT-A coder. The analysis identifies that the SPIHT coder based on 2DWT-A performs better in both criteria.

Table 9: Area and Power Analysis of SPIHT coder and 2DWT-A coder

| Features              | 2DWT-A coder | SPIHT coder |
|-----------------------|--------------|-------------|
| Area (mm<sup>2</sup>) | 0.5231       | 0.3498      |
| Power (mW)            | 24.725       | 7.7903      |

We used dedicated hardware for real-time implementation of the work. A Xilinx FPGA device with a Microblaze core processor and a SPARTAN 3EDK kit is used to realize the hardware architecture, as shown in Figure 9. The optimized architecture reduces the storage needed for temporary variables and uses fewer computational modules. Architecture explorations that reduce area and power consumption are a concern for portable, battery-operated devices. Selecting a low silicon area makes the appliance cheaper. Low power consumption extends battery life, which in turn reduces the overall size and weight of the battery.



Figure 9. Hardware implementation of FPGA using embedded Microblaze core processor



Figure 10. a) Screenshot of the downloaded image, b) Screenshot of the one-level DWT of the input image from the FPGA board

A binary to excess-1 converter (BEC-1) is used along with a carry select adder and position control to enhance the arithmetic coder structure. The carry select adder operates faster than the carry look-ahead adder hardware circuit. The leading zero detector and position control unit ensure fast computation in the arithmetic coder circuit. Figure 10 shows screenshots of the downloaded input image and of the one-level DWT obtained from the FPGA board.
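To illustrate how a BEC-1 stage fits into a carry select adder, the following is a behavioral sketch in Python, not the Verilog used in this work. Each block is added once with a carry-in of 0, the BEC-1 derives the result for a carry-in of 1 by adding one to that sum, and a multiplexer picks between the two. The function names, the 16-bit word width and the 4-bit block size are assumptions chosen for illustration, and every block is treated uniformly for simplicity.

```python
def ripple_carry_add(a_bits, b_bits, carry_in=0):
    """Add two little-endian bit lists with full adders.

    Returns (sum_bits, carry_out)."""
    sum_bits = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sum_bits, carry


def bec1(bits):
    """Binary to excess-1 converter: little-endian bits of (value + 1),
    same width as the input."""
    out = []
    carry = 1  # the constant 1 being added
    for bit in bits:
        out.append(bit ^ carry)
        carry = bit & carry
    return out


def carry_select_add_bec(a, b, width=16, block=4):
    """Behavioral model of a BEC-based carry select adder.

    Returns (sum modulo 2**width, carry_out)."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result = []
    carry = 0
    for start in range(0, width, block):
        # Result of this block assuming carry-in = 0.
        s0, c0 = ripple_carry_add(a_bits[start:start + block],
                                  b_bits[start:start + block], 0)
        # BEC-1 on {carry, sum} gives the result for carry-in = 1.
        plus_one = bec1(s0 + [c0])
        s1, c1 = plus_one[:-1], plus_one[-1]
        # Multiplexer: select according to the actual incoming carry.
        if carry:
            result.extend(s1)
            carry = c1
        else:
            result.extend(s0)
            carry = c0
    total = sum(bit << i for i, bit in enumerate(result))
    return total, carry
```

For example, `carry_select_add_bec(1234, 4321)` returns `(5555, 0)`. The point of the structure is that the BEC-1 replaces the second ripple-carry adder that a conventional carry select adder would need for the carry-in of 1 case.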

## 4. CONCLUSION

In this work, we have developed an efficient VLSI architecture for image compression. The architecture comprises two fundamental blocks, the DWT image transform block and the SPIHT encoding block. Existing 2D DWT computation architectures require the most hardware in terms of transform memory; we have replaced them with a poly-phase distributed arithmetic based lifting DWT architecture. The computationally expensive SPIHT encoder hardware architecture has been modified with a leading zero detection and position control unit. This modification leads to less hardware and increased speed, and is compatible with portable, battery-operated wireless wearable health care devices. Few hardware architectures are available that integrate an image transform architecture with a SPIHT encoding architecture. The Verilog code for both the image transform and the SPIHT encoder was developed in the Xilinx platform. After simulation and synthesis of the integrated architecture in the Xilinx platform, an FPGA kit from the Xilinx Spartan Virtex family is used to download and implement the proposed image compression algorithm. To evaluate the efficiency of the proposed work, we implement the same technique using TSMC 180nm CMOS technology in Cadence software. For hardware implementation, we take a 16-bit pixel precision image of size 1024x1024. Based on the observations, we conclude that the proposed architecture achieves high speed, low power and the best compression rate when compared to other methodologies. The operating speed of the hardware device is 81.82 MHz with a compression rate of 1375 Mbps. The novel arithmetic coder used for SPIHT coding can consume four symbols at a time and offers a throughput of 1200 MSPs.

#### REFERENCES

- [1] Hussain, A. J., Al-Fayadh, A., & Radi, N. (2018). Image compression techniques: A survey in lossless and lossy algorithms. Neurocomputing, 300, 44-69.
- [2] Setyaningsih, E. and Harjoko, A., 2017. Survey of hybrid image compression techniques. International Journal of Electrical and Computer Engineering, 7(4), p. 2206. DOI: http://doi.org/10.11591/ijece.v7i4.pp2206-2214
- [3] Miya, J., & Ansari, M. A. (2020). Medical images performance analysis and observations with SPIHT and wavelet techniques. Journal of Information and Optimization Sciences, 41(1), 273-282.
- [4] Dudhagara, C. R., & Patel, M. M. (2017). A Comparative Study and Analysis of EZW and SPIHT methods for Wavelet based Image Compression. Oriental Journal of Computer Science and Technology, 10(3), 669-673.
- [5] Rani, S. S., Rao, G. S., & Rao, B. P. (2018, May). EZW, SPIHT and WDR Methods for CT Scan and X-ray Images Compression Applications. In International Conference on ISMAC in Computational Vision and Bio-Engineering (pp. 1517-1525). Springer, Cham.
- [6] Lone, M. R., & NAJEEB-UD-DIN, H. A. K. I. M. (2019). A novel hardware-efficient spatial orientation tree-based image compression algorithm and its field programmable gate array implementation. Turkish Journal of Electrical Engineering & Computer Sciences, 27(5), 3823-3836.
- [7] Boujelbene, R., Jemaa, Y. B., & Zribi, M. (2017, July). An efficient codec for image compression based on spline wavelet transform and improved SPIHT algorithm. International Conference on High Performance Computing & Simulation (HPCS) (pp. 819-825). IEEE.
- [8] Rong, X., Nie, H., Wang, W., Lin, C., & Yu, X. (2018). SPIHT-Based Image Compression Using Optimization of LIS and LIP Encoding. JCP, 13(12), 1385-1394.
- [9] Lahdir, M., Hamiche, H., Kassim, S., Tahanout, M., Kemih, K., & Addouche, S. A. (2019). A novel robust compression-encryption of images based on SPIHT coding and fractional-order discrete-time chaotic system. Optics & Laser Technology, 109, 534-546.
- [10] Ravi, R. V., Subramaniam, K., & Venkatesan, G. D. P. (2020). High-Speed Modified DA Architecture for DWT Computation in Secure Image Encoding. In Advances in Electrical and Computer Technologies (pp. 1057-1068). Springer, Singapore.
- [11] Kefalas, N., & Theodoridis, G. (2017, September). High-throughput FPGA implementation of the CCSDS 122.0-B-1 compression standard. In 2017 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS) (pp. 1-8). IEEE.
- [12] Chen, H., Wei, A., & Zhang, Y. (2017). Three-level parallel-set partitioning in hierarchical trees coding based on the collaborative CPU and GPU for remote sensing images compression. Journal of Applied Remote Sensing, 11(4), 045015.
- [13] Hsieh, J. H., Lee, R. C., Hung, K. C., & Shih, M. J. (2018). Rapid and coding-efficient SPIHT algorithm for wavelet-based ECG data compression. Integration, 60, 248-256.
- [14] Kim, H., No, A., & Lee, H. J. (2018). SPIHT algorithm with adaptive selection of compression ratio depending on DWT Coefficients. IEEE Transactions on Multimedia, 20(12), 3200-3211.
- [15] Li, S., Yin, H., Fang, X., & Lu, H. (2017). Lossless image compression algorithm and hardware architecture for bandwidth reduction of external memory. IET Image Processing, 11(6), 379-388.
- [16] Mulani, A. O., & Mane, P. B. (2017). Fast and Efficient VLSI Implementation of DWT for Image Compression. International Journal for Research in Applied Science & Engineering Technology, V(IX), 1397–1402.
- [17] Saoungoumi-Sourpele, R., Nlong, J. M., Kamdjoug, J. R. K., & Yufui, G. V. (2020). Improve Image Decoding in Lightweight Environment Using a Coroutines Based Approach. Journal of Computer and Communications, 8(10), 60-74.
- [18] Chetan, H., & Indumathi, G. (2017). Performance Analysis of Modified Architecture of DA-DWT and Lifting based Scheme DWT for Image Compression. International Journal of Multimedia and Ubiquitous Engineering, 12(7), 31-42.
- [19] Setyaningsih, E., & Wardoyo, R. (2017). Review of image compression and encryption techniques. International journal of advanced computer science and applications, 8(2), 83-94.
- [20] Divakara, S. S., Patilkulkarni, S., & Raj, C. P. (2017). High speed modular systolic array-based DTCWT with parallel processing architecture for 2D image transformation on FPGA. International Journal of Wavelets, Multiresolution and Information Processing, 15(05), 1750047.
- [21] Song, Y., Zhu, Z., Zhang, W., Guo, L., Yang, X., & Yu, H. (2019). Joint image compression–encryption scheme using entropy coding and compressive sensing. Nonlinear Dynamics, 95(3), 2235-2261.
- [22] C. Thirumarai Selvi and R. Sudhakar, (2016). An Efficient 2D DWT-A Distributed Arithmetic with Rapid Arithmetic Coder for Medical Image Compression. Asian Journal of Information Technology, 15: 2371-2382.
- [23] C. Thirumarai Selvi and R. Sudhakar, (2013). An efficient 2DWT-A architecture using distributed arithmetic algorithm. International Review on Computers and Software, 8(8), pp. 1878-1888.
- [24] Swami, S. S., & Mulani, A. O. (2017, August). An efficient FPGA implementation of discrete wavelet transform for image compression. International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS) (pp. 3385-3389). IEEE.
- [25] Çeklı, S., & Akman, A. (2017, May). An efficient SPIHT algorithm and system architecture for image compression. 25<sup>th</sup> Signal Processing and Communications Applications Conference (SIU) (pp. 1-4). IEEE.
- [26] Sri, A., & Sahu, S. S. (2019, July). Improved fractal-SPIHT hybrid image compression algorithm. 10<sup>th</sup> International Conference on Computing, Communication and Networking Technologies (ICCCNT) (pp. 1-4). IEEE.