Article

A reduced memory bandwidth and high throughput HDTV motion compensation decoder for H.264/AVC High 4:2:2 profile

Journal

JOURNAL OF REAL-TIME IMAGE PROCESSING
Volume 8, Issue 1, Pages 127-140

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s11554-011-0216-7

Keywords

Real-time video coding; H.264/AVC 4:2:2 High profile; Enhanced video quality; Motion compensation; Hardware design

This article presents HP422-MoCHA, an optimized Motion Compensation hardware architecture for the High 4:2:2 profile of the H.264/AVC video coding standard. The proposed design targets real-time decoding of HDTV 1080p (1,920 × 1,080 pixels) at 30 fps. It supports multiple sample bit-widths (8, 9, or 10 bits) and multiple chroma sub-sampling formats (4:0:0, 4:2:0, and 4:2:2) to provide an enhanced video quality experience. The architecture includes an optimized sample interpolator that processes luma and chroma samples in two parallel datapaths and features quarter-sample accuracy, bi-prediction, and weighted prediction. HP422-MoCHA also includes a hardwired Motion Vector Predictor that supports temporal and spatial direct predictions. A novel memory hierarchy implemented as a 3-D Cache reduces frame memory accesses, cutting bandwidth by 62% and clock cycles by 80% on average. The design was implemented in a Xilinx Virtex-II PRO FPGA and also as an ASIC in a TSMC 0.18 μm standard-cell technology. The ASIC implementation occupies 102 K equivalent gates and 56.5 KB of on-chip SRAM in a 3.8 × 3.4 mm² area, and consumes 130 mW. Both implementations reach a maximum operating frequency of approximately 100 MHz, which is enough to motion compensate 37 bi-predictive or 69 predictive frames per second. The minimum frequency required to ensure real-time decoding of HD 1080p at 30 fps is 82 MHz. Since HP422-MoCHA is the first Motion Compensation architecture for the High 4:2:2 profile found in the literature, a Main profile MoCHA was used for comparison purposes, showing the highest throughput among all presented works. However, the HP422-MoCHA architecture also reaches the highest throughput when compared with the other published Main profile MC solutions, even considering the significantly higher complexity of the High 4:2:2 profile.
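The throughput figures in the abstract can be cross-checked with some simple arithmetic. The sketch below is illustrative only and is not taken from the paper. It assumes that frame rate scales linearly with clock frequency, that the 82 MHz requirement refers to the worst case (bi-predictive frames), and that frames are processed as 16 × 16 macroblocks with the frame height padded up to a multiple of 16. All function names are hypothetical.

```python
import math

MB_SIZE = 16  # H.264/AVC macroblock edge, in luma samples

# Horizontal and vertical chroma subsampling factors per format.
# None means there are no chroma planes (monochrome).
CHROMA_FACTORS = {"4:0:0": None, "4:2:0": (2, 2), "4:2:2": (2, 1)}


def macroblocks_per_frame(width, height):
    """Number of 16x16 macroblocks, rounding partial blocks up."""
    return math.ceil(width / MB_SIZE) * math.ceil(height / MB_SIZE)


def cycles_per_macroblock(freq_hz, fps, width, height):
    """Average clock-cycle budget per macroblock at a given frame rate."""
    return freq_hz / (fps * macroblocks_per_frame(width, height))


def scaled_fps(fps_ref, freq_ref_hz, freq_hz):
    """Frame rate at freq_hz, ASSUMING linear scaling with clock frequency."""
    return fps_ref * freq_hz / freq_ref_hz


def samples_per_frame(width, height, chroma_format):
    """Luma plus chroma samples in one frame for a chroma format."""
    luma = width * height
    factors = CHROMA_FACTORS[chroma_format]
    if factors is None:
        return luma
    sub_w, sub_h = factors
    # Two chroma planes (Cb and Cr), each subsampled by the given factors.
    return luma + 2 * (width // sub_w) * (height // sub_h)


if __name__ == "__main__":
    print(macroblocks_per_frame(1920, 1080))
    print(cycles_per_macroblock(82e6, 30, 1920, 1080))
    print(scaled_fps(30, 82e6, 100e6))
    for fmt in CHROMA_FACTORS:
        print(fmt, samples_per_frame(1920, 1080, fmt))
```

Under these assumptions, scaling 30 fps at 82 MHz up to 100 MHz gives roughly 36.6 fps, in line with the 37 bi-predictive frames per second quoted above. The sample counts also show that a 4:2:2 frame carries twice the chroma data of a 4:2:0 frame, which is one source of the higher complexity the abstract attributes to the High 4:2:2 profile.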

Recommended

Article Computer Science, Hardware & Architecture

Power-Quality Configurable Hardware Design for AV1 Directional Intraframe Prediction

Luiz Neto, Marcel Correa, Daniel Palomino, Luciano Agostini, Guilherme Correa

Summary: This article presents a configurable intraframe prediction architecture for AV1 video coding format, aiming to balance power consumption and quality.

IEEE DESIGN & TEST (2022)

Article Engineering, Electrical & Electronic

FastInter360: A Fast Inter Mode Decision for HEVC 360 Video Coding

Iago Storch, Luciano Agostini, Bruno Zatt, Sergio Bampi, Daniel Palomino

Summary: This paper presents FastInter360, a fast inter mode decision algorithm for accelerating the encoding of ERP 360 videos. The algorithm exploits the specific behavior of the encoder when encoding 360 videos, which is due to texture distortions resulting from projection. FastInter360 comprises three algorithms that reduce encoding complexity by performing early decision, reducing motion estimation search range, and adjusting motion estimation precision. These algorithms behave differently based on distortion intensity, achieving a significant reduction in complexity without sacrificing coding efficiency.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2022)

Article Engineering, Electrical & Electronic

Energy-Efficient VLSI Squarer Unit with Optimized Radix-2^m Multiplication Logic

Morgana M. A. da Rosa, Eduardo A. C. da Costa, Leandro Giacomini Rocha, Guilherme Paim, Sergio Bampi

Summary: This paper presents a new radix-2^m squarer unit that is in demand for a variety of applications, showing higher energy savings compared to other units.

CIRCUITS SYSTEMS AND SIGNAL PROCESSING (2023)

Article Engineering, Electrical & Electronic

ReAdapt: A Reconfigurable Datapath for Runtime Energy-Quality Scalable Adaptive Filters

Pedro Taua Lopes Pereira, Guilherme Paim, Eduardo Antonio Cesar da Costa, Sergio Jose Melo de Almeida, Sergio Bampi

Summary: This paper proposes a reconfigurable datapath architecture, ReAdapt, for scaling the energy-quality trade-off of adaptive filtering at runtime. The architecture dynamically selects different levels of filter algorithms complexity and achieves a compact hardware implementation by reusing common modules. Experimental results demonstrate a balanced trade-off between energy and quality, and the dynamic reconfiguration at runtime outperforms the conventional static mode for different signal-to-noise ratio levels.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS (2023)

Article Engineering, Electrical & Electronic

Robustness Analysis of 3-2 Adder Compressor Designed in 7-nm FinFET Technology

Gerson Andrade, Matheus Silva, Cinthia Schneider, Guilherme Paim, Sergio Bampi, Eduardo Costa, Alexandra Zimpeck

Summary: This brief examines the robustness of the 3-2 adder compressor (AC) against process, voltage, and temperature (PVT) variations in a predictive ASAP7 7-nm FinFET technology. The impact of these variations on the delay, power, and power-delay product (PDP) of the 3-2 AC in super- and near-threshold voltage operating regimes is evaluated. The results show that process variation is the main concern, with near-threshold operation having a more severe level of variability.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS (2023)

Article Computer Science, Hardware & Architecture

AxPPA: Approximate Parallel Prefix Adders

Morgana Macedo Azevedo da Rosa, Guilherme Paim, Patricia Ucker Leleu da Costa, Eduardo Antonio Cesar da Costa, Rafael Soares, Sergio Bampi

Summary: Addition units are widely used in error-tolerant applications and serve as building blocks for various math operations. Parallel prefix adders (PPAs) are among the fastest adders due to their optimization of carry generation and propagation. This research introduces approximate PPAs and compares them with energy-efficient approximate adders, showing improved energy-quality and area-quality results.

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS (2023)

Article Computer Science, Hardware & Architecture

A High-Throughput Hardware Design for the AV1 Decoder Intraprediction

Jones William Goebel, Luciano Volcan Agostini, Bruno Zatt, Marcelo Schiavon Porto

Summary: AV1 is a royalty-free and open-source video codec released in 2018, aiming to process UHD 8K and 4K videos with high coding efficiency. This article presents the hardware design of AVID, a dedicated decoder for AV1 intraprediction, which achieves a decoding rate of 120 frames/s for UHD 4K videos.

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS (2023)

Article Engineering, Electrical & Electronic

Compact CMOS-Compatible Majority Gate Using Body Biasing in FDSOI Technology

Brunno Alves de Abreu, Albi Mema, Simon Thomann, Guilherme Paim, Paulo Flores, Sergio Bampi, Hussam Amrouch

Summary: This study develops CMOS-compatible compact majority (MAJ) and minority (MIN) logic gates using the body biasing feature in fully depleted silicon on insulator (FDSOI) technology. The proposed MAJ/MIN gates require considerably fewer transistors compared to their CMOS counterparts. Previous research on using MAJ/MIN gates for logic synthesis has been limited due to their large area requirement when implemented with conventional standard cells. In contrast, the FDSOI-based MAJ/MIN gates in this study leverage mature CMOS commercial technologies. SPICE simulations and error injection analysis demonstrate that MAJ/MIN-based circuits exhibit excellent resilience against errors, making them suitable for safety-critical applications where reliability is crucial.

IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS (2023)

Proceedings Paper Computer Science, Interdisciplinary Applications

An Energy-Efficient StEFCal VLSI Design with Approximate Squarer and Divider Units

Morgana M. A. da Rosa, Patricia da Costa, Guilherme Paim, Eduardo da Costa, Rafael Soares, Sergio Bampi

Summary: Approximate computing is applied in the calibration procedure for radio astronomy called StEFCal to maximize area and energy savings. The StEFCal circuit uses various approximate arithmetic operators to achieve a trade-off between quality and efficiency. The results demonstrate that combining AxRSU with the NR divider significantly improves the Mean Square Error (MSE) and achieves substantial energy savings compared to the state-of-the-art.

2023 IEEE 14TH LATIN AMERICA SYMPOSIUM ON CIRCUITS AND SYSTEMS, LASCAS (2023)

Proceedings Paper Computer Science, Interdisciplinary Applications

Analysis of AV1 Arithmetic Decoder Design Space with a Novel Multi-Boolean Approach

Jiovana Sousa Gomes, Tulio Pereira Bitencourt, Sergio Bampi, Fabio Luis Livi Ramos

Summary: Video processing is necessary in today's society due to the wide consumption of video content. Video coding formats or standards are used to handle the large amount of data generated. The AV1 format is a recent alternative that efficiently encodes video and aims to be royalty-free. This paper introduces a novel Multi-Boolean Approach for the AV1 Arithmetic Decoder design, which involves processing multiple Boolean symbols in parallel to improve throughput. An analysis of different hardware architectures is conducted, and it is concluded that the best trade-off choice is to use two Boolean symbols in parallel with a multicycle AV1 arithmetic decoder circuit.

2023 IEEE 14TH LATIN AMERICA SYMPOSIUM ON CIRCUITS AND SYSTEMS, LASCAS (2023)

Proceedings Paper Computer Science, Theory & Methods

AppGNN: Approximation-Aware Functional Reverse Engineering using Graph Neural Networks

Tim Buecher, Lilas Alrahis, Guilherme Paim, Sergio Bampi, Ozgur Sinanoglu, Hussam Amrouch

Summary: The globalization of the Integrated Circuit (IC) market has attracted more partners and extended the supply chain, leading to security concerns, particularly in terms of Reverse Engineering (RE). Applying Approximate Computing (AxC) principles to circuits improves their resistance against RE, even for the powerful Graph Neural Networks (GNNs) used in functional RE. To address the challenges of AxC in RE, the promising AppGNN platform enables accurate classifications and reverse engineering of circuit functionality. The framework achieves this through a novel graph-based node sampling approach, leading to improved classification accuracy.

2022 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN, ICCAD (2022)

Proceedings Paper Computer Science, Hardware & Architecture

Direction-Based Fast Mode Decision and Hardware Design for the AV1 Intra Prediction

Marcel Correa, Daniel Palomino, Guilherme Correa, Luciano Agostini

Summary: This study presents a fast decision algorithm and hardware design for AV1 intra prediction, aiming to reduce prediction time and achieve low-power hardware design. The experiments showed that implementing the proposed algorithm in the AV1 reference encoder resulted in an average reduction of 22.56% in encoding time.

2022 35TH SBC/SBMICRO/IEEE/ACM SYMPOSIUM ON INTEGRATED CIRCUITS AND SYSTEMS DESIGN (SBCCI 2022) (2022)

Proceedings Paper Computer Science, Interdisciplinary Applications

A Multiplier-Less Level-3 Haar Wavelet Transform Approximation Requiring Five Additions Only

Morgana M. A. da Rosa, Guilherme Paim, Henrique B. Seidel, Sergio Almeida, Eduardo A. C. da Costa, Sergio Bampi

Summary: In this study, an approximate level-3 Haar wavelet transform method is proposed for ECG signal processing. Compared to the exact transform, this method significantly reduces energy consumption and VLSI hardware area while improving R-peak detection accuracy. Our method outperforms the state-of-the-art approximate level-4 Haar wavelet transform in terms of power dissipation and hardware area.

PROCEEDINGS OF THE 2022 15TH IEEE DALLAS CIRCUITS AND SYSTEMS CONFERENCE (DCAS 2022) (2022)

Article Computer Science, Information Systems

Low-Voltage, Low-Area, nW-Power CMOS Digital-Based Biosignal Amplifier

Pedro Toledo, Paolo S. Crovetti, Hamilton D. Klimach, Francesco Musolino, Sergio Bampi

Summary: This paper presents the operating principle and silicon characterization of a power-efficient, ultra-low-voltage, ultra-low-area, fully differential, digital-based Operational Transconductance Amplifier (OTA) for microscale biosensing applications (BioDIGOTA). The measured results of 180 nm CMOS prototypes show that the proposed BioDIGOTA can operate with a supply voltage as low as 400 mV, consuming only 95 nW. Due to its highly digital nature, the BioDIGOTA layout occupies only 0.022 mm² of total silicon area, reducing the area by 3.22 times compared to the current state of the art while maintaining reasonable system performance.

IEEE ACCESS (2022)
