Article
Engineering, Electrical & Electronic
Kai Lin, Chuanmin Jia, Xinfeng Zhang, Shanshe Wang, Siwei Ma, Wen Gao
Summary: This paper proposes a motion modeling approach that decomposes motion into two components: intrinsic motion and compensatory motion. The intrinsic motion captures the implicit spatiotemporal context in the historical sequence, while the compensatory motion acts as side information for structural refinement and texture enhancement. Through this decomposition, the method addresses motion representation, compensation, and coding within the learned video compression framework.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
(2023)
Article
Computer Science, Artificial Intelligence
Thuc Nguyen Huu, Vinh Van Duong, Jonghoon Yim, Byeungwoo Jeon
Summary: Plenoptic images and videos require a large amount of data storage and incur high transmission costs. This paper investigates motion compensation for plenoptic video coding in the ray-space domain. A novel motion compensation scheme is proposed and integrated into well-known video coding standards such as HEVC. Experimental results show significant improvements in compression efficiency over existing methods under different HEVC configurations.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2023)
Article
Computer Science, Artificial Intelligence
Xuewei Meng, Chuanmin Jia, Xinfeng Zhang, Shanshe Wang, Siwei Ma
Summary: This paper proposes a spatio-temporal correlation guided geometric partitioning (STGEO) scheme to efficiently describe object information in the motion field of video coding. By predicting high-probability partitioning modes and motion candidates, the proposed method reduces the bits spent on side information. Simulation results show that the proposed approach achieves average bit-rate savings of 0.95% and 1.98% compared to VTM-8.0 without GEO under the Random Access and Low-Delay B configurations, respectively.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2022)
Article
Computer Science, Artificial Intelligence
Zhihao Hu, Dong Xu, Guo Lu, Wei Jiang, Wei Wang, Shan Liu
Summary: In this work, a feature-space video coding framework (FVC) is proposed to perform all major operations in the feature space, including motion estimation, motion compression, motion compensation, and residual compression. The framework also includes two new modules, resolution-adaptive motion coding (RaMC) and resolution-adaptive residual coding (RaRC), for handling different types of motion and residual patterns at different spatial locations. Experimental results demonstrate that the proposed framework achieves state-of-the-art performance on benchmark datasets.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
Article
Computer Science, Artificial Intelligence
M. Akin Yilmaz, A. Murat Tekalp
Summary: This paper proposes a learned hierarchical bi-directional video codec (LHBDC) that combines the benefits of hierarchical motion-compensated prediction and end-to-end optimization. Experimental results show that LHBDC achieves the best rate-distortion (R-D) results among existing learned video compression schemes. Ablation studies demonstrate the performance gains due to the proposed novel tools.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2022)
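Hierarchical bi-directional prediction, as mentioned in the LHBDC summary above, is often organized in a dyadic order: the middle frame between two coded anchors is predicted from both of them, and each half is then processed recursively. The sketch below illustrates only that generic ordering; the function name `hierarchical_b_order` is hypothetical, and LHBDC's actual GOP structure and reference selection may differ.

```python
def hierarchical_b_order(start, end):
    """Return (frame, past_ref, future_ref) tuples in coding order.

    Generic dyadic hierarchy (illustrative assumption): frames `start` and
    `end` are already-coded anchors; each midpoint is predicted
    bi-directionally from the two frames that bound it.
    """
    if end - start < 2:
        return []  # no frame lies strictly between the two anchors
    mid = (start + end) // 2
    return (
        [(mid, start, end)]
        + hierarchical_b_order(start, mid)
        + hierarchical_b_order(mid, end)
    )


if __name__ == "__main__":
    # With anchors 0 and 8, frames are coded in the order 4, 2, 1, 3, 6, 5, 7.
    for frame, past, future in hierarchical_b_order(0, 8):
        print(f"frame {frame}: refs {past} and {future}")
```

Deeper levels of the hierarchy use closer references, which is the usual motivation for this ordering.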
Article
Computer Science, Information Systems
Li Li, Zhu Li, Shan Liu, Houqiang Li
Summary: This paper investigates two fundamental problems of inter-prediction for Light Detection and Ranging (LiDAR) point cloud geometries: motion estimation (ME) and coding structure. A new motion estimation criterion is proposed, which uses a linear relationship between the bit cost and the numbers of 1s and 0s in the multiscale binary prediction residue. The paper also studies the use of a hierarchical coding structure for the first time, determining the coding structure at the group of pictures (GoP) level through rate distortion optimization to improve performance.
IEEE TRANSACTIONS ON MULTIMEDIA
(2022)
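To make the idea of a linear bit-cost criterion concrete, the following toy sketch estimates bits as a weighted count of 1s and 0s in a binary occupancy residue (the XOR between the current voxels and a translated reference) and runs a small full search. Everything here is a hypothetical single-scale simplification: the weights `a` and `b`, the helper names, and the integer-translation search are assumptions, not the paper's multiscale criterion or fitted coefficients.

```python
from itertools import product


def translate(points, mv):
    """Shift a set of occupied voxel coordinates by an integer motion vector."""
    dx, dy, dz = mv
    return {(x + dx, y + dy, z + dz) for (x, y, z) in points}


def estimated_bits(current, predicted, num_positions, a=1.0, b=0.05):
    """Hypothetical linear model: a * (#1s) + b * (#0s) in the XOR residue."""
    ones = len(current ^ predicted)  # positions where occupancy differs
    zeros = num_positions - ones
    return a * ones + b * zeros


def motion_search(current, reference, num_positions, radius=1):
    """Full search over integer motion vectors within a cube of given radius."""
    best_mv, best_cost = None, float("inf")
    for mv in product(range(-radius, radius + 1), repeat=3):
        cost = estimated_bits(current, translate(reference, mv), num_positions)
        if cost < best_cost:
            best_mv, best_cost = mv, cost
    return best_mv, best_cost
```

With a single fixed grid the two counts sum to a constant, so this toy criterion reduces to counting mismatched voxels; separate weights for 1s and 0s can only become informative when the number of coded positions varies between candidates, as it may in a multiscale residue.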
Article
Computer Science, Artificial Intelligence
Dengchao Jin, Jianjun Lei, Bo Peng, Zhaoqing Pan, Li Li, Nam Ling
Summary: This paper proposes a novel temporal context-based video compression network (TCVC-Net) to improve the performance of learned video compression by exploiting long-term temporal context and utilizing multi-frequency components in the temporal context. It introduces a global temporal reference aggregation (GTRA) module to obtain accurate temporal references and a temporal conditional codec (TCC) to efficiently compress motion vectors and residues. Experimental results demonstrate that the proposed TCVC-Net outperforms state-of-the-art methods in terms of both PSNR and MS-SSIM metrics.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2023)
Article
Computer Science, Artificial Intelligence
Wenbo Bao, Wei-Sheng Lai, Xiaoyun Zhang, Zhiyong Gao, Ming-Hsuan Yang
Summary: The study introduces a motion estimation and compensation driven neural network for video frame interpolation, which integrates optical flow and interpolation kernels using an adaptive warping layer. It achieves visually appealing results without the need for hand-crafted features, showing improved computational efficiency compared to existing methods. The proposed MEMC-Net architecture can be seamlessly adapted to various video enhancement tasks and outperforms state-of-the-art algorithms on a wide range of datasets in quantitative and qualitative evaluations.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2021)
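The MEMC-Net summary above builds on flow-based warping. As a minimal sketch, assuming a grayscale frame and a dense flow field in pixels, bilinear backward warping samples the reference at flow-displaced positions. MEMC-Net's adaptive warping layer additionally weights neighboring pixels with learned interpolation kernels, which this sketch omits; the function name is hypothetical.

```python
import numpy as np


def backward_warp(frame, flow):
    """Bilinearly sample a grayscale `frame` (H x W) at positions displaced by `flow`.

    `flow` has shape (H, W, 2): `flow[..., 0]` is the horizontal and
    `flow[..., 1]` the vertical displacement in pixels. Sample positions are
    clamped to the image border.
    """
    h, w = frame.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0  # fractional offsets serve as bilinear weights
    wy = y - y0
    top = (1 - wx) * frame[y0, x0] + wx * frame[y0, x1]
    bottom = (1 - wx) * frame[y1, x0] + wx * frame[y1, x1]
    return (1 - wy) * top + wy * bottom
```

Because the output is a differentiable function of both the frame and the flow (away from integer boundaries), this kind of warping can be trained end to end inside a network.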
Article
Computer Science, Information Systems
Damian Karwowski
Summary: This paper examines possible extensions of the latest version of CABAC used in VVC technology and analyzes the practical limits of increasing its compression efficiency. Results indicate that, with the considered improvement methods, the average compression gain for VVC video data is limited to at most 0.2073%.
Article
Computer Science, Artificial Intelligence
Rongqun Lin, Meng Wang, Pingping Zhang, Shiqi Wang, Sam Kwong
Summary: Learned video compression has recently attracted significant research attention. However, existing methods use a single hypothesis for motion alignment, which leads to inaccurate motion estimation, especially in complex scenes. Inspired by the multiple hypotheses philosophy, we propose a multiple hypotheses based motion compensation approach that improves efficiency by providing diverse hypotheses. We also introduce a hypotheses attention module and use context combination to fuse the weighted hypotheses and generate effective contexts for compression.
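The idea of fusing several motion hypotheses with attention weights, described in the summary above, can be illustrated with a per-pixel softmax over candidate predictions. This is a generic sketch, not the paper's hypotheses attention module: the scores are simply given as input here, whereas a learned codec would predict them with a network, and the function name is hypothetical.

```python
import numpy as np


def fuse_hypotheses(hypotheses, scores):
    """Fuse K candidate predictions using per-pixel softmax weights.

    hypotheses, scores: arrays of shape (K, H, W). The scores stand in for
    the output of an attention network (assumption for illustration).
    """
    stable = scores - scores.max(axis=0, keepdims=True)  # avoid overflow in exp
    weights = np.exp(stable)
    weights /= weights.sum(axis=0, keepdims=True)  # weights sum to 1 over K
    return (weights * hypotheses).sum(axis=0)
```

Equal scores reduce to averaging the hypotheses, while a dominant score effectively selects a single hypothesis, so the weighting interpolates between single-hypothesis and averaged prediction per pixel.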
Article
Engineering, Electrical & Electronic
N. Prette, D. Valsesia, T. Bianchi, E. Magli, M. Naccari, A. Fiandrotti
Summary: This work introduces MMCE-Net, a deep-learning tool aimed at improving the performance of video coding standards based on motion compensation. The proposed method enhances the accuracy of motion-compensated frames, thereby improving coding efficiency and rate-distortion performance.
ELECTRONICS LETTERS
(2022)
Article
Engineering, Electrical & Electronic
Haifeng Guo, Sam Kwong, Chuanmin Jia, Shiqi Wang
Summary: Most deep learning-based video compression frameworks rely on motion estimation and compensation, but artifacts in the warped frames limit their performance. In this work, we propose enhanced motion compensation to reduce error propagation. We incorporate a purpose-designed convolutional neural network into OpenDVC as the enhancement network and optimize the framework with a single loss function that balances bit cost against frame quality. Experimental results show that our model achieves significant bit savings and outperforms OpenDVC in terms of PSNR and bit-rate savings.
IEEE SIGNAL PROCESSING LETTERS
(2023)
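A single loss that trades bit cost against frame quality, as in the summary above, is commonly written as R + λ·D in learned video codecs. The sketch below assumes bits per pixel for the rate and mean squared error for the distortion; the function name and the default λ are placeholders, and the paper's exact formulation and λ values are not given in the summary.

```python
import numpy as np


def rd_loss(original, reconstructed, total_bits, lmbda=256.0):
    """Rate-distortion loss R + lambda * D for one frame (illustrative).

    R: bits per pixel; D: mean squared error. `lmbda` is an arbitrary
    placeholder that sets the rate-quality trade-off.
    """
    h, w = original.shape[:2]
    rate_bpp = total_bits / (h * w)
    diff = original.astype(float) - reconstructed.astype(float)
    distortion = np.mean(diff ** 2)
    return rate_bpp + lmbda * distortion
```

A larger λ penalizes distortion more heavily, pushing training toward higher quality at a higher bit rate; training one model per λ yields points along a rate-distortion curve.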
Article
Engineering, Electrical & Electronic
Wei-Jung Chien, Li Zhang, Martin Winken, Xiang Li, Ru-Ling Liao, Han Gao, Chih-Wei Hsu, Hongbin Liu, Chun-Chi Chen
Summary: This paper discusses motion vector coding and block merging techniques in the VVC standard, focusing on whole block-based inter prediction techniques. It introduces features for AMVP mode and merge mode, as well as coding tools like history-based motion vector prediction. Simulation results show that the methods can achieve BD-rate savings and improve subjective picture quality.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
(2021)
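History-based motion vector prediction, mentioned in the summary above, keeps a small table of recently coded motion information that can supply extra candidates. The sketch assumes a constrained first-in-first-out update with a duplicate check and a default table size of 5, following the commonly described VVC behavior; it is an illustration, not a transcription of the specification, and motion information is simplified to a hashable tuple.

```python
class HmvpTable:
    """Minimal history-based motion vector prediction (HMVP) table sketch.

    Motion information is simplified to a hashable tuple; real codecs also
    store reference indices and other fields.
    """

    def __init__(self, max_size=5):
        self.max_size = max_size
        self.entries = []  # oldest entry first

    def update(self, motion):
        """Insert motion info from a newly coded block."""
        if motion in self.entries:
            self.entries.remove(motion)  # remove duplicate, re-append as newest
        elif len(self.entries) == self.max_size:
            self.entries.pop(0)  # table full: evict the oldest entry
        self.entries.append(motion)

    def candidates(self):
        """Return stored entries, most recent first."""
        return list(reversed(self.entries))
```

The duplicate check keeps the table free of redundant candidates, so each slot contributes distinct motion information.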
Article
Computer Science, Information Systems
Ka-Hou Chan, Sio-Kei Im
Summary: This article introduces the key technologies involved in four hypothetical probability estimators for Context-based Adaptive Binary Arithmetic Coding (CABAC). It focuses on the adaptation rates used in these estimators, which are chosen based on coding efficiency and memory considerations, as well as on their relationship with the current coding block size. The proposed scheme realizes the quantitative representation of probability prediction linearly, and the article describes its potential to scale to higher accuracy. Besides the design concept, this work also discusses motivation and implementation aspects, which rely on simple operations such as bitwise operations and a single subsampling for subinterval updates. Experimental results verify the effectiveness of the proposed CABAC method as specified in Versatile Video Coding (VVC).
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS
(2023)
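The role of adaptation rates in the CABAC summary above can be illustrated with a two-rate probability estimator: one estimate adapts quickly, the other slowly, and their average is used. This is a generic sketch; the 15-bit precision, the shift values 4 and 7, and the class name are assumptions, not the article's four estimators or the exact VVC integer arithmetic.

```python
class DualRateEstimator:
    """Illustrative two-rate probability estimator for one binary context."""

    PRECISION = 15
    ONE = 1 << PRECISION  # fixed-point representation of probability 1.0

    def __init__(self, shift_fast=4, shift_slow=7):
        self.shift_fast = shift_fast  # a larger shift means slower adaptation
        self.shift_slow = shift_slow
        self.p_fast = self.ONE // 2  # both estimates start at P(1) = 0.5
        self.p_slow = self.ONE // 2

    def prob_one(self):
        """Estimated probability that the next bin is 1 (average of both rates)."""
        return (self.p_fast + self.p_slow) / (2 * self.ONE)

    def update(self, bin_val):
        """Move each estimate toward the observed bin by 2**-shift of the gap."""
        target = self.ONE if bin_val else 0
        self.p_fast += (target - self.p_fast) >> self.shift_fast
        self.p_slow += (target - self.p_slow) >> self.shift_slow
```

The update uses only subtraction and shifts, which mirrors the summary's emphasis on simple bitwise operations; the fast estimate tracks local statistics while the slow one smooths out noise.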
Article
Computer Science, Information Systems
Sinem Guemues, Fatih Kamisli
Summary: This paper proposes a learned compression system for lossless image compression that achieves state-of-the-art performance with only 59K parameters, far fewer than other recent learned systems. The system uses a neural network to process each pixel's causal neighborhood and obtain the parameters of its probability distribution for compression. Parallel decoding algorithms are implemented to reduce decoding time. The system is compared with traditional and learned systems in terms of compression performance, encoding and decoding times, and computational complexity.
MULTIMEDIA TOOLS AND APPLICATIONS
(2023)
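The summary above mentions processing each pixel's causal neighborhood. Assuming raster-scan order, a causal neighborhood contains only pixels the decoder has already reconstructed. The helper below gathers that context; its name, the square window of radius 2, and zero padding at the borders are illustrative assumptions, and a model would then map this vector to distribution parameters.

```python
import numpy as np


def causal_context(img, r, c, radius=2):
    """Collect pixels preceding (r, c) in raster order within a square window.

    Rows above the current pixel contribute full window rows; the current row
    contributes only pixels to the left. Out-of-image positions are set to 0.
    """
    h, w = img.shape
    ctx = []
    for dr in range(-radius, 1):
        for dc in range(-radius, radius + 1):
            if dr == 0 and dc >= 0:
                break  # current pixel and everything after it are not yet decoded
            rr, cc = r + dr, c + dc
            ctx.append(img[rr, cc] if 0 <= rr < h and 0 <= cc < w else 0)
    return np.array(ctx)
```

Because every pixel's context depends on earlier pixels, naive decoding is strictly sequential, which is why parallel decoding algorithms matter for practical decoding times.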
Article
Computer Science, Artificial Intelligence
Dengxin Dai, Arun Balajee Vasudevan, Jiri Matas, Luc Van Gool
Summary: This work develops an approach to scene understanding based purely on binaural sounds, which can predict the semantic masks, motion, and depth of sound-making objects. By leveraging cross-modal distillation and spatial sound super-resolution, the approach significantly improves performance on auditory perception tasks. Experimental results show good performance on all tasks, mutual benefits between tasks, and the importance of microphone quantity and orientation.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
Article
Engineering, Electrical & Electronic
Ren Yang, Radu Timofte, Luc Van Gool
Summary: In recent years, there has been an increasing interest in end-to-end learned video compression. Previous works focused on compressing motion maps to exploit temporal redundancy. However, they did not fully utilize historical information in sequential reference frames. This paper proposes an Advanced Learned Video Compression (ALVC) approach with an in-loop frame prediction module, which effectively predicts the target frame from previously compressed frames. The experiments demonstrate the state-of-the-art performance of ALVC in learned video compression.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
(2023)
Article
Computer Science, Artificial Intelligence
Tianfei Zhou, Fatih Porikli, David J. Crandall, Luc Van Gool, Wenguan Wang
Summary: Video segmentation is crucial in various practical applications, such as enhancing visual effects in movies, understanding scenes in autonomous driving, and creating virtual backgrounds in video conferencing. Deep learning-based approaches have shown promising performance in video segmentation. This survey comprehensively reviews two main research lines, generic object segmentation and video semantic segmentation, by introducing their task settings, background concepts, perceived need, development history, and challenges. Representative literature and datasets are also discussed, and the reviewed methods are benchmarked on well-known datasets. Open issues and opportunities for further research are identified, and a public website is provided to track developments in this field: https://github.com/tfzhou/VS-Survey.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
Article
Engineering, Electrical & Electronic
Xin Deng, Yufan Deng, Ren Yang, Wenzhe Yang, Radu Timofte, Mai Xu
Summary: In this paper, a novel network model called MASIC is proposed for stereo image compression. It achieves higher compression efficiency and quality through the introduction of a mask prediction module and mask conditional stereo entropy model.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
(2023)
Article
Computer Science, Artificial Intelligence
Prune Truong, Martin Danelljan, Radu Timofte, Luc Van Gool
Summary: Establishing accurate correspondences between images is important in computer vision applications, and dense approaches offer an alternative to sparse methods. However, dense flow estimation is often inaccurate in certain cases. In this paper, we propose a network, PDC-Net+, that can estimate accurate dense correspondences along with a reliable confidence map. Our approach learns the flow prediction and uncertainty estimation using a probabilistic approach, achieving state-of-the-art results on various challenging datasets.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
Article
Computer Science, Artificial Intelligence
Huan Wang, Yulun Zhang, Can Qin, Luc Van Gool, Yun Fu
Summary: This article presents a method called Global Aligned Structured Sparsity Learning (GASSL) to tackle the problem of efficient image super-resolution (SR). The method includes two major components: Hessian-Aided Regularization (HAIR) and Aligned Structured Sparsity Learning (ASSL). GASSL outperforms other recent methods in terms of efficiency, as demonstrated by extensive results.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
Article
Automation & Control Systems
Kai Zhang, Yawei Li, Jingyun Liang, Jiezhang Cao, Yulun Zhang, Hao Tang, Deng-Ping Fan, Radu Timofte, Luc Van Gool
Summary: This paper proposes a new approach to image denoising that focuses on network architecture design and training data synthesis. Using a swin-conv block within the image-to-image translation UNet architecture significantly improves denoising performance. Additionally, a practical noise degradation model is designed to handle various types of noise as well as resizing, which improves practicality. Experimental results demonstrate the effectiveness of the proposed methods.
MACHINE INTELLIGENCE RESEARCH
(2023)
Proceedings Paper
Computer Science, Artificial Intelligence
Maxime Burchi, Radu Timofte
Summary: This paper proposes an improved Efficient Conformer CTC architecture to address the performance degradation of automatic speech recognition systems on noisy speech. The authors use audio and visual modalities to improve noise robustness and accelerate training. Experimental results on the LRS2 and LRS3 datasets show that the proposed approach achieves state-of-the-art performance, with lower word error rate (WER) and faster training.
2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV)
(2023)
Proceedings Paper
Computer Science, Artificial Intelligence
Marcos V. Conde, Florin Vasluianu, Javier Vazquez-Corral, Radu Timofte
Summary: Recent advances in camera designs and imaging pipelines allow us to capture high-quality images using smartphones. However, the limitations of smartphone cameras often result in image artifacts and degradation. Deep learning methods for image restoration can effectively remove these artifacts, but are often not suitable for real-time applications on mobile devices. This paper proposes a lightweight network, LPIENet, for perceptual image enhancement on smartphones. Experimental results demonstrate that our model can handle these artifacts and achieve competitive performance with fewer parameters and operations. Furthermore, the model was deployed directly on commercial smartphones, where it processed 2K-resolution images in under one second.
2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV)
(2023)
Proceedings Paper
Computer Science, Artificial Intelligence
Dario Fuoli, Martin Danelljan, Radu Timofte, Luc Van Gool
Summary: In this work, we propose a recurrent video super-resolution (VSR) architecture based on a deformable attention pyramid (DAP) to address VSR under strict causal, real-time, and latency constraints. Our DAP aligns and integrates information from the recurrent state into the current frame prediction, overcoming the unavailability of future frames. By attending only to a limited number of spatial locations dynamically predicted by the DAP, we reduce computational cost compared to traditional attention-based methods. Experimental results show the effectiveness of our approach, which achieves a significant speed-up and surpasses state-of-the-art methods on benchmark tests.
2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV)
(2023)
Article
Computer Science, Artificial Intelligence
Wenqi Wang, Run Wang, Lina Wang, Zhibo Wang, Aoshuang Ye
Summary: Deep neural networks (DNNs) have achieved remarkable success in various tasks, but they are vulnerable to adversarial examples in both the image and text domains. Adversarial examples in the text domain can evade DNN-based text analyzers and thereby facilitate the spread of disinformation. This paper comprehensively surveys existing studies on techniques for generating adversarial texts and the corresponding defense methods, aiming to inspire future research on DNN-based text analyzers that are robust against known and unknown adversarial techniques.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
(2023)
Article
Automation & Control Systems
Ge-Peng Ji, Mingchen Zhuge, Dehong Gao, Deng-Ping Fan, Christos Sakaridis, Luc Van Gool
Summary: This paper presents a masked vision-language transformer (MVLT) for fashion-specific multi-modal representation, which replaces Bidirectional Encoder Representations from Transformers (BERT) with a vision transformer architecture. It is the first end-to-end framework for the fashion domain and includes masked image reconstruction (MIR) for fine-grained understanding of fashion. MVLT is an extensible and convenient architecture that handles raw multi-modal inputs without extra pre-processing models, and it shows improvements in retrieval and recognition tasks over Kaleido-BERT, the Fashion-Gen 2018 winner.
MACHINE INTELLIGENCE RESEARCH
(2023)
Article
Computer Science, Artificial Intelligence
Kai Zhang, Qi Liu, Hao Qian, Biao Xiang, Qing Cui, Jun Zhou, Enhong Chen
Summary: This paper proposes a novel model called EATN for accurately classifying sentiment polarities towards aspects across multiple domains in sentiment analysis tasks. The model incorporates a Domain Adaptation Module (DAM) to learn common features and uses a multiple-kernel selection method to reduce feature discrepancy among domains. Additionally, EATN includes an aspect-oriented multi-head attention mechanism to capture direct associations between aspects and contextual sentiment words. Extensive experiments on six public datasets demonstrate the effectiveness and universality of the proposed method compared to current state-of-the-art methods.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
(2023)