Article

SuperCNN: A Superpixelwise Convolutional Neural Network for Salient Object Detection

Journal

INTERNATIONAL JOURNAL OF COMPUTER VISION
Volume 115, Issue 3, Pages 330-344

Publisher

SPRINGER
DOI: 10.1007/s11263-015-0822-0

Keywords

Convolutional neural networks; Deep learning; Feature learning; Saliency detection

Funding

  1. RGC of Hong Kong (RGC) [CityU 115112, CityU 21201914]

Abstract

Existing computational models for salient object detection rely primarily on hand-crafted features, which can capture only low-level contrast information. In this paper, we learn hierarchical contrast features by formulating salient object detection as a binary labeling problem and solving it with deep learning techniques. A novel superpixelwise convolutional neural network, called SuperCNN, is proposed to learn internal representations of saliency efficiently. In contrast to classical convolutional networks, SuperCNN has four main properties. First, it learns hierarchical contrast features because it is fed two meaningful superpixel sequences, which is far more effective for detecting salient regions than feeding in raw image pixels. Second, because SuperCNN recovers the contextual information among superpixels, it can efficiently take a large context into account. Third, owing to the superpixelwise mechanism, the number of predictions required to produce a densely labeled map is greatly reduced. Fourth, by using a multiscale network structure, saliency can be detected independently of region size. Experiments show that SuperCNN robustly detects salient objects and outperforms state-of-the-art methods on three benchmark datasets.
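
To make the superpixelwise idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation): each superpixel is described by a sequence of contrast features against the other superpixels, and a small 1D convolutional branch maps that sequence to a saliency score. The layer sizes, the four-channel contrast descriptor, and the single-scale, single-sequence branch shown here are illustrative assumptions; the paper combines two such sequences and multiple scales.

# Hypothetical sketch of a superpixelwise 1D CNN branch (assumed shapes and sizes).
import torch
import torch.nn as nn

class SuperpixelBranch(nn.Module):
    # 1D CNN over one contrast sequence per superpixel (e.g., colour difference
    # against every other superpixel, paired with spatial distance).
    def __init__(self, in_ch=4, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),               # pool over the sequence dimension
        )
        self.head = nn.Linear(hidden, 1)           # per-superpixel saliency score

    def forward(self, seq):                        # seq: (num_superpixels, in_ch, seq_len)
        z = self.net(seq).squeeze(-1)              # (num_superpixels, hidden)
        return torch.sigmoid(self.head(z))         # (num_superpixels, 1)

# e.g. 200 superpixels, each described by a length-199 contrast sequence with 4 channels
scores = SuperpixelBranch(in_ch=4)(torch.randn(200, 4, 199))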

Recommended

Article Computer Science, Information Systems

Frequency-aware Camouflaged Object Detection

Jiaying Lin, Xin Tan, Ke Xu, Lizhuang Ma, Rynson W. H. Lau

Summary: This article proposes a frequency-based method called FBNet for camouflaged object detection. The method suppresses confusing high-frequency texture information to separate camouflaged objects from the background. It also includes frequency-aware context aggregation and adaptive frequency attention modules, as well as a gradient-weighted loss function that focuses on contour details. Experimental results demonstrate that FBNet outperforms state-of-the-art methods in camouflaged object detection. A minimal sketch of the frequency-suppression idea appears after this entry.

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS (2023)
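
The FBNet summary above centers on suppressing confusing high-frequency texture. As a rough, hedged illustration of that idea only (not FBNet's learned modules), the snippet below low-pass filters a feature map in the Fourier domain with a hard circular mask; the cutoff value and the masking scheme are illustrative assumptions.

# Illustrative low-pass filtering of a feature map in the Fourier domain (assumed cutoff).
import torch

def lowpass(feat: torch.Tensor, cutoff: float = 0.25) -> torch.Tensor:
    # feat: (B, C, H, W); keep spatial frequencies within `cutoff` of the Nyquist radius.
    b, c, h, w = feat.shape
    spec = torch.fft.fftshift(torch.fft.fft2(feat), dim=(-2, -1))
    yy = torch.linspace(-0.5, 0.5, h).view(h, 1)
    xx = torch.linspace(-0.5, 0.5, w).view(1, w)
    mask = ((yy ** 2 + xx ** 2).sqrt() <= cutoff).to(feat.dtype)   # circular low-pass mask
    spec = spec * mask                                             # zero out high frequencies
    return torch.fft.ifft2(torch.fft.ifftshift(spec, dim=(-2, -1))).real

smoothed = lowpass(torch.randn(1, 64, 32, 32))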

Article Computer Science, Information Systems

Pose- and Attribute-consistent Person Image Synthesis

Cheng Xu, Zejun Chen, Jiajie Mai, Xuemiao Xu, Shengfeng He

Summary: Person image synthesis, which transfers the appearance of a source person image to a target pose, faces two critical problems: synthesis distortion caused by the entanglement of pose and appearance, and failure to preserve the original semantics. The proposed PAC-GAN explicitly tackles these problems with a component-wise transferring model and a high-level semantic constraint. Experimental results on the DeepFashion dataset demonstrate the superiority of our method in maintaining pose and attribute consistency.

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS (2023)

Article Computer Science, Information Systems

Mirror Segmentation via Semantic-aware Contextual Contrasted Feature Learning

Haiyang Mei, Letian Yu, Ke Xu, Yang Wang, Xin Yang, Xiaopeng Wei, Rynson W. H. Lau

Summary: This article introduces a method for segmenting mirrors and proposes a novel network model called MirrorNet+ to address this problem. The authors construct a large-scale mirror segmentation dataset and conduct extensive experiments to validate the effectiveness and generalization capability of the proposed method. The article also discusses applications of mirror segmentation and possible future research directions.

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS (2023)

Article Computer Science, Hardware & Architecture

Edge Distraction-aware Salient Object Detection

Sucheng Ren, Wenxi Liu, Jianbo Jiao, Guoqiang Han, Shengfeng He

Summary: In this study, we propose a new method to generate distraction-free edge features by incorporating holistic interdependencies between high-level features. Experimental results demonstrate that our method outperforms the state-of-the-art methods on benchmark datasets, with fast inference speed on a single GPU.

IEEE MULTIMEDIA (2023)

Article Computer Science, Artificial Intelligence

Rain Removal From Light Field Images With 4D Convolution and Multi-Scale Gaussian Process

Tao Yan, Mingyue Li, Bin Li, Yang Yang, Rynson W. H. Lau

Summary: This research proposes a method for removing rain streaks from light field images by processing all sub-views simultaneously with 4D convolutional layers and detecting rain streaks with a multi-scale self-guided Gaussian process module. Trained on virtual and real-world rainy light field images, the method accurately detects and removes rain streaks, restoring rain-free light field images. An illustrative sketch of a 4D convolution layer appears after this entry.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2023)
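
The summary above hinges on 4D convolution over all sub-views of a light field. One common way to realize a 4D convolution, sketched below under an assumed tensor layout (B, C, U, V, H, W), is to decompose it into several Conv3d layers, one per offset along the first angular axis, and sum their outputs. This is only an illustrative building block, not the authors' network.

# Naive 4D convolution decomposed into k Conv3d layers along the angular U axis (assumed layout).
import torch
import torch.nn as nn

class Conv4d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        assert k % 2 == 1, "odd kernel keeps same-size padding simple"
        self.pad, self.out_ch = k // 2, out_ch
        # one Conv3d per offset along the U kernel dimension; bias only once
        self.convs = nn.ModuleList(
            nn.Conv3d(in_ch, out_ch, kernel_size=k, padding=k // 2, bias=(i == 0))
            for i in range(k)
        )

    def forward(self, x):                          # x: (B, C, U, V, H, W)
        b, c, u, v, h, w = x.shape
        out = x.new_zeros(b, self.out_ch, u, v, h, w)
        for i, conv in enumerate(self.convs):      # offset along the U axis
            for uo in range(u):                    # output angular index
                ui = uo + i - self.pad             # input slice for this offset
                if 0 <= ui < u:                    # indices outside are zero-padded
                    out[:, :, uo] += conv(x[:, :, ui])
        return out

lf = torch.randn(1, 3, 5, 5, 32, 32)   # 5x5 sub-views of a 32x32 light field
feat = Conv4d(3, 8)(lf)                # -> (1, 8, 5, 5, 32, 32)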

Article Computer Science, Artificial Intelligence

Reducing Spatial Labeling Redundancy for Active Semi-Supervised Crowd Counting

Yongtuo Liu, Sucheng Ren, Liangyu Chai, Hanjie Wu, Dan Xu, Jing Qin, Shengfeng He

Summary: Labeling is challenging for crowd counting, and recent methods have proposed semi-supervised approaches to reduce labeling efforts. However, the None-or-All labeling strategy is suboptimal as it does not consider the diversity of individuals in unlabeled crowd images. In this study, we propose breaking the labeling chain and reducing spatial labeling redundancy to improve semi-supervised crowd counting. We annotate representative regions, analyze region representativeness, and directly supervise unlabeled regions using similarity among individuals. Our experiments show significant performance improvement compared to previous methods.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Computer Science, Software Engineering

Parsing-Conditioned Anime Translation: A New Dataset and Method

Zhansheng Li, Yangyang Xu, Nanxuan Zhao, Yang Zhou, Yongtuo Liu, Dahua Lin, Shengfeng He

Summary: This study proposes a new anime translation framework by utilizing the prior knowledge of a pre-trained StyleGAN model. The framework incorporates disentangled encoders to separately embed structure and appearance information and includes a FaceBank aggregation method for generating in-domain animes. A new anime portrait parsing dataset, Danbooru-Parsing, is introduced to connect face semantics with appearances, enabling a constrained translation setting. The experiments demonstrate the effectiveness and value of the new dataset and method, providing the first feasible solution for anime translation.

ACM TRANSACTIONS ON GRAPHICS (2023)

Article Computer Science, Artificial Intelligence

DSDNet: Toward single image deraining with self-paced curricular dual stimulations

Yong Du, Junjie Deng, Yulong Zheng, Junyu Dong, Shengfeng He

Summary: The crucial challenge of single image deraining is to remove rain streaks while preserving image details. This paper proposes a novel deep network called DSDNet, which estimates rain streaks and detail loss separately, and predicts a rain mask indicating the location and intensity of rain. Extensive experiments show that the proposed method outperforms state-of-the-art methods and is effective in joint tasks of single image deraining, detection, and segmentation.

COMPUTER VISION AND IMAGE UNDERSTANDING (2023)

Article Computer Science, Artificial Intelligence

Single-View View Synthesis with Self-rectified Pseudo-Stereo

Yang Zhou, Hanjie Wu, Wenxi Liu, Zheng Xiong, Jing Qin, Shengfeng He

Summary: Synthesizing novel views from a single view image is a challenging task, but can be improved by expanding to a multi-view setting. By leveraging stereo prior, a pseudo-stereo viewpoint is generated to assist in 3D reconstruction, making the view synthesis process simpler. A self-rectified stereo synthesis approach is proposed to correct erroneous regions and generate high-quality stereo images.

INTERNATIONAL JOURNAL OF COMPUTER VISION (2023)

Article Engineering, Electrical & Electronic

Contextual-Assisted Scratched Photo Restoration

Weiwei Cai, Huaidong Zhang, Xuemiao Xu, Shengfeng He, Kun Zhang, Jing Qin

Summary: In this paper, an automatic retouching approach for scratched photographs is proposed, which utilizes scratch and background context for processing in two stages. Experimental results demonstrate that the proposed method outperforms existing methods. Additionally, two new scratched photo datasets are created to promote development in the field.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2023)

Article Computer Science, Software Engineering

Design Order Guided Visual Note Layout Optimization

Xiaotian Qiao, Ying Cao, Rynson W. H. Lau

Summary: A clear and easy-to-follow layout is important for visual notes. In this article, a novel approach is proposed to automatically optimize the layouts of visual notes by predicting the design order and warping the contents accordingly. The results show that the approach can effectively improve the layout of visual notes for better readability.

IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS (2023)

Article Computer Science, Artificial Intelligence

Large-Field Contextual Feature Learning for Glass Detection

Haiyang Mei, Xin Yang, Letian Yu, Qiang Zhang, Xiaopeng Wei, Rynson W. H. Lau

Summary: This paper addresses the important problem of detecting glass surfaces from a single RGB image by proposing a novel glass detection network called GDNet-B. The network explores contextual cues and integrates boundary features to achieve satisfactory detection results. The effectiveness and generalization capability of GDNet-B are further validated, and its potential applications and future research directions are discussed.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Computer Science, Artificial Intelligence

Structure-Informed Shadow Removal Networks

Yuhao Liu, Qing Guo, Lan Fu, Zhanghan Ke, Ke Xu, Wei Feng, Ivor W. Tsang, Rynson W. H. Lau

Summary: In this paper, a novel structure-informed shadow removal network (StructNet) is proposed to address the problem of shadow remnants in existing deep learning-based methods. StructNet reconstructs the structure information of the input image without shadows and uses it to guide the image-level shadow removal. Two main modules, MSFE and MFRA, are developed to extract image structural features and regularize feature consistency. Additionally, an extension called MStructNet is proposed to exploit multi-level structure information and improve shadow removal performance.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Fine-grained Domain Adaptive Crowd Counting via Point-derived Segmentation

Yongtuo Liu, Dan Xu, Sucheng Ren, Hanjie Wu, Hongmin Cai, Shengfeng He

Summary: This paper proposes a method to separate domain-invariant crowd and domain-specific background from crowd images, and designs a fine-grained domain adaptation method for crowd counting. By learning crowd segmentation and designing a crowd-aware adaptation mechanism, the method consistently outperforms previous approaches in domain adaptation scenarios.

2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME (2023)

Article Computer Science, Artificial Intelligence

TranSiam: Aggregating multi-modal visual features with locality for medical image segmentation

Xuejian Li, Shiqiang Ma, Junhai Xu, Jijun Tang, Shengfeng He, Fei Guo

Summary: Automatic segmentation of medical images is crucial for disease diagnosis. This paper proposes a dual-path segmentation model called TranSiam for multi-modal medical images. The model uses parallel CNNs and a Transformer layer to extract features from different modalities, and aggregates the features with a locality-aware aggregation block. A simplified dual-path fusion sketch appears after this entry.

EXPERT SYSTEMS WITH APPLICATIONS (2024)
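
The TranSiam summary above describes parallel per-modality encoders whose features are fused by a locality-aware aggregation block. The sketch below is a deliberately simplified dual-path fusion: two small CNN branches joined by a 1x1 convolution. TranSiam's actual locality-aware aggregation and Transformer components are more elaborate, and every layer size here is an assumption.

# Simplified dual-path (two-modality) fusion sketch with assumed layer sizes.
import torch
import torch.nn as nn

def encoder(in_ch=1, width=16):
    return nn.Sequential(
        nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(),
        nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
    )

class DualPathFusion(nn.Module):
    def __init__(self, width=16, num_classes=2):
        super().__init__()
        self.enc_a, self.enc_b = encoder(), encoder()   # one branch per modality
        self.fuse = nn.Conv2d(2 * width, width, 1)      # simple 1x1-conv aggregation
        self.head = nn.Conv2d(width, num_classes, 1)    # per-pixel segmentation logits

    def forward(self, mod_a, mod_b):
        f = torch.cat([self.enc_a(mod_a), self.enc_b(mod_b)], dim=1)
        return self.head(torch.relu(self.fuse(f)))

t1, t2 = torch.randn(1, 1, 128, 128), torch.randn(1, 1, 128, 128)   # two modalities
logits = DualPathFusion()(t1, t2)                                    # (1, 2, 128, 128)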
