Article
Computer Science, Artificial Intelligence
Dongjing Shan, Xiongwei Zhang, Tieyong Cao, Limin Wang, Chao Zhang
Summary: In this article, a three-stage hierarchical neural network is proposed for saliency detection, combining Fast R-CNN, a self-attention mechanism, and a global regression model. Experiments on several benchmark datasets, including comparisons with 12 previous methods, demonstrate excellent performance.
IEEE INTELLIGENT SYSTEMS
(2021)
Article
Computer Science, Information Systems
Yuanhao Yue, Qin Zou, Hongkai Yu, Qian Wang, Zhongyuan Wang, Song Wang
Summary: This study proposes a novel end-to-end trainable network for co-saliency detection within a single image. The network combines bottom-up and top-down strategies by using ground-truth masks as top-down guidance and constructing triplet proposals for regional feature mapping and clustering.
SCIENCE CHINA-INFORMATION SCIENCES
(2023)
Article
Engineering, Electrical & Electronic
Zhengxia Zou, Keyan Chen, Zhenwei Shi, Yuhong Guo, Jieping Ye
Summary: Object detection, a fundamental problem in computer vision, has received significant attention in recent years. This article reviews the rapid technological evolution of object detection over the past two decades and its impact on the entire computer vision field. It covers various topics such as milestone detectors, datasets, metrics, fundamental building blocks, speedup techniques, and state-of-the-art methods.
PROCEEDINGS OF THE IEEE
(2023)
Review
Computer Science, Information Systems
Ayoub Benali Amjoud, Mustapha Amrouch
Summary: This paper examines the evolution of object detection in the era of deep learning, reviews various state-of-the-art algorithms and their underlying concepts, and classifies them into anchor-based, anchor-free, and transformer-based detectors. The paper discusses the insights behind these algorithms and provides experimental analyses comparing quality metrics, speed/accuracy trade-offs, and training methodologies. Additionally, it compares major convolutional neural networks for object detection, highlights the strengths and limitations of each model, and summarizes the development of object detection methods under deep learning through simple graphical illustrations. Finally, the paper identifies future research directions.
Article
Medicine, General & Internal
Ali H. Al-Timemy, Laith Alzubaidi, Zahraa M. Mosa, Hazem Abdelmotaal, Nebras H. Ghaeb, Alexandru Lavric, Rossen M. Hazarbassanov, Hidenori Takahashi, Yuantong Gu, Siamak Yousefi
Summary: In this study, a deep learning model is proposed to accurately and robustly detect early clinical keratoconus (KCN). By extracting features from three different corneal maps using Xception and InceptionResNetV2 deep learning architectures, and then fusing the features, subclinical forms of KCN can be detected with high accuracy. The model achieved an AUC of 0.99 and an accuracy range of 97-100% in distinguishing normal eyes from eyes with subclinical and established KCN. The model was further validated on an independent dataset with an AUC of 0.91-0.92 and an accuracy range of 88-92%. This model is a step toward improving the detection of clinical and subclinical forms of KCN.
Article
Computer Science, Artificial Intelligence
Deqiang Cheng, Ruihang Liu, Jiahan Li, Song Liang, Qiqi Kou, Kai Zhao
Summary: This study introduces a lightweight saliency prediction model based on convolutional neural networks, utilizing multi-scale collaboration learning of global and local information, achieving competitive and consistent results on challenging benchmark datasets with better prediction performance, fewer parameters, and faster inference speed.
IMAGE AND VISION COMPUTING
(2021)
Article
Plant Sciences
Yange Sun, Fei Wu, Huaping Guo, Ran Li, Jianfeng Yao, Jianbo Shen
Summary: This paper introduces a novel method called TeaDiseaseNet for tea disease detection. It utilizes a multi-scale self-attention mechanism and a channel attention mechanism to achieve accurate detection and localization of tea disease information. Experimental results demonstrate its superior performance in scenarios with complex backgrounds and varying disease scales, highlighting its potential for intelligent tea disease diagnosis.
FRONTIERS IN PLANT SCIENCE
(2023)
Article
Computer Science, Artificial Intelligence
Shunyu Yao, Miao Zhang, Yongri Piao, Chaoyi Qiu, Huchuan Lu
Summary: This paper proposes a depth injection framework to enhance the semantic representation by injecting depth maps into the encoder. A depth injection module is also introduced to complement and guide the information between depth maps and the encoder. Experimental results show that the proposed method achieves state-of-the-art performance on multiple datasets and exhibits strong generalization ability.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2023)
Article
Computer Science, Information Systems
Tianhao Xu, Zhenming Yuan
Summary: This study proposes a low-cost, automatic method for detecting pulmonary tuberculosis on chest X-rays to assist primary radiologists. By combining a coordinate attention mechanism with a convolutional neural network, the method achieves better accuracy in identifying and classifying pulmonary tuberculosis images. Evaluation on a public dataset shows high accuracy and recall, which can support radiologists in auxiliary diagnosis.
Article
Environmental Sciences
Mohammed Q. Q. Alkhatib, Mina Al-Saad, Nour Aburaed, Saeed Almansoori, Jaime Zabalza, Stephen Marshall, Hussain Al-Ahmad
Summary: A novel method called Tri-CNN and a three-branch feature fusion approach are proposed to address the issue of insufficient training samples in hyperspectral image (HSI) classification. Experimental results demonstrate that the proposed method exhibits remarkable performance in terms of overall accuracy (OA), average accuracy (AA), and Kappa metrics when compared to existing methods.
Article
Nuclear Science & Technology
Theodore Papamarkou, Hayley Guy, Bryce Kroencke, Jordan Miller, Preston Robinette, Daniel Schultz, Jacob Hinkle, Laura Pullum, Catherine Schuman, Jeremy Renshaw, Stylianos Chatzidakis
Summary: This paper discusses the use of residual neural networks for real-time corrosion detection in nuclear fuel canisters, demonstrating the potential for automating inspections, reducing costs, and minimizing radiation exposure. The proposed approach involves cropping and training the network on images to accurately detect corroded areas and classify images with high precision.
NUCLEAR ENGINEERING AND TECHNOLOGY
(2021)
Article
Computer Science, Artificial Intelligence
Talles B. Viana, Victor L. F. Souza, Adriano L. I. Oliveira, Rafael M. O. Cruz, Robert Sabourin
Summary: Despite recent advances in computer vision, the problem of offline handwritten signature verification remains challenging. Deep learning methods have been investigated to learn feature representations of handwritten signatures. A multi-task framework based on deep contrastive learning is proposed to improve signature verification by adjusting the feature representations of genuine and skilled forgery signatures.
EXPERT SYSTEMS WITH APPLICATIONS
(2023)
Article
Computer Science, Information Systems
Yukti Aparna, Yukti Bhatia, Rachna Rai, Varun Gupta, Naveen Aggarwal, Aparna Akula
Summary: Potholes on roads are a major cause of accidents and vehicle wear and tear. Current pothole detection techniques have drawbacks, so this study analyzes the feasibility and accuracy of thermal imaging for pothole detection. A deep learning approach based on convolutional neural networks is adopted, and self-built and pre-trained models are compared. The highest accuracy achieved with thermal imaging was 97.08%, obtained with one of the pre-trained models. This study is important for guiding future research in the field of pothole detection.
JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES
(2022)
Article
Engineering, Electrical & Electronic
Lucas da Silva Nolasco, Andre Eugenio Lazzaretti, Bruna Machado Mulinari
Summary: This paper presents an integrated method for handling high-frequency NILM signals, including detection, feature extraction, and classification. The results show that the accuracy of this method is above 90% in most cases, surpassing state-of-the-art approaches, and it also includes a multi-label procedure to increase the recognition of multiple loads.
IEEE SENSORS JOURNAL
(2022)
Article
Geochemistry & Geophysics
Shaohui Mei, Ruoqiao Jiang, Mingyang Ma, Chao Song
Summary: This article proposes a novel cyclic polar coordinate convolutional layer (CPCCL) for CNNs to handle the problem of rotation invariance. The CPCCL converts rotation variation into translation variation using polar coordinates transformation, and employs cyclic convolution to handle the translation variation. Experimental results demonstrate that the proposed CPCCL can effectively handle the rotation-sensitive problem in traditional CNNs and outperforms several state-of-the-art rotation-invariant feature learning algorithms.
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
(2023)
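The principle summarized in the CPCCL entry above (a rotation becomes a translation in polar coordinates, which cyclic convolution can then handle) can be illustrated with a minimal sketch. This is not the authors' layer: the array `polar`, the kernel weights, and the helper `cyclic_conv_angular` are hypothetical, and the polar map is assumed to be already sampled so that a rotation by a whole number of angular steps is a wrap-around shift along the angle axis.

```python
import numpy as np

def cyclic_conv_angular(polar, kernel):
    """Wrap-around (cyclic) correlation along the angular axis (axis 1)."""
    out = np.zeros_like(polar, dtype=float)
    for offset, weight in enumerate(kernel):
        # np.roll wraps around, so the last angle index is adjacent to index 0.
        out += weight * np.roll(polar, -offset, axis=1)
    return out

rng = np.random.default_rng(0)
polar = rng.random((8, 36))           # hypothetical polar map: 8 radii x 36 angles
kernel = np.array([0.25, 0.5, 0.25])  # hypothetical filter weights

shift = 5                             # rotation by 5 angular steps
rotated = np.roll(polar, shift, axis=1)

resp = cyclic_conv_angular(polar, kernel)
resp_rot = cyclic_conv_angular(rotated, kernel)

# The response to the rotated input is the shifted response to the original input.
shift_equivariant = np.allclose(resp_rot, np.roll(resp, shift, axis=1))
# Pooling over the angle axis removes the shift, giving a rotation-invariant feature.
pooled_invariant = np.allclose(resp.max(axis=1), resp_rot.max(axis=1))
```

In this sketch the filter response shifts along with the input, and max pooling over angles then discards the shift; how the published layer samples the polar grid and pools its features is not stated in the summary.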
Article
Computer Science, Information Systems
Jiaying Lin, Xin Tan, Ke Xu, Lizhuang Ma, Rynson W. H. Lau
Summary: This article proposes a frequency-based method called FBNet for camouflaged object detection. The method suppresses confusing high-frequency texture information to separate camouflaged objects from the background. It also includes frequency-aware context aggregation and adaptive frequency attention modules, as well as a gradient-weighted loss function to focus on contour details. Experimental results demonstrate that FBNet outperforms state-of-the-art methods in camouflaged object detection.
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS
(2023)
Article
Computer Science, Information Systems
Cheng Xu, Zejun Chen, Jiajie Mai, Xuemiao Xu, Shengfeng He
Summary: Person image synthesis faces two critical problems when transferring the appearance of a source person image to a target pose: synthesis distortion caused by the entanglement of pose and appearance, and failure to preserve the original semantics. The proposed PAC-GAN explicitly tackles these problems by using a component-wise transferring model and a high-level semantic constraint. Experimental results on the DeepFashion dataset demonstrate the superiority of the proposed method in maintaining pose and attribute consistency.
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS
(2023)
Article
Computer Science, Information Systems
Haiyang Mei, Letian Yu, Ke Xu, Yang Wang, Xin Yang, Xiaopeng Wei, Rynson W. H. Lau
Summary: This article introduces a method for segmenting mirrors and proposes a novel network model called MirrorNet+ to address this problem. The authors construct a large-scale mirror segmentation dataset and conduct extensive experiments to validate the effectiveness and generalization capability of the proposed method. The article also discusses applications of mirror segmentation and possible future research directions.
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS
(2023)
Article
Computer Science, Hardware & Architecture
Sucheng Ren, Wenxi Liu, Jianbo Jiao, Guoqiang Han, Shengfeng He
Summary: In this study, we propose a new method to generate distraction-free edge features by incorporating holistic interdependencies between high-level features. Experimental results demonstrate that our method outperforms the state-of-the-art methods on benchmark datasets, with fast inference speed on a single GPU.
Article
Computer Science, Artificial Intelligence
Tao Yan, Mingyue Li, Bin Li, Yang Yang, Rynson W. H. Lau
Summary: This research proposes a method for removing rain streaks from light field images by simultaneously processing all sub-views using 4D convolutional layers and detecting rain streaks with a multi-scale self-guided Gaussian process module. Through training on virtual and real-world rainy light field images, accurate detection and removal of rain streaks are achieved, leading to the restoration of rain-free light field images.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2023)
Article
Computer Science, Artificial Intelligence
Yongtuo Liu, Sucheng Ren, Liangyu Chai, Hanjie Wu, Dan Xu, Jing Qin, Shengfeng He
Summary: Labeling is challenging for crowd counting, and recent methods have proposed semi-supervised approaches to reduce labeling efforts. However, the None-or-All labeling strategy is suboptimal as it does not consider the diversity of individuals in unlabeled crowd images. In this study, we propose breaking the labeling chain and reducing spatial labeling redundancy to improve semi-supervised crowd counting. We annotate representative regions, analyze region representativeness, and directly supervise unlabeled regions using similarity among individuals. Our experiments show significant performance improvement compared to previous methods.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
Article
Computer Science, Software Engineering
Zhansheng Li, Yangyang Xu, Nanxuan Zhao, Yang Zhou, Yongtuo Liu, Dahua Lin, Shengfeng He
Summary: This study proposes a new anime translation framework by utilizing the prior knowledge of a pre-trained StyleGAN model. The framework incorporates disentangled encoders to separately embed structure and appearance information and includes a FaceBank aggregation method for generating in-domain animes. A new anime portrait parsing dataset, Danbooru-Parsing, is introduced to connect face semantics with appearances, enabling a constrained translation setting. The experiments demonstrate the effectiveness and value of the new dataset and method, providing the first feasible solution for anime translation.
ACM TRANSACTIONS ON GRAPHICS
(2023)
Article
Computer Science, Artificial Intelligence
Yong Du, Junjie Deng, Yulong Zheng, Junyu Dong, Shengfeng He
Summary: The crucial challenge of single image deraining is to remove rain streaks while preserving image details. This paper proposes a novel deep network called DSDNet, which estimates rain streaks and detail loss separately, and predicts a rain mask indicating the location and intensity of rain. Extensive experiments show that the proposed method outperforms state-of-the-art methods and is effective in joint tasks of single image deraining, detection, and segmentation.
COMPUTER VISION AND IMAGE UNDERSTANDING
(2023)
Article
Computer Science, Artificial Intelligence
Yang Zhou, Hanjie Wu, Wenxi Liu, Zheng Xiong, Jing Qin, Shengfeng He
Summary: Synthesizing novel views from a single-view image is a challenging task, but it can be improved by expanding to a multi-view setting. By leveraging a stereo prior, a pseudo-stereo viewpoint is generated to assist in 3D reconstruction, making the view synthesis process simpler. A self-rectified stereo synthesis approach is proposed to correct erroneous regions and generate high-quality stereo images.
INTERNATIONAL JOURNAL OF COMPUTER VISION
(2023)
Article
Engineering, Electrical & Electronic
Weiwei Cai, Huaidong Zhang, Xuemiao Xu, Shengfeng He, Kun Zhang, Jing Qin
Summary: In this paper, an automatic retouching approach for scratched photographs is proposed, which utilizes scratch and background context for processing in two stages. Experimental results demonstrate that the proposed method outperforms existing methods. Additionally, two new scratched photo datasets are created to promote development in the field.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
(2023)
Article
Computer Science, Software Engineering
Xiaotian Qiao, Ying Cao, Rynson W. H. Lau
Summary: A clear and easy-to-follow layout is important for visual notes. In this article, a novel approach is proposed to automatically optimize the layouts of visual notes by predicting the design order and warping the contents accordingly. The results show that the approach can effectively improve the layout of visual notes for better readability.
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS
(2023)
Article
Computer Science, Artificial Intelligence
Haiyang Mei, Xin Yang, Letian Yu, Qiang Zhang, Xiaopeng Wei, Rynson W. H. Lau
Summary: This paper addresses the important problem of detecting glass surfaces from a single RGB image by proposing a novel glass detection network called GDNet-B. The network explores contextual cues and integrates boundary features to achieve satisfactory detection results. The effectiveness and generalization capability of GDNet-B are further validated, and its potential applications and future research directions are discussed.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
Article
Computer Science, Artificial Intelligence
Yuhao Liu, Qing Guo, Lan Fu, Zhanghan Ke, Ke Xu, Wei Feng, Ivor W. Tsang, Rynson W. H. Lau
Summary: In this paper, a novel structure-informed shadow removal network (StructNet) is proposed to address the problem of shadow remnants in existing deep learning-based methods. StructNet reconstructs the structure information of the input image without shadows and uses it to guide the image-level shadow removal. Two main modules, MSFE and MFRA, are developed to extract image structural features and regularize feature consistency. Additionally, an extension called MStructNet is proposed to exploit multi-level structure information and improve shadow removal performance.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2023)
Proceedings Paper
Computer Science, Artificial Intelligence
Yongtuo Liu, Dan Xu, Sucheng Ren, Hanjie Wu, Hongmin Cai, Shengfeng He
Summary: This paper proposes a method to separate domain-invariant crowd and domain-specific background from crowd images, and designs a fine-grained domain adaptation method for crowd counting. By learning crowd segmentation and designing a crowd-aware adaptation mechanism, the method consistently outperforms previous approaches in domain adaptation scenarios.
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME
(2023)
Article
Computer Science, Artificial Intelligence
Xuejian Li, Shiqiang Ma, Junhai Xu, Jijun Tang, Shengfeng He, Fei Guo
Summary: Automatic segmentation of medical images is crucial for disease diagnosis. This paper proposes a dual-path segmentation model called TranSiam for multi-modal medical images. The model utilizes parallel CNNs and a Transformer layer to extract features from different modalities, and aggregates the features using a locality-aware aggregation block.
EXPERT SYSTEMS WITH APPLICATIONS
(2024)