Article

Medical image classification using synergic deep learning

Journal

MEDICAL IMAGE ANALYSIS
Volume 54, Issue -, Pages 10-19

Publisher

ELSEVIER
DOI: 10.1016/j.media.2019.02.010

Keywords

Medical image classification; Intra-class variation; Inter-class similarity; Synergic deep learning model

Funding

  1. National Natural Science Foundation of China [61771397, 61471297]

The classification of medical images is an essential task in computer-aided diagnosis, medical image retrieval and mining. Although deep learning has shown clear advantages over traditional methods that rely on handcrafted features, it remains challenging due to the significant intra-class variation and inter-class similarity caused by the diversity of imaging modalities and clinical pathologies. In this paper, we propose a synergic deep learning (SDL) model to address this issue by using multiple deep convolutional neural networks (DCNNs) simultaneously and enabling them to mutually learn from each other. Each pair of DCNNs has its learned image representations concatenated as the input of a synergic network, which has a fully connected structure that predicts whether the pair of input images belongs to the same class. Thus, if one DCNN makes a correct classification, a mistake made by the other DCNN leads to a synergic error that serves as an extra force to update the model. This model can be trained end-to-end under the supervision of classification errors from DCNNs and synergic errors from each pair of DCNNs. Our experimental results on the ImageCLEF-2015, ImageCLEF-2016, ISIC-2016, and ISIC-2017 datasets indicate that the proposed SDL model achieves state-of-the-art performance in these medical image classification tasks. © 2019 Elsevier B.V. All rights reserved.
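As a rough illustration of the training signal described in the abstract, the sketch below combines the classification errors of two DCNNs with a synergic error for one image pair. It is not the authors' implementation: the single-layer logistic "synergic network", the function names, and the `synergic_weight` factor are all assumptions made for illustration, and the features and logits stand in for the outputs of real networks.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Classification error of one DCNN for one image."""
    return -math.log(softmax(logits)[label])


def synergic_label(label_a, label_b):
    """1 if the two input images belong to the same class, else 0."""
    return 1 if label_a == label_b else 0


def synergic_logit(feat_a, feat_b, weights, bias):
    """Hypothetical one-layer fully connected synergic network.

    The two learned representations are concatenated and mapped to a
    single logit for the "same class" decision.
    """
    pair = feat_a + feat_b  # list concatenation of the two feature vectors
    return sum(w * x for w, x in zip(weights, pair)) + bias


def binary_cross_entropy(logit, target):
    """Stable binary cross-entropy on a raw logit (the synergic error)."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def total_loss(logits_a, label_a, logits_b, label_b,
               feat_a, feat_b, weights, bias, synergic_weight=1.0):
    """Sum of both classification errors plus the weighted synergic error.

    synergic_weight is an assumed trade-off factor, not a value from the paper.
    """
    cls = cross_entropy(logits_a, label_a) + cross_entropy(logits_b, label_b)
    syn = binary_cross_entropy(
        synergic_logit(feat_a, feat_b, weights, bias),
        synergic_label(label_a, label_b),
    )
    return cls + synergic_weight * syn


if __name__ == "__main__":
    loss = total_loss(
        logits_a=[2.0, 0.5], label_a=0,
        logits_b=[0.1, 1.5], label_b=1,
        feat_a=[0.3, -0.2], feat_b=[0.7, 0.1],
        weights=[0.5, -0.5, 0.25, 0.25], bias=0.0,
    )
    print(f"total loss: {loss:.4f}")
```

In an end-to-end setup, the gradient of this combined loss would flow back through the synergic network into both DCNNs, which is how a mistake by one network can influence the update of the other.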

Recommended

Article Computer Science, Artificial Intelligence

Intra- and Inter-Pair Consistency for Semi-Supervised Gland Segmentation

Yutong Xie, Jianpeng Zhang, Zhibin Liao, Johan Verjans, Chunhua Shen, Yong Xia

Summary: In this paper, a semi-supervised model based on intra- and inter-pair consistency is proposed for gland segmentation in histology tissue images. By utilizing the relationships between different images in the feature space and imposing consistency constraints, the model achieves improved accuracy.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2022)

Article Computer Science, Interdisciplinary Applications

Learning From Ambiguous Labels for Lung Nodule Malignancy Prediction

Zehui Liao, Yutong Xie, Shishuai Hu, Yong Xia

Summary: In this paper, a multi-view 'divide-and-rule' model is proposed to learn from reliable and ambiguous annotations for lung nodule malignancy prediction. The nodules are divided into three sets based on the consistency and reliability of the annotations. The proposed model consists of three DAR models and is trained following a two-stage procedure. Experimental results show the effectiveness and superiority of the model in learning from ambiguous labels and predicting lung nodule malignancy.

IEEE TRANSACTIONS ON MEDICAL IMAGING (2022)

Article Computer Science, Information Systems

A Proposal-Free One-Stage Framework for Referring Expression Comprehension and Generation via Dense Cross-Attention

Mengyang Sun, Wei Suo, Peng Wang, Yanning Zhang, Qi Wu

Summary: This paper presents a proposal-free one-stage (PFOS) framework that can directly regress the region-of-interest from the image or generate unambiguous descriptions in an end-to-end manner. By taking the dense-grid of images as input and using a cross-attention transformer, the model learns multi-modal correspondences and eliminates the need for additional annotations or off-the-shelf detectors in the mainstream two-stage methods. Furthermore, the traditional two-stage listener-speaker framework is expanded to be jointly trained by a one-stage learning paradigm, resulting in state-of-the-art performance on comprehension and competitive results for generation.

IEEE TRANSACTIONS ON MULTIMEDIA (2023)

Article Computer Science, Artificial Intelligence

Visual Grounding Via Accumulated Attention

Chaorui Deng, Qi Wu, Qingyao Wu, Fuyuan Hu, Fan Lyu, Mingkui Tan

Summary: Visual grounding aims to locate the most relevant object or region in an image based on natural language queries. This paper proposes an attention module to reduce internal redundancies and an accumulated attention mechanism to capture the relationship among different kinds of information. Additionally, noise is introduced to bridge the distribution gap between human-labeled training data and real-world poor quality data, improving the performance and robustness of the VG models. Experimental results demonstrate the superiority of the proposed methods on various datasets in terms of accuracy.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2022)

Article Computer Science, Artificial Intelligence

Rethinking and Improving Feature Pyramids for One-Stage Referring Expression Comprehension

Wei Suo, Mengyang Sun, Peng Wang, Yanning Zhang, Qi Wu

Summary: Referring Expression Comprehension (REC) is a crucial task in the vision-and-language community, and it plays a vital role in various cross-modal tasks. Existing research focuses on a one-stage paradigm, treating REC as a language-conditioned object detection task to achieve a balance between speed and accuracy. However, previous frameworks overlook the importance of integrating multi-level features and often rely on single-scale features for target localization.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2023)

Article Computer Science, Artificial Intelligence

HOP+: History-Enhanced and Order-Aware Pre-Training for Vision-and-Language Navigation

Yanyuan Qiao, Yuankai Qi, Yicong Hong, Zheng Yu, Peng Wang, Qi Wu

Summary: This paper proposes an enhanced and history-aware pre-training method for Vision-and-Language Navigation (VLN), which introduces three novel VLN-specific proxy tasks and a memory network to improve historical knowledge learning and action prediction. The proposed method achieves new state-of-the-art performance on four downstream VLN tasks.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Engineering, Electrical & Electronic

Multi-Granularity Aggregation Transformer for Joint Video-Audio-Text Representation Learning

Mengge He, Wenjing Du, Zhiquan Wen, Qing Du, Yutong Xie, Qi Wu

Summary: In this paper, a Multi-Granularity Aggregation Transformer (MGAT) is proposed for joint video-audio-text representation learning. The method overcomes the limitations of existing methods by designing a multi-granularity transformer module and an attention-guided aggregation module. The aggregated information is aligned with text information at different hierarchical levels using consistency loss and contrastive loss. Experimental results demonstrate the superiority of the proposed method on tasks such as video-paragraph retrieval and video captioning.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2023)

Article Computer Science, Artificial Intelligence

Weakly-Supervised 3D Spatial Reasoning for Text-Based Visual Question Answering

Hao Li, Jinfa Huang, Peng Jin, Guoli Song, Qi Wu, Jie Chen

Summary: TextVQA aims to produce correct answers for questions about images with multiple scene texts. This paper introduces 3D geometric information into the spatial reasoning process to capture contextual knowledge. Experimental results show that the proposed method achieves state-of-the-art performance on TextVQA and ST-VQA datasets.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2023)

Article Computer Science, Cybernetics

Data Hiding With Deep Learning: A Survey Unifying Digital Watermarking and Steganography

Zihan Wang, Olivia Byrnes, Hu Wang, Ruoxi Sun, Congbo Ma, Huaming Chen, Qi Wu, Minhui Xue

Summary: The use of deep learning techniques in data hiding has greatly advanced secure communication and identity verification fields. Digital watermarking and steganography techniques, by embedding information into noise-tolerant signals like audio, video, or images, can protect sensitive intellectual property (IP) and enable confidential communication for authorized parties. This survey provides a systematic overview of recent developments in deep learning techniques for data hiding, based on model architectures and noise injection methods. It also suggests and discusses potential future research directions that combine digital watermarking and steganography in software engineering to enhance security and mitigate risks. This contribution promotes the creation of a more trustworthy digital world and advances responsible artificial intelligence (AI).

IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS (2023)

Article Computer Science, Artificial Intelligence

ReFs: A hybrid pre-training paradigm for 3D medical image segmentation

Yutong Xie, Jianpeng Zhang, Lingqiao Liu, Hu Wang, Yiwen Ye, Johan Verjans, Yong Xia

Summary: This paper proposes a hybrid pre-training paradigm that combines self-supervised learning and supervised learning to improve the representation quality for medical image segmentation tasks. It introduces a reference task in self-supervised learning and optimizes the model using a gradient matching method. The experimental results demonstrate the effectiveness of this approach on multiple medical image segmentation benchmarks.

MEDICAL IMAGE ANALYSIS (2024)

Proceedings Paper Computer Science, Artificial Intelligence

UniMiSS: Universal Medical Self-supervised Learning via Breaking Dimensionality Barrier

Yutong Xie, Jianpeng Zhang, Yong Xia, Qi Wu

Summary: This paper introduces the application of self-supervised learning in medical image analysis and proposes a universal medical self-supervised representation learning framework called UniMiSS, which utilizes 2D images to compensate for the lack of 3D data. To enable self-supervised learning with both 2D and 3D images, the paper designs a medical Transformer (MiT) and trains it using self-distillation. Experiments demonstrate that UniMiSS achieves promising performance on various medical image analysis tasks.

COMPUTER VISION, ECCV 2022, PT XXI (2022)

Article Computer Science, Information Systems

Show, Price and Negotiate: A Negotiator With Online Value Look-Ahead

Amin Parvaneh, Ehsan Abbasnejad, Qi Wu, Javen Qinfeng Shi, Anton van den Hengel

Summary: This study proposes a modular deep neural network called Price Negotiator to improve negotiation in online shopping. It addresses the challenges by considering item images, finding similar items, predicting price actions, and adjusting prices based on predicted actions.

IEEE TRANSACTIONS ON MULTIMEDIA (2022)

Article Computer Science, Information Systems

Co-LDL: A Co-Training-Based Label Distribution Learning Method for Tackling Label Noise

Zeren Sun, Huafeng Liu, Qiong Wang, Tianfei Zhou, Qi Wu, Zhenmin Tang

Summary: This paper proposes an end-to-end framework named Co-LDL for addressing the performance degradation of deep neural networks caused by label noise. The framework incorporates the low-loss sample selection strategy with label distribution learning and trains two deep neural networks simultaneously to communicate useful knowledge. Additionally, a self-supervised module is introduced to enhance the learned representations.

IEEE TRANSACTIONS ON MULTIMEDIA (2022)

Article Computer Science, Information Systems

Robust Learning From Noisy Web Images Via Data Purification for Fine-Grained Recognition

Chuanyi Zhang, Qiong Wang, Guosen Xie, Qi Wu, Fumin Shen, Zhenmin Tang

Summary: This article introduces a method for learning fine-grained tasks from web data, which purifies noisy training sets by identifying and distinguishing noisy images, and trains models to alleviate the effects of noise.

IEEE TRANSACTIONS ON MULTIMEDIA (2022)

Article Computer Science, Artificial Intelligence

Multi-Intersection Traffic Optimisation: A Benchmark Dataset and a Strong Baseline

Hu Wang, Hao Chen, Qi Wu, Congbo Ma, Yidong Li

Summary: The control of traffic signals is crucial in relieving traffic congestion in urban areas. However, it is difficult due to the complexity of real-world traffic dynamics. To address this, the researchers propose a new dataset and a novel model based on deep reinforcement learning for optimizing multi-intersection traffic control. The experimental results show that the proposed model outperforms other methods.

IEEE OPEN JOURNAL OF INTELLIGENT TRANSPORTATION SYSTEMS (2022)

Article Computer Science, Artificial Intelligence

Simultaneous alignment and surface regression using hybrid 2D-3D networks for 3D coherent layer segmentation of retinal OCT images with full and sparse annotations

Hong Liu, Dong Wei, Donghuan Lu, Xiaoying Tang, Liansheng Wang, Yefeng Zheng

Summary: This study proposes a framework based on hybrid 2D-3D convolutional neural networks for obtaining continuous 3D retinal layer surfaces from OCT volumes. The framework works well with both full and sparse annotations and utilizes alignment displacement vectors and layer segmentation to align the B-scans and segment the layers. Experimental results show that the framework outperforms state-of-the-art 2D deep learning methods in terms of layer segmentation accuracy and cross-B-scan 3D continuity.

MEDICAL IMAGE ANALYSIS (2024)

Article Computer Science, Artificial Intelligence

WarpDrive: Improving spatial normalization using manual refinements

Simon Oxenford, Ana Sofia Rios, Barbara Hollunder, Clemens Neudorfer, Alexandre Boutet, Gavin J. B. Elias, Jurgen Germann, Aaron Loh, Wissam Deeb, Bryan Salvato, Leonardo Almeida, Kelly D. Foote, Robert Amaral, Paul B. Rosenberg, David F. Tang-Wai, David A. Wolk, Anna D. Burke, Marwan N. Sabbagh, Stephen Salloway, M. Mallar Chakravarty, Gwenn S. Smith, Constantine G. Lyketsos, Michael S. Okun, William S., Zoltan Mari, Francisco A. Ponce, Andres Lozano, Wolf-Julian Neumann, Bassam Al-Fatly, Andreas Horn

Summary: Spatial normalization is a method to map subject brain images to an average template brain, allowing comparison of brain imaging results. We introduce a novel tool called WarpDrive, which enables manual refinements of image alignment after automated registration. The tool improves accuracy of data representation and aids in understanding patient outcomes.

MEDICAL IMAGE ANALYSIS (2024)

Article Computer Science, Artificial Intelligence

Interpretable and intervenable ultrasonography-based machine learning models for pediatric appendicitis

Ricards Marcinkevics, Patricia Reis Wolfertstetter, Ugne Klimiene, Kieran Chin-Cheong, Alyssia Paschke, Julia Zerres, Markus Denzinger, David Niederberger, Sven Wellmann, Ece Ozkan, Christian Knorr, Julia E. Vogt

Summary: This study presents interpretable machine learning models for predicting the diagnosis, management, and severity of suspected appendicitis using ultrasound images. The proposed models utilize concept bottleneck models (CBM) that facilitate interpretation and intervention by clinicians, without compromising performance or requiring time-consuming image annotation.

MEDICAL IMAGE ANALYSIS (2024)

Article Computer Science, Artificial Intelligence

Residual Aligner-based Network (RAN): Motion-separable structure for coarse-to-fine discontinuous deformable registration

Jian-Qing Zheng, Ziyang Wang, Baoru Huang, Ngee Han Lim, Bartlomiej W. Papiez

Summary: This article introduces a new method for medical image registration, which utilizes a separable motion backbone and a residual aligner module to better handle the discontinuous motion of multiple neighboring objects. The proposed method achieves excellent registration results on abdominal CT scans and lung CT scans.

MEDICAL IMAGE ANALYSIS (2024)

Article Computer Science, Artificial Intelligence

A knowledge-interpretable multi-task learning framework for automated thyroid nodule diagnosis in ultrasound videos

Xiangqiong Wu, Guanghua Tan, Hongxia Luo, Zhilun Chen, Bin Pu, Shengli Li, Kenli Li

Summary: This study develops a user-friendly framework for the automated diagnosis of thyroid nodules in ultrasound videos, simulating the diagnostic workflow of radiologists. By interpreting image characteristics and modeling temporal contextual information, the efficiency and generalizability of the diagnosis can be improved.

MEDICAL IMAGE ANALYSIS (2024)

Article Computer Science, Artificial Intelligence

DeepSSM: A blueprint for image-to-shape deep learning models

Riddhish Bhalodia, Shireen Elhabian, Jadie Adams, Wenzheng Tao, Ladislav Kavan, Ross Whitaker

Summary: This paper introduces DeepSSM, a deep learning-based framework for image-to-shape modeling. By learning the functional mapping from images to low-dimensional shape descriptors, DeepSSM can directly infer statistical representation of anatomy from 3D images. Compared to traditional methods, DeepSSM eliminates the need for heavy manual preprocessing and segmentation, and significantly improves computational time.

MEDICAL IMAGE ANALYSIS (2024)

Article Computer Science, Artificial Intelligence

Automatic registration with continuous pose updates for marker-less surgical navigation in spine surgery

Florentin Liebmann, Marco von Atzigen, Dominik Stutz, Julian Wolf, Lukas Zingg, Daniel Suter, Nicola A. Cavalcanti, Laura Leoty, Hooman Esfandiari, Jess G. Snedeker, Martin R. Oswald, Marc Pollefeys, Mazda Farshad, Philipp Furnstahl

Summary: This study presents a marker-less approach for automatic registration and real-time navigation of lumbar spinal fusion surgery using a deep neural network, avoiding radiation exposure and surgical errors. The method was validated on an ex-vivo surgery and a public dataset.

MEDICAL IMAGE ANALYSIS (2024)

Article Computer Science, Artificial Intelligence

Cycle consistent twin energy-based models for image-to-image translation

Piyush Tiwary, Kinjawl Bhattacharyya, A. P. Prathosh

Summary: Domain shift refers to the change of distributional characteristics between training and testing datasets, leading to performance drop. For medical image tasks, domain shift can be caused by changes in imaging modalities, devices, and staining mechanisms. Existing approaches based on generative models suffer from training difficulties and lack of diversity. In this paper, the authors propose the use of energy-based models (EBMs) for unpaired image-to-image translation in medical images. The proposed method, called Cycle Consistent Twin EBMs (CCT-EBM), employs a pair of EBMs in the latent space of an Auto-Encoder to ensure translation symmetry and coupling between domains.

MEDICAL IMAGE ANALYSIS (2024)

Article Computer Science, Artificial Intelligence

Cell classification with worse-case boosting for intelligent cervical cancer screening

Youyi Song, Jing Zou, Kup-Sze Choi, Baiying Lei, Jing Qin

Summary: Cell classification is crucial for intelligent cervical cancer screening, but the variation in cells' appearance and shape poses challenges. A new learning algorithm, worse-case boosting, is proposed to improve classification accuracy for under-represented data. Experimental results demonstrate the effectiveness of this algorithm in two publicly available datasets, achieving a 4% improvement in accuracy.

MEDICAL IMAGE ANALYSIS (2024)

Article Computer Science, Artificial Intelligence

Self-supervised multi-modal training from uncurated images and reports enables monitoring AI in radiology

Sangjoon Park, Eun Sun Lee, Kyung Sook Shin, Jeong Eun Lee, Jong Chul Ye

Summary: The increasing demand for AI systems to monitor human errors and abnormalities in healthcare presents challenges. This study presents a model called Medical X-VL, which is tailored for the medical domain and outperformed current state-of-the-art models in two medical image datasets. The model enables various zero-shot tasks for monitoring AI in the medical domain.

MEDICAL IMAGE ANALYSIS (2024)

Article Computer Science, Artificial Intelligence

Signal domain adaptation network for limited-view optoacoustic tomography

Anna Klimovskaia Susmelj, Berkan Lafci, Firat Ozdemir, Neda Davoudi, Xose Luis Dean-Ben, Fernando Perez-Cruz, Daniel Razansky

Summary: Optoacoustic imaging is a technique that uses optical excitation and ultrasound detection for biological tissue imaging. The quality of the images depends on the extent of tomographic coverage provided by the ultrasound detector arrays. However, full coverage is not always possible due to experimental constraints. The proposed signal domain adaptation network aims to reduce limited-view artifacts in the images.

MEDICAL IMAGE ANALYSIS (2024)

Article Computer Science, Artificial Intelligence

SynCLay: Interactive synthesis of histology images from bespoke cellular layouts

Srijay Deshpande, Muhammad Dawood, Fayyaz Minhas, Nasir Rajpoot

Summary: In this work, a novel framework called SynCLay is proposed for automated synthesis of histology images based on user-defined cellular layouts. The framework can generate realistic and high-quality histology images with different cellular arrangements, which is helpful for studying the role of cells in the tumor microenvironment. The framework integrates a nuclear segmentation and classification model to refine nuclear structures and generate nuclear masks. Evaluation using quantitative metrics and feedback from pathologists shows that the synthetic images generated by SynCLay have high realism scores and can accurately differentiate between benign and malignant tumors.

MEDICAL IMAGE ANALYSIS (2024)

Article Computer Science, Artificial Intelligence

CenTime: Event-conditional modelling of censoring in survival analysis

Ahmed H. Shahin, An Zhao, Alexander C. Whitehead, Daniel C. Alexander, Joseph Jacob, David Barber

Summary: Survival analysis is a valuable tool in healthcare for predicting the time to specific events. This paper introduces CenTime, a novel approach that directly estimates the time to event. The method performs well with censored data and can be easily integrated with deep learning models. Compared to standard methods, CenTime offers superior performance in predicting event time while maintaining comparable ranking performance.

MEDICAL IMAGE ANALYSIS (2024)

Article Computer Science, Artificial Intelligence

Do we really need dice? The hidden region-size biases of segmentation losses

Bingyuan Liu, Jose Dolz, Adrian Galdran, Riadh Kobbi, Ismail Ben Ayed

Summary: Most segmentation losses are variants of the Cross-Entropy (CE) or Dice losses. This work provides a theoretical analysis that shows a deeper connection between CE and Dice than previously thought. From a constrained-optimization perspective, both CE and Dice decompose into similar ground-truth matching terms and region-size penalty terms. The analysis uncovers hidden region-size biases: Dice has an intrinsic bias towards extremely imbalanced solutions, while CE implicitly encourages the ground-truth region proportions. Based on this analysis, a principled and simple solution is proposed to explicitly control the region-size bias.

MEDICAL IMAGE ANALYSIS (2024)