Article

Uncovering and Correcting Shortcut Learning in Machine Learning Models for Skin Cancer Diagnosis

Journal

DIAGNOSTICS
Volume 12, Issue 1, Pages -

Publisher

MDPI
DOI: 10.3390/diagnostics12010040

Keywords

deep learning; explainable AI; skin cancer diagnosis; inpainting; shortcut learning; model bias; confounding


Summary: This study examines shortcut learning in deep learning models for skin cancer diagnosis. It demonstrates experimentally that shortcut learning in trained classifiers can lead to unreliable predictions in clinical practice, and proposes a method to eliminate the learned shortcut and make the classifier more accurate and reliable.
Machine learning models have been successfully applied to the analysis of skin images. However, due to the black-box nature of such deep learning models, it is difficult to understand their underlying reasoning. This prevents a human from validating whether the model is right for the right reasons. Spurious correlations and other biases in the data can cause a model to base its predictions on such artefacts rather than on the truly relevant information. These learned shortcuts can in turn lead to incorrect performance estimates and to unexpected outcomes when the model is applied in clinical practice. This study presents a method to detect and quantify such shortcut learning in trained classifiers for skin cancer diagnosis, since dermoscopy images are known to contain artefacts. Specifically, we train a standard VGG16-based skin cancer classifier on the public ISIC dataset, in which colour calibration charts (elliptical, coloured patches) occur only in benign images and not in malignant ones. Our methodology artificially inserts such patches into images and uses inpainting to automatically remove them, in order to assess the resulting changes in predictions. We find that our standard classifier partly bases its predictions of benign images on the presence of such a coloured patch. More importantly, by artificially inserting coloured patches into malignant images, we show that shortcut learning results in a significant increase in misdiagnoses, making the classifier unreliable for use in clinical practice. With these results we therefore want to increase awareness of the risks of using black-box machine learning models trained on potentially biased datasets. Finally, we present a model-agnostic method to neutralise shortcut learning by removing the bias from the training dataset: coloured patches are exchanged for benign skin tissue using image inpainting, and the classifier is re-trained on this de-biased dataset.
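
The probing procedure described in the abstract (artificially inserting an elliptical coloured patch, or removing an existing one via inpainting, and comparing the classifier's output on the original and modified image) can be sketched in a few lines. The sketch below is illustrative only and not the authors' code: the patch geometry and colour, the use of OpenCV's TELEA inpainting, and the Keras-style `classifier.predict` interface are all assumptions.

```python
import cv2
import numpy as np

def insert_colour_patch(image, centre=(40, 40), axes=(25, 18), colour=(180, 60, 60)):
    """Draw a filled elliptical coloured patch, mimicking a colour calibration chart."""
    patched = image.copy()
    cv2.ellipse(patched, centre, axes, angle=0, startAngle=0, endAngle=360,
                color=colour, thickness=-1)
    return patched

def remove_patch_by_inpainting(image, patch_mask, radius=5):
    """Replace the patch region (non-zero pixels in an 8-bit single-channel mask)
    with surrounding skin texture using OpenCV's TELEA inpainting."""
    return cv2.inpaint(image, patch_mask, inpaintRadius=radius, flags=cv2.INPAINT_TELEA)

def prediction_shift(classifier, original, modified):
    """Change in predicted malignancy probability caused by the modification
    (assumes a two-class softmax output with index 1 = malignant)."""
    p_orig = classifier.predict(original[np.newaxis, ...])[0, 1]
    p_mod = classifier.predict(modified[np.newaxis, ...])[0, 1]
    return float(p_mod - p_orig)
```

Averaging the prediction shift over many images gives a simple quantitative measure of how strongly the classifier relies on the patch shortcut; applying the inpainting step to every patched training image and re-training the classifier yields the kind of de-biased dataset referred to above.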

Authors

Meike Nauta, Ricky Walsh, Adam Dubowski, Christin Seifert


Recommended

Article Medicine, General & Internal

A Comparison of Techniques for Class Imbalance in Deep Learning Classification of Breast Cancer

Ricky Walsh, Mickael Tardy

Summary: Tools based on deep learning models have been developed to assist radiologists in diagnosing breast cancer from mammograms. However, the imbalance of malignant and benign samples in the training datasets can lead to biased models. This study evaluates different techniques to address this class imbalance issue and shows that they can counteract the bias towards the majority class. However, these techniques do not improve the model's performance in terms of AUC-ROC, except for the synthetic lesion generation approach.

DIAGNOSTICS (2023)
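
Not taken from either paper, but as a minimal sketch of one commonly compared remedy mentioned in the entry above (class weighting), assuming a PyTorch two-class classifier and illustrative class counts:

```python
import torch
import torch.nn as nn

# Illustrative benign/malignant sample counts; real counts come from the training set.
counts = torch.tensor([9000.0, 1000.0])
# Inverse-frequency ("balanced") class weights: N / (num_classes * n_c).
weights = counts.sum() / (2.0 * counts)
# Weighted cross-entropy penalises minority-class errors more heavily.
criterion = nn.CrossEntropyLoss(weight=weights)
# Used during training as: loss = criterion(logits, targets)
```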

Review Computer Science, Theory & Methods

From Anecdotal Evidence to Quantitative Evaluation Methods: A Systematic Review on Evaluating Explainable AI

Meike Nauta, Jan Trienes, Shreyasi Pathak, Elisa Nguyen, Michelle Peters, Yasmin Schmitt, Joerg Schloetterer, Maurice Van Keulen, Christin Seifert

Summary: The quality of explanations for machine learning models is a multi-faceted concept whose evaluation should not rest on subjective validation alone. This study identifies 12 conceptual properties that should be considered for a comprehensive assessment of explanation quality. The evaluation practices of over 300 papers introducing explainable artificial intelligence (XAI) methods in the past 7 years were systematically reviewed, finding that one-third of the papers relied exclusively on anecdotal evidence and one-fifth evaluated with users. The study also provides an extensive overview of quantitative XAI evaluation methods, offering researchers and practitioners concrete tools for validation and benchmarking.

ACM COMPUTING SURVEYS (2023)

Proceedings Paper Optics

Evaluating CNN Interpretability on Sketch Classification

Abraham Theodorus, Meike Nauta, Christin Seifert

TWELFTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2019) (2020)

Article Computer Science, Artificial Intelligence

Causal Discovery with Attention-Based Convolutional Neural Networks

Meike Nauta, Doina Bucur, Christin Seifert

MACHINE LEARNING AND KNOWLEDGE EXTRACTION (2019)
