Article

Correcting the Optimal Resampling-Based Error Rate by Estimating the Error Rate of Wrapper Algorithms

Journal

BIOMETRICS
Volume 69, Issue 3, Pages 693-702

Publisher

WILEY
DOI: 10.1111/biom.12041

Keywords

Classification; High-dimensional data; Method selection bias; Repeated subsampling; Tuning bias

Funding

  1. LMU-innovativ Project BioMed-S
  2. German Research Foundation (DFG) [BO3139/2-1, BO3139/2-2]

High-dimensional binary classification tasks, for example, the classification of microarray samples into normal and cancer tissues, usually involve a tuning parameter. Reporting only the performance of the best tuning parameter value yields over-optimistic prediction errors. To correct this tuning bias, we develop a new method based on a decomposition of the unconditional error rate involving the tuning procedure; that is, we estimate the error rate of wrapper algorithms as introduced in the context of internal cross-validation (ICV) by Varma and Simon (2006, BMC Bioinformatics 7, 91). Our subsampling-based estimator can be written as a weighted mean of the errors obtained using the different tuning parameter values and can thus be interpreted as a smooth version of ICV, which is the standard approach for avoiding tuning bias. In contrast to ICV, our method guarantees intuitive bounds for the corrected error. Additionally, we suggest also using bias correction methods to address the conceptually similar method selection bias, which results from the optimal choice of the classification method itself when several methods are evaluated successively. We demonstrate the performance of our method on microarray and simulated data and compare it to ICV. This study suggests that our approach yields competitive estimates at a much lower computational price.
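The contrast between the over-optimistic "best tuning value" error and a weighted mean over tuning values can be illustrated with a short Python sketch. It assumes a hypothetical error matrix `errors[b][j]` holding the test error of tuning value `j` in subsampling iteration `b`. As an illustrative assumption, and not necessarily the weighting defined in the paper, each weight is taken to be the share of iterations in which that tuning value attains the lowest error. Because such weights are nonnegative and sum to one, the weighted estimate can never fall below the naive minimum or exceed the worst mean error. The sketch does not reproduce the authors' estimator or ICV itself.

```python
from typing import List, Sequence

# errors[b][j]: test error of tuning value j in subsampling iteration b
# (a hypothetical input layout chosen for this sketch).
ErrorMatrix = Sequence[Sequence[float]]


def column_means(errors: ErrorMatrix) -> List[float]:
    """Mean test error of each tuning value over all subsampling iterations."""
    n_iter = len(errors)
    n_tune = len(errors[0])
    return [sum(row[j] for row in errors) / n_iter for j in range(n_tune)]


def naive_optimal_error(errors: ErrorMatrix) -> float:
    """Over-optimistic estimate: the mean error of the best tuning value only."""
    return min(column_means(errors))


def selection_weights(errors: ErrorMatrix) -> List[float]:
    """Hypothetical weights: share of iterations in which each tuning value
    attains the lowest error; ties are split equally among the winners."""
    n_iter = len(errors)
    weights = [0.0] * len(errors[0])
    for row in errors:
        best = min(row)
        winners = [j for j, e in enumerate(row) if e == best]
        for j in winners:
            weights[j] += 1.0 / (len(winners) * n_iter)
    return weights


def weighted_mean_error(errors: ErrorMatrix) -> float:
    """Weighted mean of the per-tuning-value mean errors."""
    means = column_means(errors)
    weights = selection_weights(errors)
    return sum(w * m for w, m in zip(weights, means))


# Toy error matrix: 4 subsampling iterations x 3 tuning values (made-up numbers).
example_errors = [
    [0.20, 0.15, 0.25],
    [0.10, 0.20, 0.30],
    [0.25, 0.10, 0.20],
    [0.15, 0.20, 0.10],
]

if __name__ == "__main__":
    naive = naive_optimal_error(example_errors)
    corrected = weighted_mean_error(example_errors)
    worst = max(column_means(example_errors))
    # A convex combination of the column means always lies in [naive, worst].
    assert naive <= corrected <= worst
    print(f"naive: {naive:.4f}, weighted: {corrected:.4f}, worst: {worst:.4f}")
```

On the made-up matrix, the naive error is about 0.163, while the weighted estimate is about 0.178, because the tuning value that looks best on average is not the one that wins in every iteration.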

Recommended

Article Mathematical & Computational Biology

Sampling uncertainty versus method uncertainty: A general framework with applications to omics biomarker selection

Simon Klau, Marie-Laure Martin-Magniette, Anne-Laure Boulesteix, Sabine Hoffmann

BIOMETRICAL JOURNAL (2020)

Review Genetics & Heredity

Statistical learning approaches in the genetic epidemiology of complex diseases

Anne-Laure Boulesteix, Marvin N. Wright, Sabine Hoffmann, Inke R. Koenig

HUMAN GENETICS (2020)

Article Health Care Sciences & Services

A plea for taking all available clinical information into account when assessing the predictive value of omics data

Alexander Volkmann, Riccardo De Bin, Willi Sauerbrei, Anne-Laure Boulesteix

BMC MEDICAL RESEARCH METHODOLOGY (2019)

Review Biochemical Research Methods

Combining clinical and molecular data in regression prediction models: insights from a simulation study

Riccardo De Bin, Anne-Laure Boulesteix, Axel Benner, Natalia Becker, Willi Sauerbrei

BRIEFINGS IN BIOINFORMATICS (2020)

Article Obstetrics & Gynecology

Delivery room desaturations and bradycardia in the early postnatal period of healthy term neonates - a prospective observational study

D.-M. Burgmann, K. Foerster, M. Klemme, M. Delius, C. Huebener, R. Wiskott, A. L. Boulesteix, A. W. Flemmer

Summary: This study evaluated the frequency, duration, and severity of desaturations and bradycardia in the first hours of life in term neonates. The results showed that approximately 30% of infants experienced desaturations, with 25% of them being prolonged desaturations. Infants born by planned Cesarean section had a significantly higher occurrence of desaturations compared to other modes of delivery.

JOURNAL OF MATERNAL-FETAL & NEONATAL MEDICINE (2022)

Article Oncology

Single-center versus multi-center data sets for molecular prognostic modeling: a simulation study

Daniel Samaga, Roman Hornung, Herbert Braselmann, Julia Hess, Horst Zitzelsberger, Claus Belka, Anne-Laure Boulesteix, Kristian Unger

RADIATION ONCOLOGY (2020)

Article Mathematics, Interdisciplinary Applications

Improved Outcome Prediction Across Data Sources Through Robust Parameter Tuning

Nicole Ellenbach, Anne-Laure Boulesteix, Bernd Bischl, Kristian Unger, Roman Hornung

Summary: The paper discusses the reasons why prediction rules trained on high-dimensional data do not generalize well across different sources, introduces a new method for tuning parameter selection, and concludes through a large-scale comparison study that tuning on external data and robust tuning with a tuned robustness parameter lead to better generalizing prediction rules.

JOURNAL OF CLASSIFICATION (2021)

Article Statistics & Probability

On the asymptotic behaviour of the variance estimator of a U-statistic

Mathias Fuchs, Roman Hornung, Anne-Laure Boulesteix, Riccardo De Bin

JOURNAL OF STATISTICAL PLANNING AND INFERENCE (2020)

Article Statistics & Probability

Adapted single-cell consensus clustering (adaSC3)

Cornelia Fuetterer, Thomas Augustin, Christiane Fuchs

ADVANCES IN DATA ANALYSIS AND CLASSIFICATION (2020)

Article Medicine, General & Internal

Introduction to statistical simulations in health research

Anne-Laure Boulesteix, Rolf H. H. Groenwold, Michal Abrahamowicz, Harald Binder, Matthias Briel, Roman Hornung, Tim P. Morris, Jorg Rahnenfuhrer, Willi Sauerbrei

BMJ OPEN (2020)

Article Multidisciplinary Sciences

The multiplicity of analysis strategies jeopardizes replicability: lessons learned across disciplines

Sabine Hoffmann, Felix Schoenbrodt, Ralf Elsas, Rory Wilson, Ulrich Strasser, Anne-Laure Boulesteix

Summary: This paper presents a general framework on sources of uncertainty in computational analyses that lead to multiplicity of analysis strategies, and applies it to various approaches proposed in different disciplines to address this issue.

ROYAL SOCIETY OPEN SCIENCE (2021)

Article Computer Science, Artificial Intelligence

Information efficient learning of complexly structured preferences: Elicitation procedures and their application to decision making under uncertainty

C. Jansen, H. Blocher, T. Augustin, G. Schollmeyer

Summary: This paper proposes efficient methods for eliciting complex preferences and applies them to decision making problems. The methods enable decision makers to reveal their preference system through as few ranking questions as possible. The study presents two approaches, one utilizing ranking data to obtain ordinal preferences and the other explicitly eliciting an approximate version of the cardinal preferences. Conditions for obtaining the decision maker's true preference system and improving efficiency are discussed.

INTERNATIONAL JOURNAL OF APPROXIMATE REASONING (2022)

Article Biotechnology & Applied Microbiology

On the optimistic performance evaluation of newly introduced bioinformatic methods

Stefan Buchka, Alexander Hapfelmeier, Paul P. Gardner, Rory Wilson, Anne-Laure Boulesteix

Summary: Many research articles claim that new data analysis methods outperform existing ones, but the veracity of such claims is questionable. This manuscript discusses the consequences of optimistic bias in evaluating novel data analysis methods, and quantitatively investigates this bias using an example from epigenetic analysis.

GENOME BIOLOGY (2021)

Article Public, Environmental & Occupational Health

Examining the robustness of observational associations to model, measurement and sampling uncertainty with the vibration of effects framework

Simon Klau, Sabine Hoffmann, Chirag J. Patel, John P. A. Ioannidis, Anne-Laure Boulesteix

Summary: The study highlights the significant impact of sampling, model, and measurement uncertainty on the stability of observational associations, potentially leading to large variability in results. Measurement error in observational studies can attenuate the true effect in most cases, but may also occasionally result in overestimation.

INTERNATIONAL JOURNAL OF EPIDEMIOLOGY (2021)

Article Cardiac & Cardiovascular Systems

Outcome of patients treated with extracorporeal life support in cardiogenic shock complicating acute myocardial infarction: 1-year result from the ECLS-Shock study

Korbinian Lackermair, Stefan Brunner, Mathias Orban, Sven Peterss, Martin Orban, Hans D. Theiss, Bruno C. Huber, Gerd Juchem, Frank Born, Anne-Laure Boulesteix, Axel Bauer, Maximilian Pichlmaier, Joerg Hausleiter, Steffen Massberg, Christian Hagl, Sabina P. W. Guenther

Summary: This pilot study showed that randomized studies with ECLS in CS patients are feasible and safe. Small numbers of included patients impede meaningful conclusions about mortality and neurological outcome. Our findings of numerical differences in mortality and survival with severe neurological impairment give an urgent call for larger multi-centric randomized trials assessing the endpoint of all-cause mortality but also considering the effects on neurological outcome measures.

CLINICAL RESEARCH IN CARDIOLOGY (2021)

No Data Available