Overfitting in prediction models - Is it a problem only in high dimensions?

CONTEMPORARY CLINICAL TRIALS (2013)

Journal

CONTEMPORARY CLINICAL TRIALS

Volume 36, Issue 2, Pages 636-641

Publisher

ELSEVIER SCIENCE INC

DOI: 10.1016/j.cct.2013.06.011

Keywords

Classifiers; Prediction accuracy; Overfitting; Clinical trials; Patient selection

Abstract

The growing recognition that human diseases are molecularly heterogeneous has stimulated interest in the development of prognostic and predictive classifiers for patient selection and stratification. In the process of classifier development, it has been repeatedly emphasized that when the number of candidate predictor variables is much larger than the number of observations, the apparent (training set, resubstitution) accuracy of a classifier can be highly optimistically biased; hence, classification accuracy should be reported based on evaluation of the classifier on a separate test set or using complete cross-validation. Such evaluation methods have, however, not been the norm for low-dimensional (p < n) data that arise, for example, in clinical trials when a classifier is developed on a combination of clinico-pathological variables and a small number of genetic biomarkers selected from an understanding of the biology of the disease. We undertook simulation studies to investigate the existence and extent of the problem of overfitting with low-dimensional data. The results indicate that overfitting can be a serious problem even for low-dimensional data, especially if the relationship of outcome to the set of predictor variables is not strong. We hence encourage the adoption of either a separate test set or complete cross-validation to evaluate classifier accuracy, even when the number of candidate predictor variables is substantially smaller than the number of cases. Published by Elsevier Inc.
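
The contrast the abstract draws between apparent (resubstitution) accuracy and cross-validated accuracy can be illustrated with a small simulation. What follows is a minimal sketch and not the authors' simulation design: the sample size (n = 40), number of predictors (p = 10), the least-squares linear classifier, the 10-fold split, and all function names are assumptions chosen for illustration. The data are generated under the null case, where the class labels are independent of the predictors, so the true accuracy of any classifier is 0.5. Because this sketch has no variable-selection step, refitting the classifier inside every fold is enough for the cross-validation to be "complete"; any selection step would also have to be repeated within each fold.

```python
import numpy as np


def fit_linear(X, y):
    """Fit a least-squares linear classifier (with intercept) to 0/1 labels."""
    A = np.column_stack([np.ones(len(X)), X])
    # Regress labels recoded to -1/+1 on the predictors.
    coef, *_ = np.linalg.lstsq(A, 2.0 * y - 1.0, rcond=None)
    return coef


def predict_linear(coef, X):
    """Predict class 1 when the fitted linear score is positive."""
    A = np.column_stack([np.ones(len(X)), X])
    return (A @ coef > 0).astype(int)


def apparent_accuracy(X, y):
    """Accuracy on the same data used to fit the classifier."""
    coef = fit_linear(X, y)
    return float(np.mean(predict_linear(coef, X) == y))


def cv_accuracy(X, y, k, rng):
    """k-fold cross-validated accuracy; the model is refit in every fold."""
    idx = rng.permutation(len(y))
    correct = 0
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        coef = fit_linear(X[train], y[train])
        correct += int(np.sum(predict_linear(coef, X[test]) == y[test]))
    return correct / len(y)


def simulate(n=40, p=10, n_sims=200, k=10, seed=0):
    """Average apparent and cross-validated accuracy over null data sets."""
    rng = np.random.default_rng(seed)
    apparent, cross_validated = [], []
    for _ in range(n_sims):
        X = rng.standard_normal((n, p))
        # Balanced labels, shuffled independently of X (assumed null case).
        y = rng.permutation(np.repeat([0, 1], n // 2))
        apparent.append(apparent_accuracy(X, y))
        cross_validated.append(cv_accuracy(X, y, k, rng))
    return float(np.mean(apparent)), float(np.mean(cross_validated))


if __name__ == "__main__":
    app, cv = simulate()
    print(f"mean apparent accuracy:        {app:.3f}")
    print(f"mean cross-validated accuracy: {cv:.3f}")
```

Under these assumptions the mean apparent accuracy should come out well above 0.5 even though p is a quarter of n, while the cross-validated estimate should stay near chance. Raising n or lowering p in `simulate` narrows the gap, which is one way to explore how the optimism depends on the ratio of predictors to cases.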
