Article

Semi-supervised learning to improve generalizability of risk prediction models

Journal

JOURNAL OF BIOMEDICAL INFORMATICS
Volume 92

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.jbi.2019.103117

Keywords

Generalizability; Clinical usefulness; Colorectal cancer (CRC); External validation; Prediction model; Semi-supervised learning (SSL)

Funding

  1. National Natural Science Foundation of China [81771936, 81801796, 81672916]
  2. National Key Research and Development Program of China [2016YFF0103200, 2017YFC0908200]
  3. Major Scientific Project of Zhejiang Lab [2018DG0ZX01]

The utility of a prediction model depends on its generalizability to patients drawn from different but related populations. We explored whether a semi-supervised learning model could improve the generalizability of colorectal cancer (CRC) risk prediction relative to supervised learning methods. Data on 113,141 patients diagnosed with nonmetastatic CRC from 2004 to 2012 were obtained from the Surveillance, Epidemiology, and End Results (SEER) registry for model development, and data on 1,149 patients diagnosed between 2004 and 2011 at the Second Affiliated Hospital, Zhejiang University School of Medicine, were collected for generalizability testing. A clinical prediction model for CRC survival risk was developed using a semi-supervised logistic regression method and validated for discrimination, calibration, generalizability, interpretability and clinical usefulness. Its performance was rigorously compared with that of supervised learning models. The area under the curve of the logistic membership model revealed large heterogeneity between the development cohort and the validation cohort, which is typical of generalizability studies of prediction models. Discrimination was good for all models. Calibration on the validation cohort was poor for the supervised learning models but good for the semi-supervised logistic regression model, indicating good generalizability. Clinical usefulness analysis showed that semi-supervised logistic regression can lead to better clinical outcomes than supervised learning methods. These results improve our understanding of the generalizability of different models and provide a reference for predictive model development for clinical decision-making.
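
The abstract does not say how the logistic membership model was implemented, so the following is only a minimal sketch of the general idea, not the authors' method. A logistic regression is trained to tell development-cohort patients from validation-cohort patients; an area under the curve near 0.5 suggests similar cohorts, while a value near 1 suggests large heterogeneity. The two synthetic features, the cohort sizes, the function names, and the in-sample evaluation are all illustrative assumptions.

```python
import math
import random

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def membership_auc(dev, val):
    """Label cohort membership (0 = development, 1 = validation) and score it.

    Assumption: the AUC is computed in-sample for brevity; a held-out split
    or cross-validation would give a less optimistic estimate.
    """
    X = dev + val
    y = [0] * len(dev) + [1] * len(val)
    w, b = fit_logistic(X, y)
    scores = [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]
    return auc(scores, y)

# Synthetic cohorts (hypothetical data, not from the study).
rng = random.Random(0)
dev = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
val_shifted = [[rng.gauss(1.5, 1), rng.gauss(0, 1)] for _ in range(200)]  # first feature shifted
val_similar = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]   # same distribution as dev

auc_shifted = membership_auc(dev, val_shifted)
auc_similar = membership_auc(dev, val_similar)
print(f"membership AUC, shifted cohort: {auc_shifted:.2f}")
print(f"membership AUC, similar cohort: {auc_similar:.2f}")
```

In this sketch the shifted cohort should yield a clearly higher membership AUC than the cohort drawn from the same distribution as the development data, mirroring the heterogeneity signal the abstract describes.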
