Article

Subtype classification and heterogeneous prognosis model construction in precision medicine

Journal

BIOMETRICS
Volume 74, Issue 3, Pages 814-822

Publisher

WILEY
DOI: 10.1111/biom.12843

Keywords

EM algorithm; Finite-mixture Cox proportional hazards model; Heterogeneity; High-dimensional data; Subtype; Variable selection

Funding

  1. National Natural Science Foundation of China [11671409, 11301554, 11771462]
  2. Natural Science Foundation of Guangdong, China [2015A030313143, 2016B050502007]
  3. free application projects from the SYSU-CMU Shunde International Joint Research Institute
  4. U.S. National Institute on Drug Abuse [R01 DA016750]
  5. Chinese thousand talents scholarship

Abstract

Common diseases, including cancer, are heterogeneous. It is important to discover disease subtypes and to identify both the risk factors shared across subtypes and those unique to each. High-throughput technologies provide rich data for this purpose, provided that the necessary statistical methods are developed. Existing methods can accommodate both heterogeneity identification and variable selection under parametric models, but the Cox model commonly used in survival analysis is semiparametric. Although a finite-mixture Cox model has been proposed to address heterogeneity in survival analysis, variable selection has not been incorporated into such semiparametric models. Using regularized regression, we propose a variable selection method for the finite-mixture Cox model that selects important, subtype-specific risk factors from high-dimensional predictors. With proper choices of the penalty parameters, our estimators have oracle properties. An expectation-maximization (EM) algorithm is developed for numerical computation. Simulations demonstrate that the proposed method performs well in revealing heterogeneity and selecting important risk factors for each subtype, and we compare its performance with that of alternatives based on other regularizers. Finally, we apply our method to a gene expression dataset for ovarian cancer DNA repair pathways. Based on the selected risk factors, the prognosis model that accounts for heterogeneity consistently improves the prediction of survival probability in both training and test datasets.
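To make the EM idea concrete, the sketch below shows what an E-step for a finite mixture of Cox-type components could look like. It is an illustration under stated assumptions, not the authors' implementation. The function name `e_step` and all argument names are hypothetical, and the baseline hazards are passed in as already-estimated, strictly positive arrays, because the abstract does not say how they are estimated. The penalized M-step, which is where variable selection would happen, is not shown because the abstract does not specify the penalty.

```python
import numpy as np

def e_step(pi, beta, base_haz, cum_haz, X, delta):
    """Posterior subtype membership probabilities (hypothetical sketch).

    pi:       (K,)   mixing proportions, summing to 1
    beta:     (K, p) subtype-specific regression coefficients
    base_haz: (n, K) assumed baseline hazard h_0k(t_i), strictly positive
    cum_haz:  (n, K) assumed cumulative baseline hazard H_0k(t_i)
    X:        (n, p) covariate matrix
    delta:    (n,)   event indicator (1 = event observed, 0 = censored)
    """
    lin_pred = X @ beta.T                      # (n, K) linear predictors
    risk = np.exp(lin_pred)                    # (n, K) relative risks
    # Log of the survival-likelihood contribution of subject i in subtype k:
    #   delta_i * log(h_0k(t_i) * risk_ik) - H_0k(t_i) * risk_ik
    log_f = delta[:, None] * (np.log(base_haz) + lin_pred) - cum_haz * risk
    log_w = np.log(pi)[None, :] + log_f
    # Subtract the row maximum before exponentiating, for numerical stability.
    log_w -= log_w.max(axis=1, keepdims=True)
    w = np.exp(log_w)
    # Normalize so that each subject's probabilities sum to 1.
    return w / w.sum(axis=1, keepdims=True)
```

In a full EM loop these membership probabilities would weight each subject's contribution when the subtype-specific coefficients and mixing proportions are updated in the M-step.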
