4.7 Article

Prediction and interpretation of cancer survival using graph convolution neural networks

Journal

METHODS
Volume 192, Pages 120-130

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.ymeth.2021.01.004

Keywords

Survival analysis; Graph convolutional neural network; The cancer genome atlas (TCGA)

Funding

  1. NCI Cancer Center Shared Resources [NIH-NCI P30CA054174]
  2. NIH [CTSA UL1TR002645, R01GM113245, K99CA248944]
  3. CPRIT [RP160732, RP190346]
  4. San Antonio Life Science Institute (SALSI Postdoctoral Research Fellowship 2018)
  5. Fund for Innovation in Cancer Informatics (ICI Fund)

Overall survival rates for breast, prostate, testicular, and colon cancer have increased significantly over the past two decades, while those for brain and pancreatic cancers have shown little improvement. A novel graph convolution neural network (GCNN) approach called Surv_GCNN has been developed to predict survival for 13 cancer types using the TCGA dataset. Surv_GCNN models with clinical data outperformed the other models in predicting risk scores for 7 of the 13 cancer types and showed the best overall performance.
Survival rates for breast, prostate, testicular, and colon cancer have increased significantly over the past two decades, while brain and pancreatic cancers have much lower median survival rates that have improved little over the last forty years. This poses the challenge of finding gene markers for early cancer detection and for treatment strategies. Several methods, including the regression-based Cox-PH model, artificial neural networks, and more recently deep learning algorithms, have been proposed to predict cancer survival. In this work, we established a novel graph convolution neural network (GCNN) approach, Surv_GCNN, to predict survival for 13 cancer types using the TCGA dataset. For each cancer type, 6 Surv_GCNN models were trained to predict the risk score (RS): the graphs were generated by correlation analysis, by the GeneMania database, and by correlation + GeneMania, and each graph was used with and without clinical data. The performance of the 6 Surv_GCNN models was compared with that of two existing models, Cox-PH and Cox-nnet. Cox-PH had the worst performance among the 8 tested models across the 13 cancer types, while the Surv_GCNN models with clinical data had the best overall performance, outperforming the competing models in 7 of the 13 cancer types: BLCA, BRCA, COAD, LUSC, SARC, STAD, and UCEC. A novel network-based interpretation of Surv_GCNN was also proposed to identify potential gene markers for breast cancer. The signatures learned by the nodes in the hidden layer of Surv_GCNN were identified and linked to potential gene markers by network modularization. The identified breast cancer gene markers were compared with a total of 213 gene markers from three widely cited lists for breast cancer survival analysis. About 57% of the gene markers obtained by Surv_GCNN with the correlation + GeneMania graph either overlap or directly interact with the 213 genes, confirming the effectiveness of the markers identified by Surv_GCNN.
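
To make the ingredients named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It builds a gene graph by thresholding pairwise expression correlation, normalizes it in the way commonly used for graph convolutions, and evaluates a Cox-style negative partial log-likelihood on predicted risk scores. The function names, the correlation threshold of 0.6, the symmetric normalization, and the assumption of no tied survival times are all illustrative choices that the abstract does not specify.

```python
import numpy as np


def correlation_graph(expr, threshold=0.6):
    """Build a binary gene-gene adjacency matrix from expression data.

    expr: array of shape (samples, genes).
    threshold: hypothetical cutoff on |Pearson r|; not taken from the paper.
    """
    corr = np.corrcoef(expr, rowvar=False)           # (genes, genes)
    adj = (np.abs(corr) >= threshold).astype(float)
    adj = np.maximum(adj, adj.T)                     # guard against rounding asymmetry
    np.fill_diagonal(adj, 0.0)                       # no self-loops at this stage
    return adj


def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, a common GCN choice."""
    a_hat = adj + np.eye(adj.shape[0])               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def cox_neg_log_partial_likelihood(risk, time, event):
    """Average negative Cox partial log-likelihood of risk scores.

    risk: predicted risk scores (higher means higher hazard).
    time: observed survival or censoring times (assumed to have no ties).
    event: 1 if the death was observed, 0 if the sample was censored.
    """
    risk = np.asarray(risk, dtype=float)
    event = np.asarray(event, dtype=float)
    order = np.argsort(-np.asarray(time, dtype=float))  # longest survival first
    risk, event = risk[order], event[order]
    # Running log-sum-exp: log of sum(exp(risk_j)) over the risk set of sample i.
    log_risk_set = np.logaddexp.accumulate(risk)
    return -np.sum((risk - log_risk_set) * event) / max(event.sum(), 1.0)
```

In a setup like this, the normalized adjacency would feed the graph convolution layers and the Cox-style loss would be minimized with respect to the network weights. How Surv_GCNN combines these pieces, and how clinical data enter the model, is described in the paper itself rather than in the abstract.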
