Article

A Hierarchical Machine Learning Model to Discover Gleason Grade-Specific Biomarkers in Prostate Cancer

Journal

DIAGNOSTICS
Volume 9, Issue 4

Publisher

MDPI
DOI: 10.3390/diagnostics9040219

Keywords

supervised learning; next generation sequencing; classification; transcriptomics; Gleason score detection; prostate cancer

Funding

  1. Natural Sciences and Engineering Research Council of Canada (NSERC)

(1) Background: Prostate cancer is one of the most common cancers affecting men in North America and worldwide. The Gleason score is a pathological grading system used to assess the potential aggressiveness of the disease in prostate tissue. Advancements in computing and next-generation sequencing technology now allow us to study the genomic profiles of patients in association with their Gleason scores more accurately and effectively.
(2) Methods: In this study, we used a novel machine learning method to analyse gene expression of prostate tumours with different Gleason scores and to identify potential genetic biomarkers for each Gleason group. We obtained a publicly available RNA-Seq dataset of a cohort of 104 prostate cancer patients from the National Center for Biotechnology Information's (NCBI) Gene Expression Omnibus (GEO) repository, and categorised patients based on their Gleason scores to create a hierarchy of disease progression. We developed a hierarchical model in which each Gleason group, also known as a node, has a standard classifier that identifies and predicts the node from mRNA (gene) expression. In each node, patient samples were processed with class-imbalance handling and hybrid feature selection techniques to build the prediction model. The outcome of the analysis at each node was a set of genes that could differentiate that Gleason group from the remaining groups. To validate the proposed method, the set of identified genes was used to classify a second dataset of 499 prostate cancer patients collected from cBioportal.
(3) Results: The method achieved an overall accuracy of 93.3% on the first dataset and was further validated at 87% accuracy on the second dataset. It also identified genes that were not previously reported as potential biomarkers for specific Gleason groups. In particular, PIAS3 was identified as a potential biomarker for Gleason score 4 + 3 = 7, and UBE2V2 for Gleason score 6.
(4) Insight: Previous reports show that the genes predicted by this newly proposed method strongly correlate with prostate cancer development and progression. Furthermore, pathway analysis shows that both PIAS3 and UBE2V2 share similar protein interaction pathways, specifically the JAK/STAT signaling process.
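The node-by-node idea in the Methods can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the group labels (`G6`, `G7`, `G8`), the toy expression values, the mean-gap feature filter standing in for hybrid feature selection, and the nearest-centroid rule standing in for the standard classifiers. Class-imbalance handling is omitted.

```python
from statistics import mean

def centroid(rows, features):
    """Mean expression of the selected genes over a set of samples."""
    return [mean(r[j] for r in rows) for j in features]

def sq_dist(row, center, features):
    """Squared Euclidean distance restricted to the selected genes."""
    return sum((row[j] - c) ** 2 for j, c in zip(features, center))

def select_features(target, rest, k):
    """Toy filter: keep the k genes with the largest gap between group means."""
    def gap(j):
        return abs(mean(r[j] for r in target) - mean(r[j] for r in rest))
    return sorted(range(len(target[0])), key=gap, reverse=True)[:k]

def fit_hierarchy(samples, labels, order, k=1):
    """Train one 'this group vs. remaining groups' node per level."""
    nodes = []
    remaining = list(zip(samples, labels))
    for group in order[:-1]:
        target = [x for x, g in remaining if g == group]
        rest = [x for x, g in remaining if g != group]
        feats = select_features(target, rest, k)
        nodes.append((group, feats, centroid(target, feats), centroid(rest, feats)))
        # Samples of the group just handled are dropped before the next node.
        remaining = [(x, g) for x, g in remaining if g != group]
    return nodes, order[-1]

def predict(model, row):
    """Walk the nodes in order; the last group is the fallback."""
    nodes, fallback = model
    for group, feats, c_target, c_rest in nodes:
        if sq_dist(row, c_target, feats) <= sq_dist(row, c_rest, feats):
            return group
    return fallback

# Hypothetical expression values for three genes (columns) in six samples.
samples = [[9, 1, 1], [8, 2, 1],   # G6
           [1, 9, 1], [2, 8, 2],   # G7
           [1, 1, 9], [2, 1, 7]]   # G8
labels = ["G6", "G6", "G7", "G7", "G8", "G8"]
model = fit_hierarchy(samples, labels, order=["G6", "G7", "G8"])

for new_sample in ([7, 2, 2], [1, 7, 3], [2, 2, 8]):
    print(new_sample, "->", predict(model, new_sample))
```

The genes retained at each node (the second element of each tuple in `model[0]`) play the role of the group-specific gene sets described above; in a real analysis they would come from a proper feature selection procedure evaluated on held-out data.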
