Article

Genome-wide prediction of disease variant effects with a deep protein language model

NATURE GENETICS (2023)

Journal

NATURE GENETICS

Volume -, Issue -, Pages -

Publisher

NATURE PORTFOLIO

DOI: 10.1038/s41588-023-01465-0

Category

Genetics & Heredity

Summary

By developing a workflow using the ESM1b protein language model, we can predict the effects of approximately 450 million possible missense variants in the human genome. ESM1b outperforms existing methods in classifying ClinVar/HGMD missense variants and predicting measurements across multiple datasets. The approach also considers specific protein isoforms and can be applied to more complex coding variants.

Abstract

Predicting the effects of coding variants is a major challenge. While recent deep-learning models have improved variant effect prediction accuracy, they cannot analyze all coding variants because they depend on close homologs or are limited by software constraints. Here we developed a workflow using ESM1b, a 650-million-parameter protein language model, to predict the effects of all ~450 million possible missense variants in the human genome, and made all predictions available on a web portal. ESM1b outperformed existing methods in classifying ~150,000 ClinVar/HGMD missense variants as pathogenic or benign and in predicting measurements across 28 deep mutational scan datasets. We further annotated ~2 million variants as damaging only in specific protein isoforms, demonstrating the importance of considering all isoforms when predicting variant effects. Our approach also generalizes to more complex coding variants such as in-frame indels and stop-gains. Together, these results establish protein language models as an effective, accurate and general approach to predicting variant effects.
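The abstract does not spell out how a language model's output becomes a variant score. A widely used rule for masked protein language models is the log-likelihood ratio (LLR) of the mutant versus the wild-type residue at the mutated position, where more negative values suggest a more damaging substitution. The Python sketch below assumes that rule and replaces the real 650-million-parameter model with a toy table of per-position log-probabilities. The helper names (`enumerate_missense`, `llr_score`, `isoform_specific`), the toy probabilities, and the -7.5 cutoff are hypothetical illustrations, not the authors' code or settings. The sketch also shows why the missense count scales as 19 substitutions per residue, and one simple way to flag a variant as damaging in some isoforms but not others.

```python
import math
from typing import Dict, List

# The 20 standard amino acids; each position admits 19 missense substitutions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def enumerate_missense(seq: str) -> List[str]:
    """Return every single-residue substitution as 'WT<pos>MUT' (1-based), e.g. 'K2E'."""
    variants = []
    for pos, wt in enumerate(seq, start=1):
        for mut in AMINO_ACIDS:
            if mut != wt:
                variants.append(f"{wt}{pos}{mut}")
    return variants


def llr_score(seq: str, log_probs: List[Dict[str, float]], variant: str) -> float:
    """Hypothetical LLR score: log P(mut) - log P(wt) at the mutated position.

    `log_probs[i]` maps each amino acid to a log-probability for position i+1;
    in practice these would come from a protein language model (assumption).
    """
    wt, pos, mut = variant[0], int(variant[1:-1]), variant[-1]
    if seq[pos - 1] != wt:
        raise ValueError(f"{variant}: sequence has {seq[pos - 1]} at position {pos}")
    row = log_probs[pos - 1]
    return row[mut] - row[wt]


def isoform_specific(scores_by_isoform: Dict[str, float], threshold: float) -> bool:
    """True if the variant scores below `threshold` (damaging) in some isoforms but not all."""
    damaging = [score < threshold for score in scores_by_isoform.values()]
    return any(damaging) and not all(damaging)


if __name__ == "__main__":
    seq = "MKV"
    # Toy stand-in for model output: wild type gets p=0.62, each other residue p=0.02.
    toy = [{aa: math.log(0.62 if aa == wt else 0.02) for aa in AMINO_ACIDS} for wt in seq]
    print(len(enumerate_missense(seq)))  # 3 positions x 19 = 57
    print(round(llr_score(seq, toy, "K2E"), 3))  # log(0.02 / 0.62), about -3.434
    # Illustrative isoform scores and cutoff (hypothetical values).
    print(isoform_specific({"iso1": -9.1, "iso2": -2.0}, threshold=-7.5))  # True
```

Under these assumptions, scoring each isoform sequence separately is what makes an isoform-specific call possible: the same genomic variant can land in a different sequence context in each isoform and therefore receive a different score.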
