4.8 Article

Topological representations of crystalline compounds for the machine-learning prediction of materials properties

Journal

NPJ COMPUTATIONAL MATERIALS
Volume 7, Issue 1, Pages -

Publisher

NATURE PORTFOLIO
DOI: 10.1038/s41524-021-00493-w

Keywords

-

Funding

  1. Soft Science Research Project of Guangdong Province [2017B030301013]
  2. National Key R&D Program of China [2016YFB0700600]
  3. Shenzhen Science and Technology Research Grant [ZDSYS201707281026184]
  4. NSF [DMS1721024, DMS1761320, IIS1900473]
  5. NIH [GM126189, GM129004]
  6. Bristol-Myers Squibb
  7. Pfizer

Accurate theoretical predictions of desired properties of materials play an important role in materials research and development. Machine learning (ML) can accelerate materials design by building a model from input data. For complex datasets, such as those of crystalline compounds, a vital issue is how to construct low-dimensional representations of input crystal structures that carry chemical insight. In this work, we introduce an algebraic topology-based method, called atom-specific persistent homology (ASPH), as a unique representation of crystal structures. ASPH can capture both pairwise and many-body interactions and reveal the topology-property relationship of a group of atoms at various scales. Combined with composition-based attributes, the ASPH-based ML model provides a highly accurate prediction of the formation energy calculated by density functional theory (DFT). After training with more than 30,000 different structure types and compositions, our model achieves a mean absolute error of 61 meV/atom in cross-validation, which outperforms previous approaches such as Voronoi tessellations and the Coulomb matrix method using the same ML algorithm and datasets. Our results indicate that the proposed topology-based method provides a powerful computational tool for predicting materials properties.
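
The abstract does not spell out how ASPH features are constructed, so the following is only a minimal sketch of the general idea of an element-restricted persistent-homology descriptor, not the authors' implementation. It assumes a Vietoris-Rips filtration on atomic coordinates and covers only 0-dimensional homology, where every bar is born at 0 and the finite death times coincide with minimum-spanning-tree edge lengths. The function names, the toy coordinates, and the choice of five summary statistics are all hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def h0_death_times(points):
    """Finite death times of 0-dimensional Vietoris-Rips bars.

    All H0 bars are born at filtration value 0. A bar dies when its
    component merges with another, so the death times equal the edge
    lengths picked by Kruskal's minimum-spanning-tree algorithm.
    The single infinite bar is omitted.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(dist)
    return deaths


def element_specific_features(atoms, elements):
    """Summarize H0 bar lengths for atoms whose symbol is in `elements`.

    `atoms` is a list of (symbol, (x, y, z)) pairs. The fixed-length
    output (min, max, mean, std, sum) is an assumed featurization.
    """
    pts = [xyz for symbol, xyz in atoms if symbol in elements]
    deaths = h0_death_times(pts)
    if not deaths:
        return [0.0] * 5
    return [min(deaths), max(deaths), mean(deaths), pstdev(deaths), sum(deaths)]


# Hypothetical toy cluster (coordinates in angstroms, not a real structure).
atoms = [
    ("Na", (0.0, 0.0, 0.0)),
    ("Na", (4.0, 0.0, 0.0)),
    ("Na", (0.0, 3.0, 0.0)),
    ("Cl", (2.0, 0.0, 0.0)),
]
na_features = element_specific_features(atoms, {"Na"})
all_features = element_specific_features(atoms, {"Na", "Cl"})
```

Restricting the point cloud to chosen element sets yields one fixed-length vector per set, and concatenating these vectors gives an input that a standard regressor can consume. Higher-dimensional bars (loops and voids), which encode many-body arrangement, would need a dedicated persistent-homology library and are outside this sketch.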

Recommended

Article Chemistry, Multidisciplinary

Modeling the Effects of Calcium Overload on Mitochondrial Ultrastructural Remodeling

Jasiel O. Strubbe-Rivera, Jiahui Chen, Benjamin A. West, Kristin N. Parent, Guo-Wei Wei, Jason N. Bazil

Summary: Mitochondrial cristae are dynamic invaginations of the inner membrane that play a key role in ATP production. Structural alterations caused by genetic abnormalities or detrimental environmental factors can reduce mitochondrial metabolic capacity. A computational strategy was proposed to understand how cristae are formed and how calcium phosphate granules affect mitochondrial energy metabolism.

APPLIED SCIENCES-BASEL (2021)

Article Biology

Analysis of SARS-CoV-2 mutations in the United States suggests presence of four substrains and novel variants

Rui Wang, Jiahui Chen, Kaifu Gao, Yuta Hozumi, Changchuan Yin, Guo-Wei Wei

Summary: The study reveals the presence of four sub-strains and eleven top mutations in the United States, with five and eight concurrent mutations prevailing in two groups, while another group with three concurrent mutations gradually fading out. Additionally, it is found that female immune systems are more active than those of males in responding to SARS-CoV-2 infections.

COMMUNICATIONS BIOLOGY (2021)

Article Biochemistry & Molecular Biology

Revealing the Threat of Emerging SARS-CoV-2 Mutations to Antibody Therapies

Jiahui Chen, Kaifu Gao, Rui Wang, Guo-Wei Wei

Summary: The ongoing vaccination and development of intervention offer hope to end the global COVID-19 pandemic, but emerging SARS-CoV-2 variants could compromise existing vaccines and antibody therapies. Studies on potential threats from mutations are limited, and the impact on clinical trial antibodies is largely unknown.

JOURNAL OF MOLECULAR BIOLOGY (2021)

Article Multidisciplinary Sciences

Algebraic graph-assisted bidirectional transformers for molecular property prediction

Dong Chen, Kaifu Gao, Duc Duy Nguyen, Xin Chen, Yi Jiang, Guo-Wei Wei, Feng Pan

Summary: Researchers proposed an algebraic graph-assisted bidirectional transformer framework, which can integrate massive unlabeled molecular data into molecular representations via a self-supervised learning strategy and incorporate 3D stereochemical information from graphs, showing state-of-the-art performance in molecular property prediction.

NATURE COMMUNICATIONS (2021)

Article Biochemistry & Molecular Biology

Charge substitutions at the voltage-sensing module of domain III enhance actions of site-3 and site-4 toxins on an insect sodium channel

Qing Zhu, Yuzhe Du, Yoshiko Nomura, Rong Gao, Zixuan Cang, Guo-Wei Wei, Dalia Gordon, Michael Gurevitz, James Groome, Ke Dong

Summary: The study indicates that charge substitutions in different structural domains of the sodium channel can enhance the activity of scorpion toxins, particularly the charge reversal substitutions in the voltage-sensing modules of domain III which can facilitate the actions of toxins on IIS4 or IVS4 voltage sensors.

INSECT BIOCHEMISTRY AND MOLECULAR BIOLOGY (2021)

Article Biology

Persistent Laplacian projected Omicron BA.4 and BA.5 to become new dominating variants

Jiahui Chen, Yuchi Qiu, Rui Wang, Guo-Wei Wei

Summary: Due to its high transmissibility, Omicron BA.1 became the dominant variant in late 2021, replacing the Delta variant, and was later replaced by the even more transmissible Omicron BA.2. This study tackles the challenge of capturing both topological change and homotopic shape evolution in virus-human protein-protein binding using persistent Laplacian-based deep learning models. The analysis reveals that BA.4 and BA.5 are more infectious than BA.2 and are projected to become new dominant variants. Additionally, the proposed models outperform state-of-the-art methods in predicting mutation-induced protein-protein binding free energy changes.

COMPUTERS IN BIOLOGY AND MEDICINE (2022)

Article Biology

Virtual screening of DrugBank database for hERG blockers using topological Laplacian-assisted AI models

Hongsong Feng, Guo-Wei Wei

Summary: In this study, machine learning-based in silico tools were used to screen compounds in the DrugBank database. It was found that 227 out of 8641 DrugBank compounds potentially block the hERG channel, which may lead to serious drug safety issues.

COMPUTERS IN BIOLOGY AND MEDICINE (2023)

Article Biology

Machine-learning repurposing of DrugBank compounds for opioid use disorder

Hongsong Feng, Jian Jiang, Guo-Wei Wei

Summary: Opioid use disorder (OUD) is a chronic and relapsing condition characterized by continued and compulsive use of opioids despite harmful consequences. Drug repurposing using machine learning is an efficient and cost-effective approach for discovering medications for OUD treatment.

COMPUTERS IN BIOLOGY AND MEDICINE (2023)

Article Biology

Machine-learning analysis of opioid use disorder informed by MOR, DOR, KOR, NOR and ZOR-based interactome networks

Hongsong Feng, Rana Elladki, Jian Jiang, Guo-Wei Wei

Summary: Opioid use disorder (OUD) is a global public health issue, and the efficacy of current treatment options needs to be improved. This study utilized machine learning and protein-protein interaction networks to explore potential drug candidates for OUD treatment. The findings provide valuable insights and promising tools for the development of pharmacological treatments.

COMPUTERS IN BIOLOGY AND MEDICINE (2023)

Article Chemistry, Multidisciplinary

Persistent Topological Laplacian Analysis of SARS-CoV-2 Variants

Xiaoqi Wei, Jiahui Chen, Guo-Wei Wei

Summary: Persistent topological Laplacians (PTLs) are a new tool in topological data analysis for studying protein structural changes. By using PTLs, we can reveal the spectrum changes in protein structures among SARS-CoV-2 variants and analyze the structural changes induced by RBD and ACE2 binding. Furthermore, PTLs can be utilized in a topological deep learning paradigm and for predictions of deep mutational scanning datasets for SARS-CoV-2 variants.

JOURNAL OF COMPUTATIONAL BIOPHYSICS AND CHEMISTRY (2023)

Review Biochemical Research Methods

Artificial intelligence-aided protein engineering: from topological data analysis to deep protein language models

Yuchi Qiu, Guo-Wei Wei

Summary: Protein engineering is a promising field in biotechnology with the potential to revolutionize various areas. Machine learning models, particularly those based on natural language processing, have greatly accelerated protein engineering by leveraging protein databases. Advances in topological data analysis and artificial intelligence-based protein structure prediction have enabled more powerful structure-based machine learning-assisted protein engineering strategies. This review provides a comprehensive and indispensable set of methodological components, including topological data analysis and natural language processing, to facilitate the future development of protein engineering.

BRIEFINGS IN BIOINFORMATICS (2023)

Article Biology

Topological deep learning based deep mutational scanning

Jiahui Chen, Daniel R. Woldring, Faqing Huang, Xuefei Huang, Guo-Wei Wei

Summary: High-throughput deep mutational scanning (DMS) experiments have revolutionized various fields such as protein engineering, drug discovery, immunology, cancer biology, and evolutionary biology by providing systematic understanding of protein functions. However, the enormous mutational space associated with proteins exceeds current experimental capabilities, necessitating alternative approaches for DMS. In this study, we propose a topological deep learning (TDL) paradigm that utilizes a new topological data analysis (TDA) technique based on the persistent spectral theory. Our results demonstrate the accuracy and reliability of the TDL-DMS model in predicting binding interface mutations using SARS-CoV-2 datasets. This finding has significant implications for SARS-CoV-2 variant forecasting, antibody design, vaccine development, precision medicine, and protein engineering.

COMPUTERS IN BIOLOGY AND MEDICINE (2023)

Article Chemistry, Multidisciplinary

Prediction and mitigation of mutation threats to COVID-19 vaccines and antibody therapies

Jiahui Chen, Kaifu Gao, Rui Wang, Guo-Wei Wei

Summary: This study examines the impact of mutations on the spike protein of COVID-19, particularly on vaccines and antibody therapies. The research findings reveal that certain mutations may weaken the binding between the spike protein and antibodies, potentially reducing the efficacy of current treatments. Moreover, it is discovered that some mutations could enhance the binding between the spike protein and human angiotensin-converting enzyme 2 (ACE2), leading to more infectious variants of the virus.

CHEMICAL SCIENCE (2021)

Review Chemistry, Physical

A review of mathematical representations of biomolecular data

Duc Duy Nguyen, Zixuan Cang, Guo-Wei Wei

PHYSICAL CHEMISTRY CHEMICAL PHYSICS (2020)
