Article

CCS Predictor 2.0: An Open-Source Jupyter Notebook Tool for Filtering Out False Positives in Metabolomics

Journal

ANALYTICAL CHEMISTRY
Volume 94, Issue 50, Pages 17456-17466

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.analchem.2c03491

Funding

  1. NIH [1U2CES030167-01, 1R01CA218664-01]

Metabolite annotation is a bottleneck in nontargeted metabolomics workflows. High-resolution mass spectrometry, isotope cluster evaluation, Kendrick mass defect analysis, chromatographic retention time matching, and gas-phase collision cross-section (CCS) measurements are commonly used for metabolite annotation. In this study, a machine learning algorithm called CCSP 2.0 is developed to predict CCS values, which can be used to filter out incorrect candidate structures and thereby improve the accuracy of metabolite annotation.
Metabolite annotation remains the widely recognized bottleneck in nontargeted metabolomics workflows. Annotation typically relies on a combination of high-resolution mass spectrometry (MS) with parent and tandem measurements, isotope cluster evaluation, and Kendrick mass defect (KMD) analysis. Chromatographic retention time matching with standards is often used at later stages of the process and can be followed by metabolite isolation and structure confirmation by nuclear magnetic resonance (NMR) spectroscopy. Measuring gas-phase collision cross-section (CCS) values by ion mobility (IM) spectrometry adds an important dimension to this workflow: it provides an additional molecular parameter that can be used to filter out unlikely structures. Because IM separations occur on a millisecond timescale, CCS values can be measured rapidly, and IM spectrometry is easily coupled with existing MS workflows. Here, we report a highly accurate machine learning algorithm (CCSP 2.0), distributed as an open-source Jupyter Notebook, that predicts CCS values using linear support vector regression models. The tool lets users customize the training set to their needs, enabling models to be built for new adducts or previously unexplored molecular classes. CCSP produces predictions with accuracy equal to or greater than that of existing machine learning approaches such as CCSbase, DeepCCS, and AllCCS, while being better aligned with the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. Another distinctive aspect of CCSP 2.0 is its use of a large library of 1613 molecular descriptors computed with the Mordred Python package, which further encodes fine details of isomeric molecular structures. Prediction accuracy was tested against CCS values in the McLean CCS Compendium, yielding median relative errors of 1.25%, 1.73%, and 1.87% for the 170 [M - H]⁻, 155 [M + H]⁺, and 138 [M + Na]⁺ adducts tested, respectively.
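Under the usual definition, the median relative error quoted above is the median, over all tested ions, of |CCS_pred - CCS_ref| / CCS_ref × 100. The minimal Python sketch below computes that statistic; the function name `median_relative_error` and the example CCS values are hypothetical illustrations and are not taken from the CCSP 2.0 notebook or the McLean CCS Compendium.

```python
from statistics import median


def median_relative_error(predicted, reference):
    """Median relative error (in %) between predicted and reference CCS values.

    Hypothetical helper for illustration. Both arguments are equal-length,
    non-empty sequences of CCS values in the same units (e.g. square angstroms).
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("predicted and reference must be non-empty and of equal length")
    # Per-ion relative error, expressed as a percentage of the reference CCS
    errors = [abs(p - r) / r * 100.0 for p, r in zip(predicted, reference)]
    return median(errors)


# Hypothetical CCS values for three ions (not data from the paper)
pred = [150.0, 203.0, 180.0]
ref = [148.0, 200.0, 180.0]
print(round(median_relative_error(pred, ref), 2))  # prints 1.35
```

Using the median rather than the mean keeps a few badly predicted ions from dominating the summary figure.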
For superclass-matched data sets, CCS predictions from CCSP filtered out 36.1% of incorrect structures while retaining 100% of the correct annotations, using a ΔCCS threshold of 2.8% and a mass error tolerance of 10 ppm.
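The filtering step can be sketched as a two-threshold rule: a candidate structure is kept only if its mass error is within 10 ppm and its predicted CCS lies within 2.8% of the measured CCS. This is a minimal Python sketch that assumes ΔCCS = |CCS_pred - CCS_meas| / CCS_meas × 100 and mass error = (m_obs - m_theo) / m_theo × 10^6; the `Candidate` type, the function names, and every numeric value are hypothetical and do not come from the CCSP 2.0 notebook.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical candidate annotation for one measured feature."""
    name: str
    theoretical_mz: float  # m/z of the candidate's adduct ion
    predicted_ccs: float   # model-predicted CCS, in square angstroms


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million (assumed definition)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def delta_ccs_percent(predicted_ccs: float, measured_ccs: float) -> float:
    """Absolute CCS difference as a percentage of the measured CCS (assumed definition)."""
    return abs(predicted_ccs - measured_ccs) / measured_ccs * 100.0


def filter_candidates(candidates, observed_mz, measured_ccs,
                      ppm_tol=10.0, ccs_tol_percent=2.8):
    """Keep only candidates passing both the mass-error and the ΔCCS thresholds."""
    return [
        c for c in candidates
        if abs(ppm_error(observed_mz, c.theoretical_mz)) <= ppm_tol
        and delta_ccs_percent(c.predicted_ccs, measured_ccs) <= ccs_tol_percent
    ]


# Hypothetical feature: observed m/z 181.0712, measured CCS 139.2 square angstroms
candidates = [
    Candidate("isomer A", 181.0707, 138.5),  # ~2.8 ppm, ΔCCS ~0.5% -> kept
    Candidate("isomer B", 181.0707, 146.0),  # ~2.8 ppm, ΔCCS ~4.9% -> rejected by CCS
    Candidate("other", 181.0900, 139.0),     # ~-104 ppm -> rejected by mass
]
kept = filter_candidates(candidates, observed_mz=181.0712, measured_ccs=139.2)
print([c.name for c in kept])  # prints ['isomer A']
```

In this toy case, "isomer B" has the same mass as "isomer A" and would survive a mass-only filter; only the CCS criterion removes it, which is the kind of false positive the paper targets.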
