Journal
JOURNAL OF PROTEOME RESEARCH
Volume 7, Issue 9, Pages 3708-3717
Publisher
AMER CHEMICAL SOC
DOI: 10.1021/pr700859x
Keywords
classification; supervised learning; regression; random forest; peptide identification
Funding
- German Academic Exchange Service
- Hans L. Merkle foundation
- Robert Bosch GmbH
- DFG [HA-4364/2]
- Children's Hospital Trust
Protein identification by tandem mass spectrometry is based on the reliable processing of the acquired data. Unfortunately, the generation of a large number of poor quality spectra is commonly observed in LC-MS/MS, and the processing of these mostly noninformative spectra with its associated costs should be avoided. We present a continuous quality score that can be computed very quickly and that can be considered an approximation of the MASCOT score in case of a correct identification. This score can be used to reject low quality spectra prior to database identification, or to draw attention to those spectra that exhibit a (supposedly) high information content, but could not be identified. The proposed quality score can be calibrated automatically on site without the need for a manually generated training set. When this score is turned into a classifier and when features are used that are independent of the instrument, the proposed approach performs equally to previously published classifiers and feature sets and also gives insights into the behavior of the MASCOT score.