Article

Information-theoretic approaches to SVM feature selection for metagenome read classification

COMPUTATIONAL BIOLOGY AND CHEMISTRY (2011)

Journal

COMPUTATIONAL BIOLOGY AND CHEMISTRY

Volume 35, Issue 3, Pages 199-209

Publisher

ELSEVIER SCI LTD

DOI: 10.1016/j.compbiolchem.2011.04.007

Keywords

Metagenomics; Information theory; Support vector machines

Categories

Biology; Computer Science, Interdisciplinary Applications

Funding

National Science Foundation [0845827]
DOE [DE-SC0004335]
Direct For Biological Sciences
Div Of Biological Infrastructure [845827] Funding Source: National Science Foundation


Abstract

Analysis of DNA sequences isolated directly from the environment, known as metagenomics, produces a large quantity of genome fragments that need to be classified into specific taxa. Most composition-based classification methods use all features rather than a subset of features that may maximize classifier accuracy. We show that feature selection methods can boost the performance of taxonomic classifiers. This work proposes three filter-based feature selection methods that stem from information theory: (1) a technique that combines Kullback-Leibler divergence, mutual information, and distance information, (2) the text mining technique TF-IDF, and (3) minimum-redundancy maximum-relevance (mRMR). The feature selection methods are compared by how much they improve support vector machine (SVM) classification of genomic reads. Overall, mRMR on 6-mer features performs well, especially at the phylum level. If the total number of features is very large, feature selection becomes difficult, because a small subset of features that captures most of the data variance is less likely to exist. We therefore conclude that optimizing classification performance involves a trade-off between feature set size and the choice of feature selection method. For larger feature set sizes, TF-IDF works better at finer taxonomic resolutions, while mRMR performs best of all methods for N = 6 at all taxonomic levels. (C) 2011 Elsevier Ltd. All rights reserved.
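The abstract does not say how mRMR was applied to reads, so the following is only a minimal sketch under stated assumptions: each read becomes a k-mer frequency vector (with k = 6 that is 4^6 = 4096 features), features are assumed to be discretized before computing mutual information (the example uses simple presence/absence columns), and mRMR is taken in its difference form, greedily adding the feature that maximizes relevance to the label minus mean mutual information with the features already chosen. The names `kmer_frequencies`, `mutual_information`, and `mrmr_select` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k):
    """Normalized k-mer frequency vector over the 4**k DNA k-mers (lexicographic order)."""
    alphabet = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(alphabet)}
    counts = [0] * len(alphabet)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing ambiguous bases such as N
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # sum over observed (x, y) of p(x,y) * log(p(x,y) / (p(x) * p(y)))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def mrmr_select(columns, labels, n_select):
    """Greedy mRMR, difference form: maximize I(f; c) minus mean I(f; s) over selected s."""
    relevance = [mutual_information(col, labels) for col in columns]
    selected, remaining = [], list(range(len(columns)))
    while remaining and len(selected) < n_select:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(mutual_information(columns[j], columns[s])
                             for s in selected) / len(selected)
            return relevance[j] - redundancy
        best = max(remaining, key=score)  # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data: column 1 duplicates column 0; column 2 is equally relevant but independent of it.
    labels = [0, 1, 2, 3]
    columns = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
    print(mrmr_select(columns, labels, 2))  # [0, 2]
```

In the toy run the duplicate column is skipped because its redundancy cancels its relevance, which is the behavior that distinguishes mRMR from ranking by relevance alone. To apply it to reads, one would build one column per k-mer (for example `[int(v[j] > 0) for v in vectors]`); the naive mutual information here costs time proportional to the number of reads for every feature pair, so it is meant for illustration rather than for 4096 features on large read sets.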

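The abstract names TF-IDF but not how "documents" and "terms" were defined for reads. One plausible reading, assumed here and not confirmed by the source, treats each taxon's pooled training reads as one document and each k-mer as a term, scores every k-mer by its largest TF-IDF value across taxa, and keeps the top-ranked k-mers. The helpers `kmers_in` and `tfidf_select` are hypothetical names used only for this sketch.

```python
import math
from collections import Counter

DNA = set("ACGT")


def kmers_in(seq, k):
    """All overlapping k-mers of seq that contain only A, C, G, T."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)
            if set(seq[i:i + k]) <= DNA]


def tfidf_select(reads_by_taxon, k, n_select):
    """Rank k-mers by their maximum TF-IDF score over taxa (one document per taxon)."""
    docs = {}
    for taxon, reads in reads_by_taxon.items():
        counts = Counter()
        for read in reads:
            counts.update(kmers_in(read, k))
        docs[taxon] = counts

    n_docs = len(docs)
    # document frequency: number of taxa in which each k-mer occurs at least once
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())

    best = {}
    for counts in docs.values():
        total = sum(counts.values())
        for kmer, c in counts.items():
            score = (c / total) * math.log(n_docs / df[kmer])  # tf * idf
            best[kmer] = max(best.get(kmer, 0.0), score)

    # highest score first; ties broken alphabetically for reproducibility
    ranked = sorted(best, key=lambda kmer: (-best[kmer], kmer))
    return ranked[:n_select]


if __name__ == "__main__":
    training = {"taxon_A": ["AAAC"], "taxon_B": ["AAAG"]}
    print(tfidf_select(training, k=2, n_select=2))  # ['AC', 'AG']
```

A k-mer that occurs in every taxon (here `AA`) gets idf = log 1 = 0 and falls to the bottom of the ranking, which is the intended effect: it does not help discriminate between taxa.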