Article

Mining SNPs from EST sequences using filters and ensemble classifiers

Journal

GENETICS AND MOLECULAR RESEARCH
Volume 9, Issue 2, Pages 820-834

Publisher

FUNPEC-EDITORA
DOI: 10.4238/vol9-2gmr765

Keywords

Single nucleotide polymorphisms; Expressed sequence tag; Filter; Ensemble classifier; SNPDigger

Funding

  1. Chinese Natural Science Foundation [60932008, 60871092]
  2. Natural Science Foundation of Heilongjiang Province in China [ZJG0705]

Abundant single nucleotide polymorphisms (SNPs) provide the most complete information for genome-wide association studies. However, because manual discovery of putative SNPs is a bottleneck and the original sequencing reads are often inaccessible, a more efficient and accurate computational method for automated SNP detection is essential. We propose a novel computational method to rapidly find true SNPs in publicly available EST (expressed sequence tag) databases; this method is implemented as SNPDigger. EST sequences are clustered and aligned. SNP candidates are then obtained according to a measure of redundant frequency. Several new informative biological features, such as the structural neighbor profiles and the physical position of the SNP, were extracted from EST sequences, and the effectiveness of these features was demonstrated. An ensemble classifier, which employs a carefully selected feature set, was used to handle the imbalanced training data. The sensitivity and specificity of our method both exceeded 80% for human genetic data in cross-validation. Our method enables detection of SNPs from the user's own EST dataset and can be used on species for which there is no genome data. Our tests showed that this method can effectively guide SNP discovery in ESTs and can help reduce the cost of biological analyses.
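
The candidate-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not define the redundant-frequency measure, so the rule used here is an assumption: a column of an aligned EST cluster is a candidate when at least two different bases are each supported by a minimum number of reads. The function name `candidate_snps` and the parameter `min_support` are hypothetical and are not taken from SNPDigger.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_support=2):
    """Return (column, {base: count}) for columns with two or more supported bases.

    Assumption: "redundancy" is approximated as the number of reads carrying
    each base; the paper's actual measure may differ.
    """
    if not aligned_reads:
        return []
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    candidates = []
    for col in range(length):
        counts = Counter(read[col].upper() for read in aligned_reads)
        # Keep only unambiguous bases (drops gaps and N) seen in enough reads.
        supported = {
            base: n
            for base, n in counts.items()
            if base in "ACGT" and n >= min_support
        }
        if len(supported) >= 2:
            candidates.append((col, supported))
    return candidates


# Toy aligned cluster: column 2 has G in three reads and C in two reads;
# column 7 has a single-read A, which is treated as a likely sequencing error.
reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACCTACGT",
    "ACCTACGA",
    "ACGTACGT",
]
print(candidate_snps(reads))
```

Requiring each allele to appear in more than one read is a common way to separate polymorphisms from isolated sequencing errors in EST data. In the method described above, candidates of this kind would then be passed to the feature-extraction and ensemble-classification stages.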
