☆ 4.5 Article

Computation of Rank and Select Functions on Hierarchical Binary String and Its Application to Genome Mapping Problems for Short-Read DNA Sequences

JOURNAL OF COMPUTATIONAL BIOLOGY (2009)

期刊

JOURNAL OF COMPUTATIONAL BIOLOGY

卷 16, 期 11, 页码 1601-1613

出版社

MARY ANN LIEBERT INC

DOI: 10.1089/cmb.2008.0146

关键词

Burrows-Wheeler transform; genome mapping; rank and select functions; short-read DNA sequences

类别

Biochemical Research Methods Biotechnology & Applied Microbiology Computer Science, Interdisciplinary Applications Mathematical & Computational Biology Statistics & Probability

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

We have developed efficient in-practice algorithms for computing rank and select functions on a binary string, based on a novel data structure, a hierarchical binary string with hierarchical accumulatives. It efficiently stores decomposed information on partial summations over various scales of subregions of a given binary string, so that the required space overhead ratio is only about 3.5% irrespective of the string length. Values of rank and select functions are computed hierarchically in inverted right perpendicular(log(2)n)/8inverted left perpendicular iterations, where n is the string length. For example, for an unbiased random binary string of 64G bits, each value of these functions can be computed in about a microsecond, on average, on a single 3.0-GHz CPU using 8+ GB of memory. We also present their applications to genome mapping problems for large-scale short-read DNA sequence data, especially produced by ultra-high-throughput new-generation DNA sequencers. The algorithms are applied to the binarization of the Burrows-Wheeler transform of the human genome DNA sequence. For the sake of highspeed performance, we adopted a somewhat stringent mapping condition that allows at most a single-base mismatch (either a substitution, insertion, or deletion of a single base) per query sequence. An experimentally implemented program mapped several thousands of sequences per second on a single 3.0-GHz CPU, several times faster than ELAND, a widely used mapping program with the Illumina-Solexa 1G analyser.

Computation of Rank and Select Functions on Hierarchical Binary String and Its Application to Genome Mapping Problems for Short-Read DNA Sequences

期刊

JOURNAL OF COMPUTATIONAL BIOLOGY

出版社

MARY ANN LIEBERT INC

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Computation of Rank and Select Functions on Hierarchical Binary String and Its Application to Genome Mapping Problems for Short-Read DNA Sequences

期刊

JOURNAL OF COMPUTATIONAL BIOLOGY

出版社

MARY ANN LIEBERT INC

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文