4.3 Article

Space-efficient construction of compressed suffix trees

Journal

THEORETICAL COMPUTER SCIENCE
Volume 852, Issue -, Pages 138-156

Publisher

ELSEVIER
DOI: 10.1016/j.tcs.2020.11.024

Keywords

Burrows-Wheeler transform; Compressed suffix tree; LCP; PLCP

Funding

  1. [RBSI146R5L]

Ask authors/readers for more resources

The paper demonstrates how to construct data structures essential for string processing using the Burrows-Wheeler transform (BWT) as input. The algorithms provided for enumerating LCP values, suffix tree intervals, PLCP bitvector, and balanced parentheses representation have efficient space and time complexities. Additionally, the paper proposes solutions for merging BWTs of string collections with improved efficiency.
We show how to build several data structures of central importance to string processing by taking as input the Burrows-Wheeler transform (BWT) and using small extra working space. Let n be the text length and sigma be the alphabet size. We first provide two algorithms that enumerate all LCP values and suffix tree intervals in O(n log sigma) time using just o(n log sigma) bits of working space on top of the input re-writable BWT. Using these algorithms as building blocks, for any parameter 0 < epsilon <= 1 we show how to build the PLCP bitvector and the balanced parentheses representation of the suffix tree topology in O (n(log sigma + epsilon(-1).log log n)) time using at most n logs . (epsilon + o(1)) bits of working space on top of the input re-writable BWT and the output. For example, we can build a compressed suffix tree from the BWT using just succinct working space (i.e. o(n log sigma) bits) and Theta (n log sigma + n(log log n)(1+delta)) time, for any constant delta > 0. This improves the previous most space-efficient algorithms, which worked in O(n) bits and O(n log n) time. We also consider the problem of merging BWTs of string collections, and provide a solution running in O(n log sigma) time and using just o(n log sigma) bits of working space. An efficient implementation of our LCP construction and BWT merge algorithms uses (in RAM) as few as n bits on top of a packed representation of the input/output and process data as fast as 2.92 megabases per second. (C) 2020 Elsevier B.V. All rights reserved.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.3
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available