Article

Answering biological questions by querying k-mer databases

Journal

Concurrency and Computation: Practice and Experience
Volume 25, Issue 4, Pages 497-509

Publisher

Wiley
DOI: 10.1002/cpe.2938

Keywords

k-mer; database; biological query; sequence data; bacterial genomes

Funding

  1. CSIRO Transformational Biology Capability Platform

This paper describes a k-mer approach to analysing DNA data and quickly answering certain types of ad hoc biological questions. These k-mers (short DNA strings) are stored in a conventional relational database and indexed to support efficient exact match operations. We show that k-mers around 20-25 bases long have interesting and useful uniqueness properties that can be used to compute a 'relatedness' metric and also allow k-mers to be used as 'unique enough' tags to identify organisms and genes. This relatedness metric is used in SQL queries that can directly answer questions such as how two related species differ, and what genes are unique to an organism. The k-mer tags have proven useful in applications, largely metagenomic ones that can quickly process large volumes of sequencing data to say something about what organisms and genes might be present in an environmental sample. All of this work is based on simple and fast exact matches of k-mer strings using a database, rather than conventional alignment based on inexact matches of much longer strings. These k-mer tools provide ways of rapidly exploring large genome spaces and handling large volumes of sequence data, and complement rather than replace existing alignment and assembly tools. Copyright (C) 2012 John Wiley & Sons, Ltd.
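
The approach summarised in the abstract (k-mers stored in an indexed relational table and compared by exact match) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the table and column names, the use of SQLite, the toy k of 5 (the abstract discusses k around 20-25), and the "shared fraction" score, which is only a stand-in for the paper's relatedness metric. Reverse complements are ignored for brevity.

```python
import sqlite3

K = 5  # toy value for short demo strings; the abstract discusses k of about 20-25


def kmers(seq, k=K):
    """Return the set of all length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_db(genomes, k=K):
    """Load the k-mers of each named sequence into an in-memory SQLite table.

    The schema (kmers(organism, kmer)) is hypothetical, chosen for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE kmers ("
        " organism TEXT NOT NULL,"
        " kmer TEXT NOT NULL,"
        " PRIMARY KEY (organism, kmer))"
    )
    # Index on the k-mer string so exact-match lookups and joins are cheap.
    conn.execute("CREATE INDEX idx_kmers_kmer ON kmers (kmer)")
    for name, seq in genomes.items():
        conn.executemany(
            "INSERT INTO kmers (organism, kmer) VALUES (?, ?)",
            [(name, km) for km in kmers(seq, k)],
        )
    conn.commit()
    return conn


def unique_kmers(conn, organism):
    """K-mers present in `organism` and in no other organism in the table."""
    rows = conn.execute(
        "SELECT a.kmer FROM kmers a"
        " WHERE a.organism = ?"
        " AND NOT EXISTS ("
        "   SELECT 1 FROM kmers b"
        "   WHERE b.kmer = a.kmer AND b.organism <> a.organism)"
        " ORDER BY a.kmer",
        (organism,),
    )
    return [row[0] for row in rows]


def shared_fraction(conn, org_a, org_b):
    """Fraction of org_a's distinct k-mers that also occur in org_b.

    An illustrative stand-in, not the relatedness metric defined in the paper.
    """
    total = conn.execute(
        "SELECT COUNT(*) FROM kmers WHERE organism = ?", (org_a,)
    ).fetchone()[0]
    shared = conn.execute(
        "SELECT COUNT(*) FROM kmers a JOIN kmers b ON a.kmer = b.kmer"
        " WHERE a.organism = ? AND b.organism = ?",
        (org_a, org_b),
    ).fetchone()[0]
    return shared / total if total else 0.0
```

With two made-up sequences, `build_db({"org_a": "ACGTACGTTT", "org_b": "ACGTACGAAA"})` yields three k-mers unique to `org_a` and a shared fraction of 0.5. At realistic k and genome sizes the same exact-match joins apply, but bulk loading and storage layout would need more care than this sketch takes.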
