Answering biological questions by querying k-mer databases

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE (2013)

Journal

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE

Volume 25, Issue 4, Pages 497-509

Publisher

WILEY

DOI: 10.1002/cpe.2938

Keywords

k-mer; database; biological query; sequence data; bacterial genomes

Funding

CSIRO Transformational Biology Capability Platform

Abstract

This paper describes a k-mer approach to analysing DNA data and quickly answering certain types of ad hoc biological questions. These k-mers (short DNA strings) are stored in a conventional relational database and indexed to support efficient exact match operations. We show that k-mers around 20-25 bases long have interesting and useful uniqueness properties that can be used to compute a 'relatedness' metric and also allow k-mers to be used as 'unique enough' tags to identify organisms and genes. This relatedness metric is used in SQL queries that can directly answer questions such as how two related species differ, and what genes are unique to an organism. The k-mer tags have proven useful in applications, largely metagenomic ones that can quickly process large volumes of sequencing data to say something about what organisms and genes might be present in an environmental sample. All of this work is based on simple and fast exact matches of k-mer strings using a database, rather than conventional alignment based on inexact matches of much longer strings. These k-mer tools provide ways of rapidly exploring large genome spaces and handling large volumes of sequence data, and complement rather than replace existing alignment and assembly tools. Copyright (C) 2012 John Wiley & Sons, Ltd.
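The idea of storing k-mers in an indexed relational table and answering questions with exact-match SQL can be illustrated with a small sketch. Everything below is an assumption made for illustration, not the paper's implementation: the table layout (`kmer`, with columns `seq` and `organism`), the helper names, the use of SQLite, and in particular the `shared_fraction` measure, which simply reports the fraction of one organism's distinct k-mers that also occur in another. The abstract does not define its relatedness metric, so this is only a stand-in. The toy sequences use k = 4 to stay readable; the abstract suggests k of around 20-25 for real genomes.

```python
import sqlite3


def kmers(seq, k):
    """Return the set of distinct k-mers (length-k substrings) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_db(genomes, k):
    """Load the distinct k-mers of each genome into an indexed table.

    `genomes` maps a (hypothetical) organism name to its DNA string.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE kmer (seq TEXT NOT NULL, organism TEXT NOT NULL)")
    for name, dna in genomes.items():
        con.executemany(
            "INSERT INTO kmer VALUES (?, ?)",
            [(s, name) for s in kmers(dna, k)],
        )
    # The index is what makes exact-match lookups and joins on seq cheap.
    con.execute("CREATE INDEX idx_kmer_seq ON kmer (seq)")
    return con


def shared_fraction(con, a, b):
    """Fraction of a's distinct k-mers that also occur in b (illustrative metric)."""
    total = con.execute(
        "SELECT COUNT(*) FROM kmer WHERE organism = ?", (a,)
    ).fetchone()[0]
    shared = con.execute(
        "SELECT COUNT(*) FROM kmer x JOIN kmer y ON x.seq = y.seq "
        "WHERE x.organism = ? AND y.organism = ?",
        (a, b),
    ).fetchone()[0]
    return shared / total if total else 0.0


def unique_kmers(con, a):
    """k-mers found in organism a and in no other organism in the table."""
    rows = con.execute(
        "SELECT seq FROM kmer WHERE organism = ? AND seq NOT IN "
        "(SELECT seq FROM kmer WHERE organism <> ?) ORDER BY seq",
        (a, a),
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    # Toy data only; real inputs would be whole genomes.
    con = build_db({"A": "ACGTACGT", "B": "ACGTTTTT"}, k=4)
    print(shared_fraction(con, "A", "B"))
    print(unique_kmers(con, "A"))
```

With the toy data, organisms A and B share only the k-mer `ACGT`, so the second query returns A's remaining three k-mers as candidate "tags". Nothing here performs inexact alignment. Every comparison is an indexed string equality, which is the property the abstract relies on for speed.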
