Journal
PLOS ONE
Volume 6, Issue 12
Publisher
Public Library of Science
DOI: 10.1371/journal.pone.0028819
Detection of sequences that are homologous, i.e. descended from a common ancestor, is a fundamental task in computational biology. This task is confounded by low-complexity tracts (such as atatatatatat), which arise frequently and independently, causing strong similarities that are not homologies. There has been much research on identifying low-complexity tracts, but little research on how to treat them during homology search. We propose to find homologies by aligning sequences with gentle masking of low-complexity tracts. Gentle masking means that the match score involving a masked letter is min(0, S), where S is the unmasked score. Gentle masking slightly but noticeably improves the sensitivity of homology search (compared to harsh masking), without harming specificity. We show examples in three useful homology search problems: detection of NUMTs (nuclear copies of mitochondrial DNA), recruitment of metagenomic DNA reads to reference genomes, and pseudogene detection. Gentle masking is currently the best way to treat low-complexity tracts during homology search.
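The gentle-masking rule stated above, min(0, S), can be illustrated with a minimal sketch. The scoring values (+1 for a match, -1 for a mismatch), the convention that lowercase letters mark masked positions, and the function names `unmasked_score`, `gentle_score`, and `ungapped_score` are assumptions made for illustration. They are not taken from the paper or from any particular alignment tool.

```python
# Illustrative sketch of gentle masking. Assumptions (not from the paper):
# lowercase letters mark masked positions, and the unmasked scoring scheme
# is +1 for a match and -1 for a mismatch.

MATCH = 1      # hypothetical match score
MISMATCH = -1  # hypothetical mismatch score


def unmasked_score(a: str, b: str) -> int:
    """Score S for aligning letters a and b, ignoring any masking."""
    return MATCH if a.upper() == b.upper() else MISMATCH


def gentle_score(a: str, b: str) -> int:
    """Score under gentle masking: min(0, S) if either letter is masked."""
    s = unmasked_score(a, b)
    if a.islower() or b.islower():
        return min(0, s)  # masked letters can never contribute a positive score
    return s


def ungapped_score(x: str, y: str, score=gentle_score) -> int:
    """Sum of per-column scores for two equal-length, gap-free sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(score(a, b) for a, b in zip(x, y))


if __name__ == "__main__":
    x = "ACGTatatat"
    y = "ACGTatatat"
    print(ungapped_score(x, y, unmasked_score))  # 10
    print(ungapped_score(x, y, gentle_score))    # 4
```

In this toy case the masked `atatat` tract adds nothing to the score, so it cannot create a strong similarity on its own. Mismatches inside the tract still cost -1, because min(0, S) leaves negative scores unchanged.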