Article

Multiple alignment-free sequence comparison

Journal

BIOINFORMATICS
Volume 29, Issue 21, Pages 2690-2698

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btt462

Keywords

-

Funding

  1. Oxford Martin School
  2. US NIH [R21HG006199]
  3. NSF [DMS-1043075]
  4. OCE [1136818]
  5. National Natural Science Foundation of China [31171262, 11021463]
  6. National Key Basic Research Project of China [2009CB918503]
  7. EPSRC [EP/K032402/1] Funding Source: UKRI
  8. Engineering and Physical Sciences Research Council [EP/K032402/1] Funding Source: researchfish
  9. Directorate For Geosciences
  10. Division Of Ocean Sciences [1136818] Funding Source: National Science Foundation

Motivation: Recently, a range of new statistics has become available for the alignment-free comparison of two sequences based on their k-tuple word content. Here, we extend these statistics to the simultaneous comparison of more than two sequences. Our suite of statistics has two parts. The first part consists of $C_1^*$ and $C_1^S$, which extend pairwise comparison statistics to the joint k-tuple content of all the sequences. The second part consists of $\bar{C}_2^*$, $\bar{C}_2^S$ and $\bar{C}_2^{geo}$, which are averages of sums of pairwise comparison statistics. We consider two tasks: first, identifying sequences that are similar to a set of target sequences, and second, measuring the similarity within a set of sequences.

Results: Our investigation uses both simulated data and cis-regulatory module data, where the task is to identify cis-regulatory modules with similar transcription factor binding sites. All of our statistics perform similarly on real data. On simulated data, however, the Shepp-type statistics are in some instances outperformed by the star-type statistics. The multiple alignment-free statistics are more sensitive to contamination in the data than the pairwise average statistics.
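The averaged statistics are built from a pairwise comparison of centered k-tuple counts. One such pairwise statistic is the Shepp-type $d_2^S$. The sketch below is a minimal illustration only and is not the authors' implementation. It makes several assumptions:

- the sequences are DNA over the alphabet "ACGT";
- the background is an i.i.d. (order-0) letter model estimated from each sequence;
- $d_2^S$ takes its normalized form from the pairwise alignment-free literature.

The helper names `kmer_counts`, `centered_counts`, `d2s` and `mean_pairwise_d2s` are hypothetical. The plain mean over all sequence pairs only mimics the idea of an "average of pairwise statistics". The paper's exact definition of $\bar{C}_2^S$ may weight or normalize the terms differently.

```python
from collections import Counter
from itertools import combinations, product
from math import prod, sqrt


def kmer_counts(seq, k):
    """Count overlapping k-tuples (k-words) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def centered_counts(seq, k, alphabet="ACGT"):
    """Return X_w - (n - k + 1) * p_w for every k-word w over the alphabet.

    Assumption: p_w is the product of letter frequencies estimated from
    seq itself (an i.i.d., order-0 background model).
    """
    n_words = len(seq) - k + 1
    letters = Counter(seq)
    p = {a: letters[a] / len(seq) for a in alphabet}
    counts = kmer_counts(seq, k)
    centered = {}
    for letters_w in product(alphabet, repeat=k):
        w = "".join(letters_w)
        centered[w] = counts[w] - n_words * prod(p[c] for c in w)
    return centered


def d2s(x, y, k, alphabet="ACGT"):
    """Normalized Shepp-type dissimilarity d2S in [0, 1] (0 = most similar)."""
    cx = centered_counts(x, k, alphabet)
    cy = centered_counts(y, k, alphabet)
    num = norm_x = norm_y = 0.0
    for w in cx:
        denom = sqrt(cx[w] ** 2 + cy[w] ** 2)
        if denom == 0.0:
            continue  # word contributes nothing when both centered counts are 0
        num += cx[w] * cy[w] / denom
        norm_x += cx[w] ** 2 / denom
        norm_y += cy[w] ** 2 / denom
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("a sequence matches its background model exactly")
    return 0.5 * (1.0 - num / (sqrt(norm_x) * sqrt(norm_y)))


def mean_pairwise_d2s(seqs, k, alphabet="ACGT"):
    """Average d2S over all unordered pairs of sequences (hypothetical helper)."""
    pairs = list(combinations(seqs, 2))
    return sum(d2s(a, b, k, alphabet) for a, b in pairs) / len(pairs)
```

For example, `mean_pairwise_d2s(["ACGTACGTTTGACCAGT", "TTGACCAGTACGATCGA"], 2)` returns a value between 0 and 1. Values near 0 indicate similar centered k-tuple content. A statistic such as $C_1^S$ instead works on the joint k-tuple content of all sequences at once, so it cannot be reduced to this pairwise loop.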
