Article

SeqPig: simple and scalable scripting for large sequencing data sets in Hadoop

Journal

BIOINFORMATICS
Volume 30, Issue 1, Pages 119-120

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btt601

Keywords

-

Funding

  1. Finnish Strategic Centre for Science, Technology and Innovation DIGILE
  2. Academy of Finland [139402]
  3. Sardinian (Italy) [L7-2010/COBIK]
  4. COST Action [BM1006]

Hadoop MapReduce-based approaches have become increasingly popular due to their scalability in processing large sequencing datasets. However, as these methods typically require in-depth expertise in Hadoop and Java, they are still out of reach of many bioinformaticians. To solve this problem, we have created SeqPig, a library and a collection of tools to manipulate, analyze and query sequencing datasets in a scalable and simple manner. SeqPig scripts use the Hadoop-based distributed scripting engine Apache Pig, which automatically parallelizes and distributes data processing tasks. We demonstrate SeqPig's scalability over many computing nodes and illustrate its use with example scripts.
