Article

SeqPig: simple and scalable scripting for large sequencing data sets in Hadoop

BIOINFORMATICS (2014)

Journal

BIOINFORMATICS

Volume 30, Issue 1, Pages 119-120

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bioinformatics/btt601

Categories

Biochemical Research Methods; Biotechnology & Applied Microbiology; Computer Science, Interdisciplinary Applications; Mathematical & Computational Biology; Statistics & Probability

Funding

Finnish Strategic Centre for Science, Technology and Innovation DIGILE
Academy of Finland [139402]
Sardinian (Italy) [L7-2010/COBIK]
COST Action [BM1006]

Abstract

Hadoop MapReduce-based approaches have become increasingly popular due to their scalability in processing large sequencing datasets. However, as these methods typically require in-depth expertise in Hadoop and Java, they are still out of reach of many bioinformaticians. To solve this problem, we have created SeqPig, a library and a collection of tools to manipulate, analyze and query sequencing datasets in a scalable and simple manner. SeqPig scripts use the Hadoop-based distributed scripting engine Apache Pig, which automatically parallelizes and distributes data processing tasks. We demonstrate SeqPig's scalability over many computing nodes and illustrate its use with example scripts.
