4.7 Review

Streamlining data-intensive biology with workflow systems

Journal

GIGASCIENCE
Volume 10, Issue 1

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/gigascience/giaa140

Keywords

workflows; automation; repeatability; data-intensive biology

Funding

  1. Moore Foundation [GBMF4551]
  2. State and Federal Water Contractors grant [A19-1844]
  3. NSF [1711984]
  4. National Science Foundation, Directorate for Biological Sciences, Division of Biological Infrastructure [1711984]

With the increasing scale of biological data generation, the bottleneck of research has shifted from data generation to analysis. Data-centric workflow systems are reshaping the landscape of biological data analysis, empowering researchers to conduct reproducible analyses at scale, but knowledge of these techniques is still lacking.
As the scale of biological data generation has increased, the bottleneck of research has shifted from data generation to analysis. Researchers commonly need to build computational workflows that include multiple analytic tools and require incremental development as experimental insights demand tool and parameter modifications. These workflows can produce hundreds to thousands of intermediate files and results that must be integrated for biological insight. Data-centric workflow systems that internally manage computational resources, software, and conditional execution of analysis steps are reshaping the landscape of biological data analysis and empowering researchers to conduct reproducible analyses at scale. Adoption of these tools can facilitate and expedite robust data analysis, but knowledge of these techniques is still lacking. Here, we provide a series of strategies for leveraging workflow systems with structured project, data, and resource management to streamline large-scale biological analysis. We present these practices in the context of high-throughput sequencing data analysis, but the principles are broadly applicable to biologists working beyond this field.
