Article

User-friendly bioinformatics pipeline gDAT (graphical downstream analysis tool) for analysing rDNA sequences

Journal

MOLECULAR ECOLOGY RESOURCES
Volume 21, Issue 4, Pages 1380-1392

Publisher

WILEY
DOI: 10.1111/1755-0998.13340

Keywords

arbuscular mycorrhizal fungi; high-throughput sequencing; pipeline; sequencing data analysis; software; teaching tool

Funding

  1. Estonian Research Council grant [PRG1065]
  2. University of Tartu [PLTOM20903]
  3. European Regional Development Fund
  4. ERA-NET Cofund BiodivERsA3

The paper introduces gDAT, a pipeline that wraps common command line tools in an easy-to-use graphical interface written in the Python scripting language. It is compatible with different sequencing platforms and supports rapid data analysis, with features such as quality filtering and operational taxonomic unit picking.
High-throughput sequencing (HTS) of multiple organisms in parallel (metabarcoding) has become a routine and cost-effective method for the analysis of microbial communities in environmental samples. However, careful data treatment is required to identify potential errors in HTS data, and the large volume of data generated by HTS requires in-house experience with command line tools for downstream analysis. This paper introduces gDAT, a pipeline that incorporates the most common command line tools into an easy-to-use graphical interface. By using the Python scripting language, the pipeline is compatible with the latest Windows, macOS and Linux operating systems. The pipeline supports analysis of Sanger, 454, IonTorrent, Illumina and PacBio sequences, allows custom modification of quality filtering steps, and implements both open- and closed-reference operational taxonomic unit picking for sequence identification. Predefined parameters are optimized for analysis of small subunit (SSU) rRNA gene amplicons from arbuscular mycorrhizal fungi, but the pipeline is widely applicable to metabarcoding studies targeting a broad range of organisms. The pipeline was additionally tested with data generated using general eukaryotic primers from the SSU gene region and fungal primers from the internal transcribed spacer (ITS) marker region. We describe the pipeline design and evaluate its performance and speed by analysing example data sets from different marker regions sequenced on Illumina platforms. The graphical interface, with the option to use the command line if needed, provides an accessible tool for rapid data analysis with repeatability and logging capabilities. Keeping the software open-source maximizes code accessibility, allowing scrutiny and bug fixes by the community.
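
To make the quality filtering step mentioned in the abstract more concrete, the following is a minimal Python sketch of one common approach: discarding reads that are too short or whose mean Phred quality falls below a threshold. It is not gDAT's implementation. The function names (`parse_fastq`, `mean_phred`, `quality_filter`), the thresholds, and the assumption of well-formed four-line FASTQ records with Phred+33 encoding are all illustrative choices, not taken from the paper.

```python
from statistics import mean


def parse_fastq(lines):
    """Yield (header, sequence, quality) tuples from FASTQ lines.

    Assumes well-formed records of exactly four lines each
    (hypothetical helper, not part of gDAT).
    """
    stripped = (line.rstrip("\n") for line in lines)
    # Group the stream into consecutive chunks of four lines.
    for header, seq, _plus, qual in zip(*[stripped] * 4):
        yield header, seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 assumed)."""
    return mean(ord(ch) - offset for ch in qual)


def quality_filter(records, min_mean_q=30, min_len=100):
    """Keep reads that are long enough and of sufficient mean quality.

    The default thresholds are illustrative, not gDAT's defaults.
    """
    for header, seq, qual in records:
        if len(seq) >= min_len and mean_phred(qual) >= min_mean_q:
            yield header, seq, qual


# Tiny in-memory example: three reads, only the first should pass.
example = [
    "@read1", "ACGTACGT", "+", "IIIIIIII",  # mean Phred 40
    "@read2", "ACGTACGT", "+", "########",  # mean Phred 2
    "@read3", "ACG", "+", "III",            # too short
]
kept = list(quality_filter(parse_fastq(example), min_mean_q=30, min_len=5))
print([header for header, _, _ in kept])
```

In a real analysis the thresholds would depend on the sequencing platform and marker region, which is the kind of custom modification the pipeline is described as allowing.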
