Article

Large-scale digitization of herbarium specimens: Development and usage of an automated, high-throughput conveyor system

Journal

TAXON
Volume 67, Issue 1, Pages 165-178

Publisher

WILEY
DOI: 10.12705/671.9

Keywords

automation; biodiversity informatics; digitization; herbarium specimens; imaging; New England; transcription; workflows

Funding

  1. ADBC program of the U.S. National Science Foundation [1208835, 1209149]
  2. National Science Foundation, Directorate for Biological Sciences, Division of Biological Infrastructure [1209149]
  3. National Science Foundation, Directorate for Biological Sciences, Division of Biological Infrastructure [1208835]


The billions of specimens housed in natural science collections provide a tremendous source of under-utilized data that are useful for scientific research, conservation, commerce, and education. Digitization and mobilization of specimen data and images promise to greatly accelerate their utilization. While digitization of natural science collection specimens has been occurring for decades, the vast majority of specimens remain undigitized. If the digitization task is to be completed in the near future, innovative, high-throughput approaches are needed. To create a dataset for the study of global change in New England, we designed and implemented an industrial-scale, conveyor-based digitization workflow for herbarium specimen sheets. The workflow is a variation of an object-to-image-to-data workflow that prioritizes imaging and the capture of storage container-level data. The workflow utilizes a novel conveyor system developed specifically for the task of imaging flattened herbarium specimens. Using our workflow, we imaged and transcribed specimen-level data for almost 350,000 specimens over a 131-week period; an additional 56 weeks were required for storage container-level data capture. Our project has demonstrated that it is possible to capture both an image of a specimen and a core database record in 35 seconds per herbarium sheet (for intervals between images of 30 minutes or less) plus some additional overhead for container-level data capture. This rate was in line with the pre-project expectations for our approach. Our throughput rates are comparable with those of some other similar, high-throughput approaches focused on digitizing herbarium sheets and are as much as three times faster than rates achieved with more conventional non-automated approaches used during the project. We report on challenges encountered during development and use of our system and discuss ways in which our workflow could be improved.
The conveyor apparatus software, database schema, configuration files, hardware list, and conveyor schematics are available for download on GitHub.
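
The throughput figures in the abstract can be combined into a rough back-of-envelope estimate of active digitization time. The sketch below is illustrative only: it treats "almost 350,000" as exactly 350,000 (an upper bound), applies the 35-second rate uniformly to every sheet, and ignores container-level overhead, breaks, and staffing, none of which the abstract quantifies. The function and constant names are hypothetical and are not taken from the project's GitHub repository.

```python
# Figures quoted in the abstract; everything else here is an assumption.
SECONDS_PER_SHEET = 35   # image plus core database record, per herbarium sheet
SPECIMENS = 350_000      # "almost 350,000", used here as an upper bound
PROJECT_WEEKS = 131      # imaging and specimen-level transcription period


def active_hours(specimens: int, seconds_per_sheet: float) -> float:
    """Total active capture time in hours at a uniform per-sheet rate."""
    return specimens * seconds_per_sheet / 3600


def hours_per_week(total_hours: float, weeks: int) -> float:
    """Average active capture hours per project week."""
    return total_hours / weeks


total = active_hours(SPECIMENS, SECONDS_PER_SHEET)
weekly = hours_per_week(total, PROJECT_WEEKS)
print(f"{total:.0f} active hours, about {weekly:.1f} hours per week")
```

Under these assumptions the estimate comes to roughly 3,400 hours of active capture, or about 26 hours per week averaged over the 131-week period. It is a lower bound on effort, since the abstract notes additional overhead for container-level data capture.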
