Article

Small values in big data: The continuing need for appropriate metadata

Journal

ECOLOGICAL INFORMATICS
Volume 45, Pages 26-30

Publisher

ELSEVIER
DOI: 10.1016/j.ecoinf.2018.03.002

Funding

  1. U.S. National Science Foundation MacroSystems Biology Program in the Biological Sciences Directorate [EF-1065786, EF-1065649, EF-1065818]
  2. USDA National Institute of Food and Agriculture, Hatch Project [176820]

Compiling data from disparate sources to address pressing ecological issues is increasingly common. Many ecological datasets contain left-censored data - observations below an analytical detection limit. Studies from single and typically small datasets show that common approaches for handling censored data - e.g., deletion or substituting fixed values - result in systematic biases. However, no studies have explored the degree to which the documentation and presence of censored data influence outcomes from large, multi-sourced datasets. We describe left-censored data in a lake water quality database assembled from 74 sources and illustrate the challenges of dealing with small values in big data, including detection limits that are absent, range widely, and show trends over time. We show that substitutions of censored data can also bias analyses using 'big data' datasets, that censored data can be effectively handled with modern quantitative approaches, but that such approaches rely on accurate metadata that describe treatment of censored data from each source.
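
The contrast between substitution, deletion, and a likelihood-based treatment of left-censored values can be illustrated with a small simulation. The sketch below is not the authors' analysis: it assumes log-concentrations follow a normal distribution, uses two hypothetical per-source detection limits, and fits the censored data with a simple expectation-maximization (EM) routine. All function names, limits, and sample sizes are illustrative assumptions.

```python
import math
import random
import statistics


def _pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def censored_normal_em(values, censored, n_iter=200):
    """Estimate (mu, sigma) of a normal sample with left-censoring.

    values[i] is the measurement, or the detection limit when
    censored[i] is True (the true value is only known to lie below it).
    """
    n = len(values)
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values) or 1.0
    for _ in range(n_iter):
        sum_x = sum_x2 = 0.0
        for v, is_cens in zip(values, censored):
            if is_cens:
                # E-step: moments of a normal truncated above at the limit v.
                a = (v - mu) / sigma
                lam = _pdf(a) / max(_cdf(a), 1e-300)
                sum_x += mu - sigma * lam
                sum_x2 += mu * mu + sigma * sigma - sigma * lam * (v + mu)
            else:
                sum_x += v
                sum_x2 += v * v
        # M-step: update the parameters from the completed moments.
        mu = sum_x / n
        sigma = math.sqrt(max(sum_x2 / n - mu * mu, 1e-12))
    return mu, sigma


def compare_estimators(n=4000, seed=1):
    """Simulate log-concentrations with true mean 0 and SD 1, then censor."""
    rng = random.Random(seed)
    # Hypothetical detection limits for two sources, on the log scale.
    limits = (math.log(0.5), math.log(1.0))
    values, censored = [], []
    for i in range(n):
        z = rng.gauss(0.0, 1.0)
        limit = limits[i % 2]  # alternate between the two sources
        is_cens = z < limit
        values.append(limit if is_cens else z)
        censored.append(is_cens)
    detected = [v for v, c in zip(values, censored) if not c]
    mu_em, sigma_em = censored_normal_em(values, censored)
    return {
        "substitution": statistics.fmean(values),  # censored value := limit
        "deletion": statistics.fmean(detected),    # censored values dropped
        "em": mu_em,
        "em_sigma": sigma_em,
    }


if __name__ == "__main__":
    for name, estimate in compare_estimators().items():
        print(f"{name}: {estimate:.3f}")
```

In this setup the substitution and deletion estimates of the mean should sit above the true value of 0, while the EM estimate should land close to it. Note that the EM routine needs a censoring flag and a detection limit for every record, which is the kind of per-source metadata the abstract argues must be documented.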