Article

Natural language indexing for pedoinformatics

GEODERMA (2019)

Journal

GEODERMA

Volume 334, Pages 49-54

Publisher

ELSEVIER SCIENCE BV

DOI: 10.1016/j.geoderma.2018.07.050

Keywords

Soil science; Classification; Taxonomy; Databases; Text mining

Funding

Environmental Quality and Installations program at the U.S. Army Engineer Research and Development Center, Vicksburg, MS

Abstract

The multiple schemas for classifying soils rely on differing criteria, but the major soil science systems, including the United States Department of Agriculture (USDA) and the internationally harmonized World Reference Base for Soil Resources soil classification systems, are based primarily on inferred pedogenesis. These classifications are largely compiled from individual observations of soil characteristics within soil profiles, and the vast majority of this pedologic information is contained in non-quantitative text descriptions. We present initial text mining analyses of parsed text in the digitally available USDA soil taxonomy documentation and the Soil Survey Geographic database. Previous research has shown that latent information structure can be extracted from scientific literature using Natural Language Processing techniques, and we show that this latent information can be used to expedite query performance by using syntactic elements and part-of-speech tags as indices. Technical vocabulary often poses a text mining challenge because its terms are rare in general-language text. We introduce an extension to the common English vocabulary that allows nearly complete indexing of USDA Soil Series Descriptions.
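The idea of indexing text by part-of-speech tag, and of extending a general vocabulary with domain terms, can be illustrated with a small sketch. This is not the authors' implementation: the two lexicons, the tag names, the two soil descriptions, and the helper names (`tokenize`, `build_index`, `lookup`) are all invented for illustration, and a dictionary lookup stands in for a real part-of-speech tagger. It only shows how a (tag, term) index might be built and how coverage could change once soil-specific terms are added.

```python
from collections import defaultdict

# Hypothetical "common English" lexicon: word -> part-of-speech tag.
COMMON_LEXICON = {
    "the": "DET", "a": "DET", "is": "VERB", "has": "VERB",
    "soil": "NOUN", "horizon": "NOUN", "clay": "NOUN", "loam": "NOUN",
    "brown": "ADJ", "dark": "ADJ", "very": "ADV", "and": "CONJ",
}

# Hypothetical domain extension with soil-science terms (assumed, not from the paper).
SOIL_LEXICON = {
    "argillic": "ADJ", "mollic": "ADJ", "silty": "ADJ", "epipedon": "NOUN",
}

# Made-up descriptions written for this example only.
DOCS = {
    "series_a": "The soil has a mollic epipedon and a dark brown silty clay loam horizon.",
    "series_b": "The argillic horizon is very dark brown clay.",
}


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    tokens = (word.strip(".,;:()").lower() for word in text.split())
    return [tok for tok in tokens if tok]


def build_index(docs, lexicon):
    """Return an inverted index keyed by (tag, term) and the share of tokens indexed."""
    index = defaultdict(set)
    known = total = 0
    for doc_id, text in docs.items():
        for tok in tokenize(text):
            total += 1
            tag = lexicon.get(tok)
            if tag is None:
                continue  # out-of-vocabulary token: it cannot be indexed
            known += 1
            index[(tag, tok)].add(doc_id)
    coverage = known / total if total else 0.0
    return index, coverage


def lookup(index, tag, term):
    """Documents in which `term` occurs with part-of-speech `tag`."""
    return index.get((tag, term), set())


# Index once with the common vocabulary only, then with the domain extension.
base_index, base_cov = build_index(DOCS, COMMON_LEXICON)
full_index, full_cov = build_index(DOCS, {**COMMON_LEXICON, **SOIL_LEXICON})

print(f"coverage without extension: {base_cov:.2f}")
print(f"coverage with extension:    {full_cov:.2f}")
print("documents with adjective 'mollic':", sorted(lookup(full_index, "ADJ", "mollic")))
```

Keying the index on the (tag, term) pair means a query such as "adjective `mollic`" resolves with one dictionary lookup rather than a scan of every description. In this toy setup the technical terms are simply invisible to the index until the extension lexicon is merged in.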
