Article

Inside or Outside: Quantifying Extrapolation Across River Networks

Journal

WATER RESOURCES RESEARCH
Volume 54, Issue 9, Pages 6983-7003

Publisher

AMER GEOPHYSICAL UNION
DOI: 10.1029/2018WR023378

Keywords

random forests; multivariate adaptive regression splines; interpolation; extrapolation; river networks

Funding

  1. NIWA's Sustainable Water Allocation Programme
  2. NIWA overseas travel fund

Abstract

Regression techniques are often used to predict responses across landscapes or under scenarios describing changes in climate, management, or land cover. The ability of random forests (RF) and multivariate adaptive regression splines (MARS) to predict flow variability, low flow, Escherichia coli, and a macroinvertebrate community index was compared. Cross validation was applied to test predictive performance across an induced spectrum of interpolation to extrapolation by splitting each data set into two groups in three ways: geographically, environmentally, and randomly. RF and MARS both represent nonlinear and interacting patterns but showed contrasting ability to interpolate and extrapolate. RF always performed better than MARS when interpolating within environmental space or extrapolating in geographical space. RF models for all four responses were transferable in geographic space but not to environmental conditions outside the training data. Neither technique was successful when extrapolating across environmental gradients, although RF outperformed MARS even though RF predictions are constrained by the training data. New methods to quantify interpolation versus extrapolation for predictions are demonstrated. Degree of extrapolation is calculated by transforming both the training data and the new predictors into response turnover space. A decline in cross-validation performance was related to an increase in degree of extrapolation, regardless of whether extrapolation was in geographical or environmental space. Degree of extrapolation is valuable because it separates predictions that are more reliable, because they represent interpolation, from those that are more uncertain, because they represent extrapolation. For example, a high degree of extrapolation under climate or land cover change indicates an increased risk of producing misleading predictions from both RF and MARS.

Plain Language Summary

Regression models are used across the environmental sciences to find patterns between a response and its potential predictors. These patterns can be used to predict a response across broad areas or under new environmental conditions. This paper compares the performance of two flexible regression techniques when predicting across a deliberately induced spectrum of interpolation to extrapolation. Each data set was divided into two groups in three ways: geographically, environmentally, and randomly. Models were trained on one half of the data and tested on the other. The two methods incorporate nonlinear and interacting relationships but suffer from unquantified uncertainty when extrapolating. Random forests always performed better than multivariate adaptive regression splines when interpolating within environmental space and when extrapolating in geographical space. Random forest models were transferable in geographic space but not to environmental conditions outside the training data. Neither technique was successful when extrapolating across environmental gradients. The paper also describes and tests a new method to calculate degree of extrapolation: a value quantifying interpolation versus extrapolation for each prediction from either regression technique. The method can be used to indicate the risk of spurious predictions when predicting at new locations (e.g., nationally) or under new environmental conditions (e.g., climatic change).
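
The split-half design described above can be illustrated with a minimal Python sketch. It is not the paper's method: the degree of extrapolation in the paper is computed in response turnover space, whereas this sketch substitutes a much simpler stand-in, the scaled distance by which a test site falls outside the range of the training predictors. The site data are synthetic, and the names `northing`, `rainfall`, `slope`, `split_by_median`, `split_random`, `range_exceedance`, and `mean_exceedance` are hypothetical, chosen only for illustration. The synthetic predictors are drawn independently of location, an assumption built into the example rather than a finding.

```python
import random
from statistics import median


def split_by_median(rows, key):
    """Split rows into two halves at the median of the given column."""
    cut = median(r[key] for r in rows)
    low = [r for r in rows if r[key] <= cut]
    high = [r for r in rows if r[key] > cut]
    return low, high


def split_random(rows, seed=0):
    """Split rows into two random halves (reproducible via seed)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def range_exceedance(train, point, predictors):
    """Stand-in extrapolation score (NOT the paper's turnover-space method).

    Returns the largest distance, scaled by the training range, by which
    the point lies outside the training min/max of any predictor.
    A value of 0.0 means the point is inside the training range everywhere.
    """
    worst = 0.0
    for p in predictors:
        lo = min(r[p] for r in train)
        hi = max(r[p] for r in train)
        span = (hi - lo) or 1.0  # avoid division by zero for constant columns
        below = max(lo - point[p], 0.0)
        above = max(point[p] - hi, 0.0)
        worst = max(worst, (below + above) / span)
    return worst


def mean_exceedance(train, test, predictors):
    """Average stand-in extrapolation score over all test points."""
    return sum(range_exceedance(train, t, predictors) for t in test) / len(test)


# Synthetic sites: predictors are drawn independently of location (assumption).
rng = random.Random(42)
sites = [
    {
        "northing": rng.uniform(0, 1000),
        "rainfall": rng.uniform(500, 3000),
        "slope": rng.uniform(0, 30),
    }
    for _ in range(200)
]
predictors = ["rainfall", "slope"]

splits = {
    "geographical": split_by_median(sites, "northing"),
    "environmental": split_by_median(sites, "rainfall"),
    "random": split_random(sites),
}
for label, (train, test) in splits.items():
    print(label, round(mean_exceedance(train, test, predictors), 3))
```

In this toy setting the environmental split places every test site outside the training rainfall range, so its score is far higher than for the geographical or random splits. A geographical split would only stay close to interpolation in real data if the two regions share similar environmental conditions.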
