Article

Developing machine learning models with multi-source environmental data to predict wheat yield in China

Journal

COMPUTERS AND ELECTRONICS IN AGRICULTURE
Volume 194

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.compag.2022.106790

Keywords

Yield prediction; Vegetation indices; NIRv; Random forest; Support vector machine; Wheat

Funding

  1. Natural Science Foundation of China [41961124006, 41730645, 52079114]
  2. Natural Science Foundation of Qinghai [2021-HZ-811]
  3. National Key Research and Development Program of China [2019YFE0125300, 2017YFE0122500]

This study integrated multi-source environmental variables into random forest and support vector machine models for wheat yield prediction in China. The results showed that using remotely sensed vegetation indices improved the precision of the models, with near-infrared reflectance being slightly better than other indices. The relative importance and partial dependence analyses identified the main predictors and their relationships with wheat yield.
Crop yield is controlled by different environmental factors. Multi-source data for site-specific soils, climates, and remotely sensed vegetation indices are essential for yield prediction. Data-model fusion algorithms for crop growth monitoring and yield prediction are complicated and need to be optimized to deal with model uncertainty. This study integrated multi-source environmental variables (e.g., satellite-based vegetation indices, climate data, and soil properties) into random forest (RF) and support vector machine (SVM) models for wheat yield prediction in China. The performance of both RF and SVM models was investigated using different types of vegetation indices combined with other predictors. Relative importance and partial dependence analyses were used to identify the main predictors and their relationships with wheat yield. We found that using remotely sensed vegetation indices improved our model precision, and that near-infrared reflectance of terrestrial vegetation (NIRv) was slightly better than the normalized difference vegetation index (NDVI) and the enhanced vegetation index (EVI) in predicting yield. NIRv was better at detecting climate stress on crops, and could capture more information regarding crop growth and yield formation. Compared with the SVM model, the RF model with NIRv and other covariates performed better in wheat yield prediction, with an R² of 0.74 and an RMSE of 758 kg/ha. We also found that NIRv from jointing to heading was the most important predictor in determining yield, followed by solar radiation (especially during tillering-heading), relative humidity (during planting-tillering), soil organic carbon, and wind speed (throughout the growing season). In addition, wheat yield exhibited threshold-like responses to most factors based on our RF model. These threshold values can help to better understand how different environmental factors limit wheat yield, which will provide useful information for climate-adaptive crop management. Our findings demonstrated the potential of using NIRv for yield prediction. This approach is broadly applicable to other regions globally using publicly available data.
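The abstract names three vegetation indices and two fit metrics without defining them. The sketch below shows one common way to compute them from surface reflectance values and from paired observed and predicted yields. It rests on assumptions not stated in the source: NIRv is taken in its simple form as NDVI multiplied by near-infrared reflectance, EVI uses the standard MODIS coefficients, and all function names and sample values are illustrative only. The authors' processing chain may differ.

```python
import math


def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def nirv(nir, red):
    """Near-infrared reflectance of vegetation, assumed here as NDVI * NIR."""
    return ndvi(nir, red) * nir


def evi(nir, red, blue):
    """Enhanced vegetation index with the usual MODIS coefficients (assumed)."""
    gain, c1, c2, soil = 2.5, 6.0, 7.5, 1.0
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + soil)


def rmse(observed, predicted):
    """Root mean square error, in the same units as the yields (e.g. kg/ha)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Illustrative reflectance values for one pixel (not taken from the paper).
sample_indices = {
    "NDVI": ndvi(0.5, 0.1),
    "NIRv": nirv(0.5, 0.1),
    "EVI": evi(0.5, 0.1, 0.05),
}

# Illustrative yields in kg/ha (not taken from the paper).
observed_yield = [5000.0, 6000.0, 7000.0]
predicted_yield = [5200.0, 5900.0, 6800.0]
sample_rmse = rmse(observed_yield, predicted_yield)
sample_r2 = r_squared(observed_yield, predicted_yield)
```

In a workflow like the one described, index values such as these would be aggregated over growth stages (for example jointing to heading) and passed to the RF or SVM model together with the climate and soil covariates, with R² and RMSE computed on held-out yields.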
