4.7 Article

Estimating PM2.5 from multisource data: A comparison of different machine learning models in the Pearl River Delta of China

Journal

URBAN CLIMATE
Volume 35

Publisher

ELSEVIER
DOI: 10.1016/j.uclim.2020.100740

Keywords

PM2.5; Air pollution; Machine learning; Pearl River Delta; Point of Interest (POI)

Funding

  1. National Natural Science Foundation of China [41871029]
  2. National Key R&D Program of China [2019YFC1510400]
  3. Pearl River Talent Recruitment Program of Guangdong Province, China [2017GC010634]

This study evaluated six machine learning models for estimating PM2.5 concentrations in the Pearl River Delta region of China from August 2014 to December 2019. Tree-structured models like Random Forest and Gradient Boosting Regression Tree generally produced better estimations, while neural network models like Back Propagation Neural Network and Elman Neural Network showed similar accuracy. Generalized Additive Model performed the worst, followed by Support Vector Machines. Random Forest is highly recommended for its estimation accuracy, with Gradient Boosting Regression Tree also being a promising model for daily PM2.5 estimation in the PRD region.
Air pollution with high concentrations of fine particulate matter (PM2.5) poses severe threats to human health. Accurate estimation of PM2.5 concentrations helps relevant agencies respond to air pollution in a timely manner and provides essential data for epidemiological research on PM2.5 exposure. Although China has established a network for monitoring ground-level PM2.5 concentrations over the past decades, the limited records available from sparsely located monitoring sites hinder fine-resolution research on air pollution. Many studies have sought to fill the data gap left by sparsely distributed monitoring sites, but the accuracy of different models varies greatly. In recent years, machine learning models have become the preferred choice because of their high estimation accuracy. However, estimation accuracy may differ significantly across study areas and models, and few studies have evaluated model performance for the Pearl River Delta (PRD) region of China. This study evaluated the performance of six machine learning models for estimating PM2.5 concentrations in the PRD from August 2014 to December 2019. Multi-source data, including meteorology, vegetation, topography, and point of interest (POI) data, were used to obtain reliable daily PM2.5 concentration estimates. The results show that the tree-structured models (i.e., Random Forest (RF) and Gradient Boosting Regression Tree (GBRT)) generally produce better estimations than the other models. The two neural network models (i.e., Back Propagation Neural Network (BPNN) and Elman Neural Network (ENN)) show similar estimation accuracy. The Generalized Additive Model (GAM) generally gives the worst performance, followed by the Support Vector Machines (SVM) model. RF is thus highly recommended based on its estimation accuracy, while GBRT is also a promising model for daily PM2.5 estimation in the PRD. Our study provides a reference for selecting an appropriate model for daily PM2.5 concentration estimation in the PRD and other regions with a similar climate background.
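
As a rough illustration of what comparing models by "estimation accuracy" can look like in practice, the sketch below ranks candidate models by root-mean-square error (RMSE) and reports the coefficient of determination (R²) alongside it. The abstract does not state which accuracy metrics the study used, so the choice of RMSE and R² is an assumption. The model names `model_a` and `model_b` and all numeric values are hypothetical placeholders, not results from the paper.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rank_models(observed, predictions_by_model):
    """Return (name, scores) pairs sorted from lowest to highest RMSE."""
    scores = {
        name: {"rmse": rmse(observed, pred), "r2": r_squared(observed, pred)}
        for name, pred in predictions_by_model.items()
    }
    return sorted(scores.items(), key=lambda item: item[1]["rmse"])


# Hypothetical daily PM2.5 values (ug/m3) at held-out sites; illustration only.
observed = [30.0, 42.0, 25.0, 55.0]
predictions = {
    "model_a": [31.0, 40.0, 26.0, 53.0],  # hypothetical model output
    "model_b": [35.0, 35.0, 35.0, 35.0],  # hypothetical constant predictor
}

ranking = rank_models(observed, predictions)
for name, s in ranking:
    print(f"{name}: RMSE={s['rmse']:.2f}, R2={s['r2']:.2f}")
```

In a real comparison the predictions would come from cross-validation on monitoring-site records rather than from hand-written lists, and the ranking can change depending on which metric is treated as primary.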
