4.7 Article

GCN-ST-MDIR: Graph Convolutional Network-Based Spatial-Temporal Missing Air Pollution Data Pattern Identification and Recovery

Journal

IEEE TRANSACTIONS ON BIG DATA
Volume 9, Issue 5, Pages 1347-1364

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TBDATA.2023.3277710

Keywords

Air pollution; Data models; Atmospheric modeling; Monitoring; Training; Convolutional neural networks; Big Data; Air pollution data; graph convolutional network; transfer learning; automatic; missing data pattern identification; missing data pattern recovery; similarity matrix; spatial-temporal

GCN-ST-MDIR is a Graph Convolutional Network-based framework for Missing Data Pattern Identification and Recovery (MDIR), which identifies daily missing data patterns and selects the best recovery method automatically. It improves data representation for MDIR using a new graph construction and domain-specific knowledge. The model achieves better recovery performance compared to baselines, with an accuracy of 88.48% for general missing data recovery.
Missing data pattern identification and recovery (MDIR) is vital for accurate air pollution monitoring. To recover missing air pollution data, GCN-ST-MDIR, a Graph Convolutional Network (GCN)-based MDIR framework, is proposed to identify daily missing data patterns and automatically select the best recovery method. GCN-ST-MDIR presents four novelties: (1) A new graph construction is developed to improve GCN data representation for MDIR using a spatial-temporal (S-T) similarity matrix and domain-specific knowledge (e.g., weekend/weekday). (2) A transfer learning (TL) component is used to pre-train the LSCE and ILSCE models. (3) A GCN structure outputs a selection indicator to determine the dominant missing pattern for the daily input. The pre-trained data recovery model's accuracy is incorporated into the GCN loss function to penalize a wrong indicator. (4) The output of the GCN structure is used as a score to combine LSCE and ILSCE. Results show that domain-specific S-T regularity and irregularity can be used as prior information for both the GCN and ILSCE/LSCE to enhance feature extraction. The model considerably improves recovery performance compared with the baselines. GCN-ST-MDIR achieves an accuracy of 88.48% for general missing data recovery with consecutively and sporadically missing data. GCN-ST-MDIR can be extended to many other S-T MDIR challenges.
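
Two of the ideas in the abstract, building a graph from a similarity matrix and using a score to combine two recovery models, can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the graph construction, or the combination rule, so everything below is an assumption made for illustration only: Pearson correlation between stations stands in for the S-T similarity, a fixed threshold decides the edges, and a convex combination stands in for the score-based blending. The function names `similarity_adjacency` and `combine_recoveries`, the `threshold` parameter, and the sample numbers are hypothetical and do not come from the paper.

```python
import numpy as np


def similarity_adjacency(readings, threshold=0.5):
    """Build a 0/1 adjacency matrix from station-to-station similarity.

    Assumption: `readings` has shape (n_stations, n_timesteps), contains
    no missing values, and no station has a constant series. Pearson
    correlation is used here as a stand-in similarity measure.
    """
    sim = np.corrcoef(readings)              # (n_stations, n_stations)
    adj = (sim >= threshold).astype(float)   # keep sufficiently similar pairs
    np.fill_diagonal(adj, 0.0)               # no self-loops
    return adj


def combine_recoveries(score, recovery_a, recovery_b):
    """Blend two recovered series with a score in [0, 1].

    Assumption: a convex combination, where `score` is the weight given
    to `recovery_a` and (1 - score) the weight given to `recovery_b`.
    """
    score = float(np.clip(score, 0.0, 1.0))
    a = np.asarray(recovery_a, dtype=float)
    b = np.asarray(recovery_b, dtype=float)
    return score * a + (1.0 - score) * b


# Hypothetical readings for three stations over four time steps.
readings = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # moves with station 0
    [4.0, 3.0, 2.0, 1.0],   # moves against station 0
])
adjacency = similarity_adjacency(readings, threshold=0.5)

# Hypothetical outputs of two recovery models for the same gap.
blended = combine_recoveries(0.25, [10.0, 20.0], [30.0, 40.0])
```

In this sketch only stations 0 and 1 end up connected, and a score of 0.25 pulls the blended values toward the second model's output. Which of LSCE and ILSCE plays which role, and how the score is learned, is determined by the paper's GCN and is not reproduced here.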
