Article

Identifying causality and contributory factors of pipeline incidents by employing natural language processing and text mining techniques

Journal

PROCESS SAFETY AND ENVIRONMENTAL PROTECTION
Volume 152, Pages 37-46

Publisher

ELSEVIER
DOI: 10.1016/j.psep.2021.05.036

Keywords

Incident data; Pipeline; Contributing factor; Causality; Natural language processing

Funding

  1. Pipeline and Hazardous Materials Safety Administration (PHMSA) [693JK31850011CAAP]

The key to learning from past incidents is identifying their underlying causes and contributory factors. A large amount of text data on incident narratives has accumulated over the years and can be a good learning source if properly utilized. However, the volume and unstructured nature of the text data impede generating insights into incident patterns. This research applies natural language processing (NLP) and text mining techniques to use this resource for understanding the contributing factors and causation behind incidents, with the pipeline industry as an illustrative example. The 3587 incident narratives recorded in the 'comment' section of the Pipeline and Hazardous Materials Safety Administration (PHMSA) incident database are analyzed. Two text analytics methods, K-means clustering and co-occurrence networks, are employed to infer the latent causality of incidents. The results demonstrate that both methods can identify contributing factors under specific failure types. The co-occurrence network approach is better at extracting dependencies among the contributory factors, while K-means clustering can only indicate general correlations. The proposed workflow provides a new perspective on identifying contributing factors and their causal dependencies from incident text data, with promising applications in risk analysis and accident modeling. (c) 2021 Institution of Chemical Engineers. Published by Elsevier B.V. All rights reserved.
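
To make the co-occurrence idea concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a simple whitespace tokenizer, a tiny hypothetical stop-word list, and three toy narratives invented for illustration (they are not PHMSA records). Each pair of terms appearing in the same narrative becomes an edge, and the edge weight is the number of narratives containing both terms.

```python
from collections import Counter
from itertools import combinations

# Toy narratives invented for illustration; they are not PHMSA records.
NARRATIVES = [
    "corrosion caused a leak in the pipe",
    "external corrosion led to a leak at the weld",
    "excavation damage caused a rupture of the pipe",
]

# Small hypothetical stop-word list; a real workflow would use a fuller one.
STOP_WORDS = {"a", "in", "the", "to", "at", "of", "led", "caused"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def cooccurrence_edges(narratives):
    """Count how many narratives contain each unordered pair of terms."""
    edges = Counter()
    for text in narratives:
        # Use a sorted set so each pair is counted once per narrative
        # and always appears in the same (alphabetical) order.
        terms = sorted(set(tokenize(text)))
        edges.update(combinations(terms, 2))
    return edges


edges = cooccurrence_edges(NARRATIVES)

# Print the heaviest edges of the undirected co-occurrence network.
for (term_a, term_b), weight in edges.most_common(3):
    print(term_a, term_b, weight)
```

The edges in this sketch are undirected, so they show which terms tend to appear together but not which factor leads to which. Inferring dependency among contributory factors, as the abstract describes, would require further steps that this example does not attempt.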
