4.7 Article

Text mining of accident reports using semi-supervised keyword extraction and topic modeling

Journal

PROCESS SAFETY AND ENVIRONMENTAL PROTECTION
Volume 155, Pages 455-465

Publisher

ELSEVIER
DOI: 10.1016/j.psep.2021.09.022

Keywords

Accidents; Text mining; Document classification; Aviation Safety Reporting System (ASRS); Pipeline and Hazardous Materials Safety Administration (PHMSA)

This paper introduces an automated, semi-supervised approach for analyzing accident reports that identifies domain-specific keywords and groups them into topics for data mining tasks such as classification. The method achieved an average classification accuracy of 80% across two case studies in different domains and can generate domain-specific predictive models with limited manual intervention.
Learning from past incidents is critical to achieving and maintaining high process safety performance. Accident and incident records provide one way of learning; however, they are usually unstructured text, which makes analysis difficult. Recently, text mining methods based on supervised learning have been proposed for analyzing accident reports; however, they require an impractically large number of labeled records as training examples. This paper proposes an automated, semi-supervised, domain-independent approach for analyzing accident reports. Given a set of user-defined classification topics and domain literature such as handbooks, glossaries, and Wikipedia articles, the method can identify domain-specific keywords and group them into topics with minimal expert involvement. These keywords and topics can then be used for various data mining purposes, including classification. The proposed approach is demonstrated using two case studies from different domains: (1) in aviation, to identify the stage of flight at which an accident occurs, and (2) in the process industry, to identify the cause of pipeline accidents. The average classification accuracy of the proposed method was 80%, which is comparable to that of supervised learning methods. The key benefit of this approach is that it can generate domain-specific predictive models with limited manual intervention. © 2021 Institution of Chemical Engineers. Published by Elsevier B.V. All rights reserved.
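
The general idea of deriving topic keywords from domain literature and then using them to classify reports can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract does not specify the keyword-extraction or topic-grouping algorithm, so the sketch simply assumes that a keyword is any non-stopword term that appears in the literature of exactly one topic, and that a report is assigned to the topic whose keywords it mentions most often. The names `TOPIC_LITERATURE`, `STOPWORDS`, `extract_keywords`, and `classify`, as well as the sample texts, are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stand-in for domain literature (handbook or glossary text),
# one short passage per user-defined topic.
TOPIC_LITERATURE = {
    "takeoff": (
        "During takeoff the aircraft accelerates along the runway until "
        "rotation speed and then begins the initial climb. Takeoff thrust "
        "is set before the takeoff roll."
    ),
    "landing": (
        "During landing the aircraft descends on final approach, flares over "
        "the threshold and touches down. After touchdown the pilot applies "
        "brakes and reverse thrust during the landing rollout."
    ),
}

# Assumed minimal stopword list; a real system would use a fuller one.
STOPWORDS = {
    "the", "and", "is", "on", "a", "an", "of", "to", "in", "at", "during",
    "after", "before", "then", "until", "along", "over", "down",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_keywords(topic_literature):
    """Return, per topic, the terms that occur only in that topic's text."""
    term_sets = {
        topic: set(tokenize(text)) - STOPWORDS
        for topic, text in topic_literature.items()
    }
    keywords = {}
    for topic, terms in term_sets.items():
        # Terms shared with any other topic are not discriminative.
        others = set().union(
            *(s for t, s in term_sets.items() if t != topic)
        )
        keywords[topic] = terms - others
    return keywords


def classify(report, keywords):
    """Pick the topic whose keywords occur most often; None if no match."""
    counts = Counter(tokenize(report))
    scores = {
        topic: sum(counts[k] for k in kws) for topic, kws in keywords.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


keywords = extract_keywords(TOPIC_LITERATURE)

reports = [
    "The crew rejected the takeoff after an engine warning during the takeoff roll.",
    "The aircraft bounced at touchdown and the brakes overheated on the landing rollout.",
    "Fuel leak noticed at the gate.",
]
labels = [classify(r, keywords) for r in reports]
for report, label in zip(reports, labels):
    print(label, "<-", report)
```

No labeled reports are needed here: the only inputs are the topic names and the literature associated with each topic, which is the property the abstract emphasizes. A report that matches no keyword is left unclassified rather than forced into a topic.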
