
Application of machine learning and natural language processing for predicting stroke-associated pneumonia

FRONTIERS IN PUBLIC HEALTH (2022)

Journal

FRONTIERS IN PUBLIC HEALTH

Volume 10

Publisher

FRONTIERS MEDIA SA

DOI: 10.3389/fpubh.2022.1009164

Keywords

machine learning; natural language processing; pneumonia; prediction; risk score; stroke

Funding

Ditmanson Medical Foundation Chia-Yi Christian Hospital-National Chung Cheng University Joint Research Program [CYCH-CCU-2022-14]

Automated Summary

Machine learning combined with natural language processing can predict stroke-associated pneumonia more accurately than conventional risk scores. A model that used both structured data and unstructured clinical text outperformed a model based on structured data alone and four conventional risk scores.

Abstract

Background: Identifying patients at high risk of stroke-associated pneumonia (SAP) may permit targeting potential interventions to reduce its incidence. We aimed to evaluate how well machine learning (ML) and natural language processing techniques, applied to structured data and unstructured clinical text, predict SAP compared with conventional risk scores.

Methods: We used data linked between a hospital stroke registry and a deidentified research-based database that included electronic health records and administrative claims data. Natural language processing was applied to extract textual features from clinical notes. The random forest algorithm was used to build ML models. The predictive performance of the ML models was compared with the A2DS2, ISAN, PNA, and ACDD4 scores using the area under the receiver operating characteristic curve (AUC).

Results: Among 5,913 acute stroke patients hospitalized between Oct 2010 and Sep 2021, 450 (7.6%) developed SAP within the first 7 days after stroke onset. The ML model based on both textual features and structured variables had the highest AUC [0.840, 95% confidence interval (CI) 0.806-0.875], significantly higher than those of the ML model based on structured variables alone (0.828, 95% CI 0.793-0.863, P = 0.040), ACDD4 (0.807, 95% CI 0.766-0.849, P = 0.041), A2DS2 (0.803, 95% CI 0.762-0.845, P = 0.013), ISAN (0.795, 95% CI 0.752-0.837, P = 0.009), and PNA (0.778, 95% CI 0.735-0.822, P < 0.001). All models demonstrated adequate calibration except for the A2DS2 score.

Conclusions: The ML model based on both textual features and structured variables performed better than conventional risk scores in predicting SAP. The workflow used to generate ML prediction models can be disseminated for local adaptation by individual healthcare organizations.
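The abstract compares models and risk scores by AUC. As a minimal sketch of what that metric measures, the Python below computes AUC as the probability that a randomly chosen patient who developed SAP receives a higher predicted risk than a randomly chosen patient who did not, with ties counted as half. The function name `auc`, the toy labels, and both score lists are hypothetical illustrations, not data or code from the study, and the sketch does not reproduce the confidence intervals or P values reported above.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win
    (the Mann-Whitney U formulation of the area under the ROC curve).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy cohort (assumption): 1 = developed SAP, 0 = did not.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical outputs (assumption): a model probability and an integer risk score.
model_prob = [0.81, 0.64, 0.30, 0.35, 0.22, 0.10, 0.08, 0.05]
risk_score = [7, 4, 3, 5, 3, 2, 1, 1]

auc_model = auc(labels, model_prob)
auc_score = auc(labels, risk_score)
print(f"model AUC = {auc_model:.3f}, risk score AUC = {auc_score:.3f}")
```

Because AUC depends only on how patients are ranked, a probability from an ML model and an integer point score can be compared on the same scale, which is what makes the comparison in the abstract possible.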
