Article

Quantitative structure-property relationships for the calculation of the soil adsorption coefficient using machine learning algorithms with calculated chemical properties from open-source software

Journal

ENVIRONMENTAL RESEARCH
Volume 196

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.envres.2020.110363

Keywords

QSPR; Environmental risk assessment; Gradient boosting decision trees; Koc; Mordred

Funding

  1. JSPS KAKENHI [19H04165]
  2. Grants-in-Aid for Scientific Research [19H04165] Funding Source: KAKEN

This study developed a QSPR model to efficiently estimate Koc in the early stages of chemical development, using tools such as OPERA and Mordred. The combination of LightGBM, OPERA, and Mordred enabled highly accurate prediction of Koc for a wide range of chemicals. The use of fast-processing software like LightGBM allowed for parameter tuning and improved performance in predicting Koc values.
The soil adsorption coefficient (Koc) is an environmental fate parameter that is essential for environmental risk assessment. However, measuring Koc experimentally takes considerable time and expense, so it is necessary to estimate Koc efficiently in the early stages of a chemical's development. In this study, a quantitative structure-property relationship (QSPR) model was developed on the largest available Koc dataset, using physicochemical properties calculated with the OPEn structure-activity/property Relationship App (OPERA) and molecular descriptors calculated with Mordred. Specifically, we compared the accuracy of a model built with the light gradient boosted machine (LightGBM), a gradient boosting decision tree (GBDT) algorithm, with that of previous models. The experimental results suggested the potential to develop a QSPR model that produces highly accurate Koc values from molecular descriptors and physicochemical properties. Unlike previous studies, the combination of LightGBM, OPERA and Mordred enables the prediction of Koc for many chemicals with high accuracy, and the wide range of chemicals covered by OPERA and Mordred enables the analysis of diverse chemical compounds. We also report a method to tune LightGBM. Because LightGBM trains quickly, the parameter tuning required to obtain the best performance becomes practical. Our research represents one of the few studies in the field of environmental chemistry to use LightGBM. By using physicochemical properties as well as molecular descriptors, we developed Koc prediction models that were more accurate than those of prior studies. In addition, our QSPR models may be useful for preliminary environmental risk assessment without incurring significant costs during the early stage of chemical development.
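The abstract names two technical steps: fitting a gradient boosting decision tree model and tuning its parameters. The sketch below illustrates both in plain Python, under loud assumptions. It is not LightGBM and not the authors' pipeline: it boosts depth-1 trees (stumps) on squared error, the two feature columns are synthetic stand-ins for calculated descriptors, the target is a made-up "log Koc"-like value, and every function name (`fit_stump`, `fit_gbdt`, `tune`) is hypothetical. A real workflow would presumably replace the toy learner with the `lightgbm` package and the synthetic columns with OPERA and Mordred output.

```python
import math
import random


def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # candidate split between two observed values
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # every feature is constant: fall back to a constant leaf
        mean = sum(residuals) / len(residuals)
        return (0, math.inf, mean, mean)
    return best[1:]


def stump_predict(stump, row):
    j, t, lm, rm = stump
    return lm if row[j] <= t else rm


def fit_gbdt(X, y, n_rounds, learning_rate):
    """Boost stumps on squared loss: each round fits the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for p, row in zip(preds, X)]
    return base, stumps, learning_rate


def predict(model, row):
    base, stumps, lr = model
    return base + lr * sum(stump_predict(s, row) for s in stumps)


def rmse(model, X, y):
    return math.sqrt(sum((predict(model, r) - t) ** 2 for r, t in zip(X, y)) / len(y))


def tune(X_train, y_train, X_valid, y_valid, grid):
    """Pick the (n_rounds, learning_rate) pair with the lowest validation RMSE."""
    scored = []
    for n_rounds, lr in grid:
        model = fit_gbdt(X_train, y_train, n_rounds, lr)
        scored.append((rmse(model, X_valid, y_valid), n_rounds, lr))
    return min(scored)


# Synthetic data only (an assumption for illustration, not the Koc dataset).
random.seed(0)
X = [[random.uniform(0.0, 5.0), random.uniform(0.0, 1.0)] for _ in range(80)]
y = [0.6 * a + 1.0 + random.gauss(0.0, 0.2) for a, _ in X]
X_train, y_train, X_valid, y_valid = X[:60], y[:60], X[60:], y[60:]

grid = [(n, lr) for n in (10, 50) for lr in (0.1, 0.3)]
best_rmse, best_rounds, best_lr = tune(X_train, y_train, X_valid, y_valid, grid)
print("best validation RMSE:", round(best_rmse, 3),
      "rounds:", best_rounds, "learning rate:", best_lr)
```

The grid search refits the model once per parameter combination, which is why training speed matters for tuning: a faster learner allows a larger grid to be explored in the same time.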
