4.3 Article

Classification of biodegradable materials using QSAR modelling with uncertainty estimation

Journal

SAR AND QSAR IN ENVIRONMENTAL RESEARCH
Volume 27, Issue 10, Pages 799-811

Publisher

TAYLOR & FRANCIS LTD
DOI: 10.1080/1062936X.2016.1238010

Keywords

Partial least squares discriminant analysis; uncertainty estimation; bootstrap; machine learning; biodegradable materials; QSAR

Funding

  1. Intramural NIST DOC [9999-NIST] Funding Source: Medline

Abstract

The ability to determine the biodegradability of chemicals without resorting to expensive tests is ecologically and economically desirable. Models based on quantitative structure-activity relations (QSAR) provide some promise in this direction. However, QSAR models in the literature rarely provide uncertainty estimates in more detail than aggregated statistics such as the sensitivity and specificity of the model's predictions. Almost never is there a means of assessing the uncertainty in an individual prediction. Without an uncertainty estimate, it is impossible to assess the trustworthiness of any particular prediction, which leaves the model with a low utility for regulatory purposes. In the present work, a QSAR model with uncertainty estimates is used to predict biodegradability for a set of substances from a publicly available data set. Separation was performed using a partial least squares discriminant analysis model, and the uncertainty was estimated using bootstrapping. The uncertainty prediction allows for confidence intervals to be assigned to any of the model's predictions, allowing for a more complete assessment of the model than would be possible through a traditional statistical analysis. The results presented here are broadly applicable to other areas of modelling as well, because the calculation of the uncertainty will clearly demonstrate where additional tests are needed.
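The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: a partial least squares discriminant model is refitted on bootstrap resamples of the training set, and the spread of the resulting scores gives a confidence interval for one individual prediction. Everything specific here is an assumption made for illustration and is not taken from the paper: the single latent variable, the 0/1 class coding with a 0.5 decision threshold, the percentile interval, the synthetic two-descriptor data, and all function names.

```python
import random


def fit_pls1(X, y):
    """Fit a one-component PLS model for a single 0/1-coded response."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction of maximum covariance between X and y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0.0:
        # Degenerate resample (e.g. only one class drawn): constant model.
        return x_mean, [0.0] * p, 0.0, y_mean
    w = [v / norm for v in w]
    # Latent scores and the regression of y on them.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return x_mean, w, q, y_mean


def predict_score(model, x):
    """Return the continuous discriminant score for one sample."""
    x_mean, w, q, y_mean = model
    t = sum((xj - mj) * wj for xj, mj, wj in zip(x, x_mean, w))
    return y_mean + q * t


def bootstrap_scores(X, y, x_new, n_boot=500, seed=0):
    """Refit on resampled training sets; collect the scores for x_new."""
    rng = random.Random(seed)
    n = len(X)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        model = fit_pls1([X[i] for i in idx], [y[i] for i in idx])
        scores.append(predict_score(model, x_new))
    return sorted(scores)


def percentile_interval(sorted_scores, level=0.95):
    """Percentile bootstrap interval from an ascending list of scores."""
    n = len(sorted_scores)
    lo = sorted_scores[int((1 - level) / 2 * n)]
    hi = sorted_scores[min(n - 1, int((1 + level) / 2 * n))]
    return lo, hi


def classify_with_interval(lo, hi, threshold=0.5):
    """Assign a class only when the whole interval lies on one side."""
    if lo > threshold:
        return "class 1"
    if hi < threshold:
        return "class 0"
    return "uncertain"


# Synthetic stand-in for molecular descriptors (not real biodegradability data).
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)] + \
    [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(40)]
y = [0.0] * 40 + [1.0] * 40

for query in ([3.5, 3.5], [1.5, 1.5], [-0.5, -0.5]):
    lo, hi = percentile_interval(bootstrap_scores(X, y, query))
    print(query, round(lo, 3), round(hi, 3), classify_with_interval(lo, hi))
```

In this sketch a prediction is reported as "uncertain" whenever its interval contains the threshold, which is one simple way to flag the substances for which additional testing would be most informative.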



Recommended

Article Biochemical Research Methods

Classification of samples from NMR-based metabolomics using principal components analysis and partial least squares with uncertainty estimation

Werickson Fortunato de Carvalho Rocha, David A. Sheen, Daniel W. Bearden

ANALYTICAL AND BIOANALYTICAL CHEMISTRY (2018)

Article Chemistry, Physical

Evaluated Site-Specific Rate Constants for Reaction of Isobutane with H and CH3: Shock Tube Experiments Combined with Bayesian Model Optimization

Laura A. Mertens, Iftikhar A. Awan, David A. Sheen, Jeffrey A. Manion

JOURNAL OF PHYSICAL CHEMISTRY A (2018)

Article Instruments & Instrumentation

interlab: A Python Module for Analyzing Interlaboratory Comparison Data

David A. Sheen

JOURNAL OF RESEARCH OF THE NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY (2019)

Article Multidisciplinary Sciences

Grouping of complex substances using analytical chemistry data: A framework for quantitative evaluation and visualization

Melis Onel, Burcu Beykal, Kyle Ferguson, Weihsueh A. Chiu, Thomas J. McDonald, Lan Zhou, John S. House, Fred A. Wright, David A. Sheen, Ivan Rusyn, Efstratios N. Pistikopoulos

PLOS ONE (2019)

Article Biochemistry & Molecular Biology

Metabolomics Test Materials for Quality Control: A Study of a Urine Materials Suite

Daniel W. Bearden, David A. Sheen, Yamil Simon-Manso, Bruce A. Benner, Werickson F. C. Rocha, Niksa Blonder, Katrice A. Lippa, Richard D. Beger, Laura K. Schnackenberg, Jinchun Sun, Khyati Y. Mehta, Amrita K. Cheema, Haiwei Gu, Ramesh Marupaka, G. A. Nagana Gowda, Daniel Raftery

METABOLITES (2019)

Article Automation & Control Systems

Chemometric outlier classification of 2D-NMR spectra to enable higher order structure characterization of protein therapeutics

David A. Sheen, Vincent K. Shen, Robert G. Brinson, Luke W. Arbogast, John P. Marino, Frank Delaglio

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS (2020)

Article Chemistry, Physical

Flat-histogram extrapolation as a useful tool in the age of big data

Nathan A. Mahynski, Harold W. Hatch, Matthew Witman, David A. Sheen, Jeffrey R. Errington, Vincent K. Shen

Summary: By combining statistical mechanical principles with biased sampling techniques, it is possible to predict the thermodynamic properties of systems more accurately and achieve precise estimates across a wide range of conditions. These extrapolations significantly increase the amount of accurate information that can be extracted from simulations, providing data for data-intensive algorithms.

MOLECULAR SIMULATION (2021)

Article Biochemistry & Molecular Biology

Principal component analysis for automated classification of 2D spectra and interferograms of protein therapeutics: influence of noise, reconstruction details, and data preparation

Robert G. Brinson, K. Wade Elliott, Luke W. Arbogast, David A. Sheen, John P. Giddens, John P. Marino, Frank Delaglio

JOURNAL OF BIOMOLECULAR NMR (2020)

Article Energy & Fuels

Laser-driven calorimetry and chemometric quantification of standard reference material diesel/biodiesel fuel blends

Werickson Fortunato de Carvalho Rocha, Cary Presser, Shannon Bernier, Ashot Nazarian, David A. Sheen

Article Nanoscience & Nanotechnology

Predicting the Mixing Behavior of Aqueous Solutions Using a Machine Learning Framework

Chris J. Peacock, Connor Lamont, David A. Sheen, Vincent K. Shen, Laurent Kreplak, John P. Frampton

Summary: By studying the pairwise mixing behavior of 68 water-soluble compounds and using machine learning classifiers to predict their miscibility, the random forest classifier emerged as the most successful with high levels of accuracy, specificity, and sensitivity under different scenarios. The potential of this machine learning approach to improve the design of screening experiments for aqueous two-phase systems for various scientific and industrial applications was demonstrated.

ACS APPLIED MATERIALS & INTERFACES (2021)

Article Chemistry, Analytical

Assessing arsenic species in foods using regularized linear regression of the arsenic K-edge X-ray absorption near edge structure

Evan P. Jahrman, Lee L. Yu, William P. Krekelberg, David A. Sheen, Thomas C. Allison, John L. Molloy

Summary: The speciation of arsenic plays a crucial role in its toxicity and bioavailability. This study explores the use of X-ray spectroscopies to determine arsenic speciation profiles in materials related to public health initiatives, such as food safety. The results provide insights into the efficacy of X-ray spectroscopy and the accuracy of analysis. The study also introduces the lasso regression method to improve the statistical inferences and reduce overfitting.

JOURNAL OF ANALYTICAL ATOMIC SPECTROMETRY (2022)

Meeting Abstract Chemistry, Multidisciplinary

Chemometric analysis of hydrocarbon reference materials for certification as aircraft fuels

David Sheen, Werickson F. C. Rocha

ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY (2018)

Meeting Abstract Chemistry, Multidisciplinary

Data harmonization in metabolomics for quality assurance and control

David Sheen, Bruce Benner, Yamil Simon, Werickson F. C. Rocha, Christina Jones, Niksa Blonder, Katrice Lippa

ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY (2018)

Article Automation & Control Systems

A scoring metric for multivariate data for reproducibility analysis using chemometric methods

David A. Sheen, Werickson F. C. Rocha, Katrice A. Lippa, Daniel W. Bearden

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS (2017)
