4.6 Article

A new semi-automated workflow for chemical data retrieval and quality checking for modeling applications

Journal

JOURNAL OF CHEMINFORMATICS
Volume 10

Publisher

BMC
DOI: 10.1186/s13321-018-0315-6

Keywords

QSAR; Data curation; Data cleaning; Semi-automated; Workflow

Funding

  1. EU-ToxRisk [681002]
  2. LIFE-COMBASE [LIFE15 ENV/ES/000416]

Abstract

The quality of the data used to derive a QSAR model is extremely important, as it strongly affects the final robustness and predictive power of the model. Ambiguous or wrong structures need to be carefully checked, because they lead to errors in the calculation of descriptors and hence to meaningless results. The increasing amounts of data, however, have often made it hard to check very large databases manually. In light of this, we designed and implemented a semi-automated workflow integrating structural data retrieval from several web-based databases, automated comparison of these data, chemical structure cleaning, and selection and standardization of data into a consistent, ready-to-use format that can be employed for modeling. The workflow integrates best practices for data curation that have been suggested in the recent literature. The workflow has been implemented with the freely available KNIME software and is freely available to the cheminformatics community for improvement and application to a broad range of chemical datasets.
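The "automated comparison" step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' KNIME workflow: the function name `consensus_structure`, the `min_agreement` threshold, the source names, and the placeholder identifiers are all assumptions made for illustration. The sketch assumes each database has already returned one structure identifier string per chemical (or nothing), accepts a structure automatically when enough sources agree, and otherwise flags the record for manual checking, which is one plausible reading of "semi-automated".

```python
from collections import Counter


def consensus_structure(records, min_agreement=2):
    """Compare structure identifiers retrieved from several sources.

    records: dict mapping a source name to an identifier string (or None
    when that source returned nothing). All names here are hypothetical.
    """
    # Drop sources that returned nothing and normalise whitespace.
    found = {src: ident.strip() for src, ident in records.items()
             if ident and ident.strip()}
    if not found:
        return {"status": "missing", "structure": None, "sources": []}

    counts = Counter(found.values())
    top_count = max(counts.values())
    winners = [ident for ident, n in counts.items() if n == top_count]

    # Accept automatically only when a single identifier wins with
    # enough agreeing sources; otherwise defer to a human curator.
    if len(winners) == 1 and top_count >= min_agreement:
        agreeing = sorted(s for s, i in found.items() if i == winners[0])
        return {"status": "accepted", "structure": winners[0],
                "sources": agreeing}
    return {"status": "manual_check", "structure": None,
            "sources": sorted(found)}


# Placeholder identifiers, not real chemical structures.
agreed = consensus_structure(
    {"source_a": "ID-1", "source_b": "ID-1", "source_c": "ID-2"})
conflict = consensus_structure({"source_a": "ID-1", "source_b": "ID-2"})
empty = consensus_structure({"source_a": None, "source_b": ""})
```

In a real curation pipeline the identifiers would need to be standardized before comparison (for example by a cheminformatics toolkit), since the same compound can be written in many equivalent ways; that step is deliberately left out of this sketch.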

Recommended

Article Chemistry, Medicinal

Integrated In Silico Models for the Prediction of No-Observed(Adverse)-Effect Levels and Lowest-Observed-(Adverse)-Effect Levels in Rats for Sub-chronic Repeated-Dose Toxicity

Domenico Gadaleta, Marco Marzo, Andrey Toropov, Alla Toropova, Giovanna J. Lavado, Sylvia E. Escher, Jean Lou C. M. Dorne, Emilio Benfenati

Summary: This study attempted to model NO(A)EL and LO(A)EL simultaneously, integrating the models to improve performance. The strategy presented here proved effective in assessing RDT of chemicals using in silico models.

CHEMICAL RESEARCH IN TOXICOLOGY (2021)

Article Biochemistry & Molecular Biology

QSAR Models for Human Carcinogenicity: An Assessment Based on Oral and Inhalation Slope Factors

Cosimo Toma, Alberto Manganaro, Giuseppa Raitano, Marco Marzo, Domenico Gadaleta, Diego Baderna, Alessandra Roncaglioni, Nynke Kramer, Emilio Benfenati

Summary: This study developed classification and regression models for inhalation and oral slope factors, which showed good accuracy and R^2 values. These models may assist regulatory authorities in decision-making and weighing evidence in chemical safety assessments.

MOLECULES (2021)

Correction Environmental Sciences

CATMoS: Collaborative Acute Toxicity Modeling Suite (vol 129, 047013, 2021)

Kamel Mansouri, Agnes Karmaus, Jeremy Fitzpatrick, Grace Patlewicz, Prachi Pradeep, Domenico Alberga, Nathalie Alepee, Timothy E. H. Allen, Dave Allen, Vinicius M. Alves, Carolina H. Andrade, Tyler R. Auernhammer, Davide Ballabio, Shannon Bell, Emilio Benfenati, Sudin Bhattacharya, Joyce V. Bastos, Stephen Boyd, J. B. Brown, Stephen J. Capuzzi, Yaroslav Chushak, Heather Ciallella, Alex M. Clark, Viviana Consonni, Pankaj R. Daga, Sean Ekins, Sherif Farag, Maxim Fedorov, Denis Fourches, Domenico Gadaleta, Feng Gao, Jeffery M. Gearhart, Garett Goh, Jonathan M. Goodman, Francesca Grisoni, Christopher M. Grulke, Thomas Hartung, Matthew Hirn, Pavel Karpov, Alexandru Korotcov, Giovanna J. Lavado, Michael Lawless, Xinhao Li, Thomas Luechtefeld, Filippo Lunghini, Giuseppe F. Mangiatordi, Gilles Marcou, Dan Marsh, Todd Martin, Andrea Mauri, Eugene N. Muratov, Glenn J. Myatt, Dac-Trung Nguyen, Orazio Nicolotti, Reine Note, Paritosh Pande, Amanda K. Parks, Tyler Peryea, Ahsan Polash, Robert Rallo, Alessandra Roncaglioni, Craig Rowlands, Patricia Ruiz, Daniel Russo, Ahmed Sayed, Risa Sayre, Timothy Sheils, Charles Siegel, Arthur C. Silva, Anton Simeonov, Sergey Sosnin, Noel Southall, Judy Strickland, Yun Tang, Brian Teppen, Igor V. Tetko, Dennis Thomas, Valery Tkachenko, Roberto Todeschini, Cosimo Toma, Ignacio Tripodi, Daniela Trisciuzzi, Alexander Tropsha, Alexandre Varnek, Kristijan Vukovic, Zhongyu Wang, Liguo Wang, Katrina M. Waters, Andrew J. Wedlake, Sanjeeva J. Wijeyesakere, Dan Wilson, Zijun Xiao, Hongbin Yang, Gergely Zahoranszky-Kohalmi, Alexey V. Zakharov, Fagen F. Zhang, Zhen Zhang, Tongan Zhao, Hao Zhu, Kimberley M. Zorn, Warren Casey, Nicole C. Kleinstreuer

ENVIRONMENTAL HEALTH PERSPECTIVES (2021)

Article Pharmacology & Pharmacy

Quantitative Structure-Activity Relationship Modeling of the Amplex Ultrared Assay to Predict Thyroperoxidase Inhibitory Activity

Domenico Gadaleta, Luca D'Alessandro, Marco Marzo, Emilio Benfenati, Alessandra Roncaglioni

Summary: The thyroid system plays a crucial role in physiological processes but can be disrupted by xenobiotics and contaminants, leading to various diseases. This study introduces QSAR models to predict the TPO inhibitory potential, developed using machine learning methods and validated rigorously internally and externally.

FRONTIERS IN PHARMACOLOGY (2021)

Article Environmental Sciences

Ecotoxicological QSAR modeling of the acute toxicity of organic compounds to the freshwater crustacean Thamnocephalus platyurus

Giovanna J. Lavado, Diego Baderna, Domenico Gadaleta, Marta Ultre, Kunal Roy, Emilio Benfenati

Summary: Research interest in environmental toxicity assessment using T. platyurus has increased, but there are currently no computational models to predict acute toxicity in this organism. This study developed QSAR models for predicting acute toxicity in T. platyurus, following OECD principles and using advanced machine learning techniques to achieve promising statistical quality in the dataset.

CHEMOSPHERE (2021)

Article Biochemistry & Molecular Biology

New Models to Predict the Acute and Chronic Toxicities of Representative Species of the Main Trophic Levels of Aquatic Environments

Cosimo Toma, Claudia I. Cappelli, Alberto Manganaro, Anna Lombardo, Juergen Arning, Emilio Benfenati

Summary: This study developed predictive models for acute and chronic toxicities in Raphidocelis subcapitata, Daphnia magna, and fish, with the random forest machine learning approach yielding the best results. The models showed good statistical quality for all endpoints, and are freely available for use as individual models in the VEGA platform and for prioritization in JANUS software.

MOLECULES (2021)

Article Biochemistry & Molecular Biology

Prediction of the Neurotoxic Potential of Chemicals Based on Modelling of Molecular Initiating Events Upstream of the Adverse Outcome Pathways of (Developmental) Neurotoxicity

Domenico Gadaleta, Nicoleta Spinu, Alessandra Roncaglioni, Mark T. D. Cronin, Emilio Benfenati

Summary: Developmental and adult/ageing neurotoxicity is an important area for chemical risk assessment. This study proposes a screening method using multiple QSAR models and AOP networks to predict neurotoxicity. The results show that the predictive performances of the integrated computational approach are comparable to traditional methods based on chemical descriptors and structural fingerprints, making it suitable for large-scale screening and prioritization of chemicals.

INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES (2022)

Article Toxicology

Integrate mechanistic evidence from new approach methodologies (NAMs) into a read-across assessment to characterise trends in shared mode of action

Sylvia E. Escher, Alejandro Aguayo-Orozco, Emilio Benfenati, Annette Bitsch, Thomas Braunbeck, Katharina Brotzmann, Frederic Bois, Bart van der Burg, Jose Castel, Thomas Exner, Domenico Gadaleta, Iain Gardner, Daria Goldmann, Oliver Hatley, Nazanin Golbamaki, Rabea Graepel, Paul Jennings, Alice Limonciel, Anthony Long, Richard Maclennan, Enrico Mombelli, Ulf Norinder, Sankalp Jain, Liliana Santos Capinha, Olivier T. Taboureau, Laia Tolosa, Nanette G. Vrijenhoek, Barbara M. A. Van Vugt-Lussenburg, Paul Walker, Bob van de Water, Matthias Wehr, Andrew White, Barbara Zdrazil, Ciaran Fisher

Summary: Read-across approaches may not provide sufficient evidence on a common mode of action across category members. A case study on branched aliphatic carboxylic acids shows the potential to induce hepatic steatosis. By analyzing gene expression patterns and adverse outcome pathways, researchers were able to confirm biological similarity and design an in vitro testing battery to systematically investigate a common mode of action among the compounds.

TOXICOLOGY IN VITRO (2022)

Article Biochemistry & Molecular Biology

Monte Carlo Models for Sub-Chronic Repeated-Dose Toxicity: Systemic and Organ-Specific Toxicity

Gianluca Selvestrel, Giovanna J. Lavado, Alla P. Toropova, Andrey A. Toropov, Domenico Gadaleta, Marco Marzo, Diego Baderna, Emilio Benfenati

Summary: The risk characterization of chemicals depends on the determination of repeated-dose toxicity (RDT), which involves the identification of the no-observed-adverse-effect level (NOAEL) and the lowest-observed-adverse-effect level (LOAEL). In vivo tests for RDT are time-consuming and expensive, making in silico models an attractive and challenging alternative. This study developed and validated eight in silico models for predicting NOAEL and LOAEL, focusing on systemic and organ-specific toxicity.

INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES (2022)

Article Pharmacology & Pharmacy

Ligand-based prediction of hERG-mediated cardiotoxicity based on the integration of different machine learning techniques

Pietro Delre, Giovanna J. Lavado, Giuseppe Lamanna, Michele Saviano, Alessandra Roncaglioni, Emilio Benfenati, Giuseppe Felice Mangiatordi, Domenico Gadaleta

Summary: This study developed highly predictive models of hERG-mediated cardiotoxicity using machine learning algorithms. The models were trained and validated using curated compounds from a freely accessible database. The study also proposed a new computational workflow for building such models. The results showed that these models outperformed commonly used models in the literature.

FRONTIERS IN PHARMACOLOGY (2022)

Article Biochemistry & Molecular Biology

The VEGA Tool to Check the Applicability Domain Gives Greater Confidence in the Prediction of In Silico Models

Alberto Danieli, Erika Colombo, Giuseppa Raitano, Anna Lombardo, Alessandra Roncaglioni, Alberto Manganaro, Alessio Sommovigo, Edoardo Carnesecchi, Jean-Lou C. M. Dorne, Emilio Benfenati

Summary: A thorough assessment of in silico models and their applicability domain is crucial for utilizing new approach methodologies (NAMs) in chemical risk assessment and building users' confidence. The VEGA tool is examined in this study to evaluate the applicability domain of in silico models, demonstrating its efficiency in identifying less accurate predictions for various toxicological endpoints. The tool evaluates chemical structures and related features, providing valuable insights for both regression models and classifiers.

INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES (2023)

Article Biochemistry & Molecular Biology

A KNIME Workflow to Assist the Analogue Identification for Read-Across, Applied to Aromatase Activity

Ana Yisel Caballero Alfonso, Chayawan Chayawan, Domenico Gadaleta, Alessandra Roncaglioni, Emilio Benfenati

Summary: The reduction and replacement of in vivo tests have become crucial in terms of resources and animal benefits. The read-across approach reduces the number of substances to be tested, exploiting existing experimental data to predict the properties of untested substances. In this paper, a workflow is introduced to support analogue identification for read-across. The workflow combines multiple similarity metrics to improve the predictions of toxicity.

MOLECULES (2023)

Article Environmental Sciences

CATMoS: Collaborative Acute Toxicity Modeling Suite

Kamel Mansouri, Agnes L. Karmaus, Jeremy Fitzpatrick, Grace Patlewicz, Prachi Pradeep, Domenico Alberga, Nathalie Alepee, Timothy E. H. Allen, Dave Allen, Vinicius M. Alves, Carolina H. Andrade, Tyler R. Auernhammer, Davide Ballabio, Shannon Bell, Emilio Benfenati, Sudin Bhattacharya, Joyce Bastos, Stephen Boyd, J. B. Brown, Stephen J. Capuzzi, Yaroslav Chushak, Heather Ciallella, Alex M. Clark, Viviana Consonni, Pankaj R. Daga, Sean Ekins, Sherif Farag, Maxim Fedorov, Denis Fourches, Domenico Gadaleta, Feng Gao, Jeffery M. Gearhart, Garett Goh, Jonathan M. Goodman, Francesca Grisoni, Christopher M. Grulke, Thomas Hartung, Matthew Hirn, Pavel Karpov, Alexandru Korotcov, Giovanna J. Lavado, Michael Lawless, Xinhao Li, Thomas Luechtefeld, Filippo Lunghini, Giuseppe F. Mangiatordi, Gilles Marcou, Dan Marsh, Todd Martin, Andrea Mauri, Eugene N. Muratov, Glenn J. Myatt, Dac-Trung Nguyen, Orazio Nicolotti, Reine Note, Paritosh Pande, Amanda K. Parks, Tyler Peryea, Ahsan H. Polash, Robert Rallo, Alessandra Roncaglioni, Craig Rowlands, Patricia Ruiz, Daniel P. Russo, Ahmed Sayed, Risa Sayre, Timothy Sheils, Charles Siegel, Arthur C. Silva, Anton Simeonov, Sergey Sosnin, Noel Southall, Judy Strickland, Yun Tang, Brian Teppen, Igor Tetko, Dennis Thomas, Valery Tkachenko, Roberto Todeschini, Cosimo Toma, Ignacio Tripodi, Daniela Trisciuzzi, Alexander Tropsha, Alexandre Varnek, Kristijan Vukovic, Zhongyu Wang, Liguo Wang, Katrina M. Waters, Andrew J. Wedlake, Sanjeeva J. Wijeyesakere, Dan Wilson, Zijun Xiao, Hongbin Yang, Gergely Zahoranszky-Kohalmi, Alexey Zakharov, Fagen F. Zhang, Zhen Zhang, Tongan Zhao, Hao Zhu, Kimberley M. Zorn, Warren Casey, Nicole C. Kleinstreuer

Summary: The international collaboration in developing in silico models for predicting acute oral toxicity, resulting in the CATMoS, has demonstrated high performance in terms of accuracy and robustness. This modeling suite is being evaluated by regulatory agencies as a potential replacement for in vivo rat acute oral toxicity studies.

ENVIRONMENTAL HEALTH PERSPECTIVES (2021)

Article Chemistry, Medicinal

Prediction of the Partition Coefficient between Adipose Tissue and Blood for Environmental Chemicals: from Single QSAR Models to an Integrated Approach

Claudia Ileana Cappelli, Serena Manganelli, Cosimo Toma, Emilio Benfenati, Enrico Mombelli

Summary: The study developed QSAR models for adipose tissue:blood partition coefficient using rat in vivo data, and showed that an integrated model combining multiple single models outperformed the individual ones, with an external mean absolute error of 0.26 and 84% coverage, comparable to experimental variability.

MOLECULAR INFORMATICS (2021)
