Article

Integrating background knowledge from internet databases into predictive toxicology models

Journal

SAR AND QSAR IN ENVIRONMENTAL RESEARCH
Volume 21, Issue 1-2, Pages 21-35

Publisher

TAYLOR & FRANCIS LTD
DOI: 10.1080/10629360903560579

Keywords

(Q)SAR; data integration; internet databases; cheminformatics; machine learning

Data integration for data analysis has been investigated extensively in biological applications, but it has received far less attention in computational chemistry and quantitative structure-activity relationship (QSAR) research. With the growing number of chemical databases available on the web, such data integration becomes an intriguing possibility (and, in fact, a necessity). In this paper, we take a first step towards the following scenario for predictive toxicology applications. Given a new structure to be predicted, the first step is to gather (integrate) all relevant information from internet databases, both for the structure itself and for all structures with available information on the endpoint of interest. In a second step, the collected information is combined statistically into a prediction for the new structure. We simulate this scenario with three endpoints (data sets) from the DSSTox database and collect information from three public chemical databases: PubChem, ChemBank and Sigma-Aldrich. In the experiments, we investigate whether adding background knowledge from the three databases yields a statistically significant improvement in predictive performance over using chemical structure alone. For this purpose, we define groups of features (features that belong together from an application point of view) from the three databases, and perform a variant of forward selection to include these feature groups in a prediction model. Our experiments show that the integration of background knowledge from internet databases can significantly improve prediction performance, especially for regression tasks.
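
The abstract does not spell out the forward-selection variant, so the following is only a minimal sketch of the general idea: start from the structure features and greedily add whole feature groups while a held-out error keeps improving. Everything concrete here is an assumption made for illustration: the group names (`structure`, `physchem`, `noise`), the toy data, the leave-one-out 1-nearest-neighbour evaluator, and the `min_gain` threshold, which stands in for the statistical significance test the paper refers to.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def loo_1nn_error(X: Matrix, y: List[float]) -> float:
    """Leave-one-out mean absolute error of a 1-nearest-neighbour regressor."""
    total = 0.0
    for i, xi in enumerate(X):
        best_j, best_d = -1, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = sum((a - b) ** 2 for a, b in zip(xi, xj))
            if d < best_d:  # ties keep the earliest neighbour
                best_j, best_d = j, d
        total += abs(y[i] - y[best_j])
    return total / len(X)


def stack(groups: Dict[str, Matrix], names: List[str], n_rows: int) -> Matrix:
    """Concatenate the columns of the chosen feature groups, row by row."""
    return [[v for g in names for v in groups[g][i]] for i in range(n_rows)]


def forward_select_groups(
    groups: Dict[str, Matrix],
    y: List[float],
    base: str,
    min_gain: float = 1e-9,  # hypothetical stand-in for a significance test
) -> Tuple[List[str], float]:
    """Greedily add the feature group that lowers the error the most."""
    n = len(y)
    selected = [base]
    best = loo_1nn_error(stack(groups, selected, n), y)
    remaining = [g for g in groups if g != base]
    while remaining:
        # Score every candidate group when added to the current selection.
        scored = [
            (loo_1nn_error(stack(groups, selected + [g], n), y), g)
            for g in remaining
        ]
        err, winner = min(scored)
        if best - err <= min_gain:
            break  # no candidate improves the model enough
        selected.append(winner)
        remaining.remove(winner)
        best = err
    return selected, best


# Toy data (entirely made up): one row per compound.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
groups = {
    "structure": [[0.0], [0.0], [1.0], [1.0], [0.0], [1.0]],
    "physchem": [[1.0], [2.1], [2.9], [4.2], [5.0], [6.1]],
    "noise": [[5.0], [1.0], [4.0], [2.0], [6.0], [3.0]],
}

baseline = loo_1nn_error(stack(groups, ["structure"], len(y)), y)
selected, error = forward_select_groups(groups, y, base="structure")
print(selected, baseline, error)
```

Selecting whole groups rather than single features mirrors the abstract's point that the features from each database belong together from an application point of view; a real study would replace the fixed threshold with a proper significance test on cross-validated performance.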
