Article

Analyzing High Dimensional Toxicogenomic Data Using Consensus Clustering

Journal

ENVIRONMENTAL SCIENCE & TECHNOLOGY
Volume 46, Issue 15, Pages 8413-8421

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/es3000454

Funding

  1. CDM/Diane and William Howard scholarship
  2. Industrial Translational Research Initiative (ITRI) Scholarship
  3. National Science Foundation (NSF) [EEC-0926284, CAREER CBET-0953633]
  4. National Science Foundation Nanoscale Science and Engineering Center (NSEC) for High-rate Nanomanufacturing [0425826]
  5. Division of Chemical, Bioengineering, Environmental, and Transport Systems
  6. Directorate for Engineering [0926284] (Funding Source: National Science Foundation)


Rapid development of high-throughput toxicogenomics technologies has created new approaches to screening environmental samples for mechanistic toxicity assessment. However, challenges remain in the analysis, especially the clustering, of the resulting high-dimensional data. Because there are no commonly accepted validation methods, it is difficult to compare clustering results between studies or to identify the key experimental or data features that affect the clustering results. We applied consensus clustering (CC), an approach that clusters the input data repeatedly through iterative resampling and identifies frequently occurring, high-confidence clusters. We used CC to analyze a set of high-dimensional transcriptomics data with temporal resolution, which were generated using our E. coli whole-cell array system for a diverse set of toxicants at different dose concentrations. The CC analysis allowed us to evaluate the robustness and sensitivity of the clustering results under a number of conditions that represent common variations in high-throughput experiments, including noisy data, subsets of treatments, subsets of reporter genes, and subsets of time points. We demonstrated the value of using rich time-series data and underscored the importance of carefully selecting sampling times for a given experimental system. The results also indicated that temporal data compression using our proposed Transcriptional Effect Level Index (TELI) concept, followed by CC, largely conserved the cluster resolution. We also found that, for our cellular stress response ensemble-based high-throughput transcriptomics assay platform, the size and composition of the reporter gene set are critical factors that affect the resulting coherency of clusters. Taken together, these results demonstrate that robust clustering approaches such as CC may be valuable in analyzing high-dimensional toxicogenomic data sets.
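To make the resampling idea concrete, here is a minimal sketch of generic consensus clustering, not the authors' implementation. It assumes k-means as the base clusterer, 80% item subsampling per round, and a final partition taken as the connected components of the consensus matrix at a 0.5 cutoff; all of these settings, the function names `kmeans`, `consensus_matrix`, and `consensus_clusters`, and the toy data are illustrative assumptions. The consensus matrix entry M[i, j] is the fraction of resamples containing both items i and j in which they landed in the same cluster, so values near 1 mark high-confidence co-membership.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=50):
    """Lloyd's k-means with farthest-first seeding; returns one label per row of X."""
    # Seeding (an assumption for this sketch): a random first center, then
    # repeatedly the point farthest from all centers chosen so far.
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each row to its nearest center (squared Euclidean distance).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its members (keep it if the cluster is empty).
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def consensus_matrix(X, k, n_resamples=100, frac=0.8, seed=0):
    """Fraction of co-sampled resamples in which each pair of rows shares a cluster."""
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))  # times i and j were clustered together
    sampled = np.zeros((n, n))   # times i and j appeared in the same subsample
    m = int(round(frac * n))
    for _ in range(n_resamples):
        idx = rng.choice(n, size=m, replace=False)  # subsample items without replacement
        labels = kmeans(X[idx], k, rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        sampled[np.ix_(idx, idx)] += 1.0
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(sampled > 0, together / sampled, 0.0)


def consensus_clusters(M, threshold=0.5):
    """Final partition: connected components of the graph with edges where M >= threshold."""
    n = len(M)
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        stack = [start]
        labels[start] = current
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(M[i] >= threshold):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical toy data: 20 treatments x 6 features (e.g., gene-by-time values),
# drawn as two well-separated groups.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0.0, 0.3, (10, 6)), data_rng.normal(3.0, 0.3, (10, 6))])
M = consensus_matrix(X, k=2, n_resamples=50, frac=0.8, seed=0)
labels = consensus_clusters(M, threshold=0.5)
print(labels)
```

The same `consensus_matrix` call can be repeated on perturbed inputs (added noise, a subset of treatments, a subset of feature columns) to see which groupings persist, which mirrors the kinds of robustness checks described in the abstract; the cutoff and base clusterer would need to be tuned for real data.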

