Journal
OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY
Volume 15, Issue 7-8, Pages 483-494
Publisher
MARY ANN LIEBERT, INC
DOI: 10.1089/omi.2010.0066
Funding
- Chinese Academy of Sciences [KSCX1-YW-22-01]
- Ministry of Science and Technology of China [2009CB825607, 2011CB910202]
- National Natural Science Foundation [30730033, 90919059]
- Shanghai Postdoctoral Scientific Program [09R21414900]
- China Postdoctoral Science Foundation [20090450573]
- European Community (TB-VIR network) [200973]
- BBSRC [BB/G022771/1] Funding Source: UKRI
- Biotechnology and Biological Sciences Research Council [BB/G022771/1] Funding Source: researchfish
Multidimensional genome-wide data (e.g., gene expression microarray data) provide rich information and have widespread applications in integrative biology. However, little attention has been paid to the inherent relationships within these natural data. When multidimensional microarray data are viewed simply as points scattered over hyperspace, the spatial properties (topological structure) of the data clouds may reveal the underlying relationships. Based on this idea, we herein make analytical improvements by introducing a topology-preserving selection and clustering (TPSC) approach to complex large-scale microarray data. Specifically, the integration of self-organizing map (SOM) and singular value decomposition allows genome-wide selection on sound foundations of statistical inference. Moreover, this approach is complemented with an SOM-based two-phase gene clustering procedure, allowing the topology-preserving identification of gene clusters. These gene clusters with highly similar expression patterns can facilitate many aspects of biological interpretation in terms of functional and regulatory relevance. As demonstrated by processing large and complex datasets of the human cell cycle, stress responses, and host cell responses to pathogen infection, our proposed method can yield better characteristic features from the whole datasets compared to conventional routines. We hence conclude that topology-preserving selection and clustering, without a priori assumptions about data structure, allow the in-depth mining of biological information in a more accurate and unbiased manner. A Web server (http://www.cs.bris.ac.uk/~hfang/TPSC) hosting a MATLAB package that implements the methodology is freely available to both academic and nonacademic users. These advances will expand the scope of omics applications.
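The abstract names two building blocks, a self-organizing map (SOM) and singular value decomposition (SVD), without giving algorithmic detail. The Python sketch below is only an illustration of how those two generic techniques can be chained on a toy expression matrix; it is not the TPSC method and not the authors' MATLAB package. Every function name (`train_som`, `assign_to_nodes`, `dominant_modes`), every parameter value, and the synthetic data are assumptions made for this example.

```python
import numpy as np

def train_som(data, grid_shape=(4, 4), n_iter=500, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small online SOM (illustrative only, not the TPSC procedure).

    Returns a codebook of shape (rows * cols, n_features)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    # Grid coordinates of each map node, used by the neighborhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise node vectors from randomly chosen input rows.
    codebook = data[rng.choice(len(data), size=rows * cols, replace=True)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood radius
        x = data[rng.integers(len(data))]
        # Best-matching unit: the node closest to the sample in expression space.
        bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))
        # Gaussian neighborhood on the map grid keeps nearby nodes similar,
        # which is what makes the mapping topology preserving.
        grid_dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        codebook += lr * h[:, None] * (x - codebook)
    return codebook

def assign_to_nodes(data, codebook):
    """Index of the nearest map node for every row (gene) of data."""
    d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def dominant_modes(codebook, n_modes=2):
    """SVD of the mean-centered codebook: leading patterns and variance fractions."""
    centered = codebook - codebook.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / (s ** 2).sum()
    return vt[:n_modes], explained[:n_modes]

# Synthetic stand-in for an expression matrix: 60 "genes" x 12 "samples",
# half following a sine profile and half a cosine profile, plus noise.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 2.0 * np.pi, 12)
genes = np.vstack([
    np.sin(t) + 0.1 * rng.standard_normal((30, 12)),
    np.cos(t) + 0.1 * rng.standard_normal((30, 12)),
])

codebook = train_som(genes)
labels = assign_to_nodes(genes, codebook)
modes, explained = dominant_modes(codebook)
```

In this sketch the SVD is applied to the SOM node vectors rather than to the raw genes; that ordering is a design choice of the example, and the abstract does not say how the published method combines the two steps.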