4.5 Article Proceedings Paper

Heterogeneous network embedding enabling accurate disease association predictions

Journal

BMC MEDICAL GENOMICS
Volume 12, Issue -, Pages -

Publisher

BMC
DOI: 10.1186/s12920-019-0623-3

Keywords

Network embedding; Heterogeneous network; Disease association prediction

Funding

  1. National Science Foundation [DBI-1565137, DGE-1829071]
  2. National Natural Science Foundation of China [U1636207, 91546105]
  3. National Institutes of Health [R01 GM115833, U54 GM114833]
  4. Shanghai Science and Technology Development Fund [16JC1400801, 19511121204]

Background: Identifying the complex biological mechanisms underlying various diseases is important in biomedical research. Recently, the rapid growth of data in genomics, epigenomics, metagenomics, proteomics, metabolomics, nutriomics, etc., has led to the rise of systematic biological approaches to exploring complex diseases. However, the gap between the production of such data and our capability to analyze it has gradually widened. Furthermore, we observe that networks can represent many of the above-mentioned data. Based on the vector representations learned by network embedding methods, entities that are in close proximity but do not currently share direct links are very likely to be related, which makes them promising candidates for biological investigation.

Results: We integrate six public biological databases to construct a heterogeneous biological network containing three categories of entities (i.e., genes, diseases, miRNAs) and multiple types of edges (i.e., the known relationships). To tackle the inherent heterogeneity, we develop a heterogeneous network embedding model that maps the network into a low-dimensional vector space in which the relationships between entities are well preserved. To assess the effectiveness of our method, we conduct gene-disease and miRNA-disease association predictions, the results of which show the superiority of our method over several state-of-the-art approaches. Furthermore, many associations predicted by our method are verified in the latest real-world dataset.

Conclusions: We propose a novel heterogeneous network embedding method that can take full advantage of the abundant contextual information and structures of heterogeneous networks. Moreover, we illustrate the performance of the proposed method in guiding biological studies, which can assist in identifying new hypotheses in biological investigation.
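The idea stated in the abstract, that entities lying close together in the embedding space but lacking a known link are candidate associations, can be illustrated with a minimal sketch. This is not the authors' model: the embeddings below are hand-written toy vectors rather than learned ones, cosine similarity is an assumed proximity measure, and all node names (`GENE_A`, `DISEASE_X`, `MIRNA_1`) and the helper `rank_candidates` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(embeddings, node_types, known_edges, source_type, target_type):
    """Rank (source, target) pairs without a known edge by embedding proximity."""
    known = {frozenset(edge) for edge in known_edges}
    scored = []
    for s, s_type in node_types.items():
        if s_type != source_type:
            continue
        for t, t_type in node_types.items():
            if t_type != target_type or frozenset((s, t)) in known:
                continue
            scored.append((cosine(embeddings[s], embeddings[t]), s, t))
    # Highest similarity first: the most promising candidate associations.
    return sorted(scored, reverse=True)


# Toy heterogeneous network: three entity types, one known gene-disease edge.
# These vectors are illustrative assumptions, not output of any trained model.
node_types = {
    "GENE_A": "gene",
    "GENE_B": "gene",
    "DISEASE_X": "disease",
    "DISEASE_Y": "disease",
    "MIRNA_1": "miRNA",
}
embeddings = {
    "GENE_A": [1.0, 0.0],
    "GENE_B": [0.0, 1.0],
    "DISEASE_X": [0.9, 0.1],
    "DISEASE_Y": [0.1, 0.9],
    "MIRNA_1": [0.7, 0.7],
}
known_edges = [("GENE_A", "DISEASE_X")]

ranked = rank_candidates(embeddings, node_types, known_edges, "gene", "disease")
for score, gene, disease in ranked:
    print(f"{gene} - {disease}: {score:.3f}")
```

Calling the same function with `"miRNA"` as the source type would rank miRNA-disease candidates in the same way; in practice the vectors would come from a trained heterogeneous embedding model and the ranking would be evaluated against held-out associations.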

Recommended

Article Computer Science, Information Systems

Estimating Time to Progression of Chronic Obstructive Pulmonary Disease With Tolerance

Chunlei Tang, Joseph M. Plasek, Xiao Shi, Meihan Wan, Haohan Zhang, Min-Jeoung Kang, Liqin Wang, Sevan M. Dulgarian, Yun Xiong, Jing Ma, David W. Bates, Li Zhou

Summary: This study predicts mortality risk in patients with chronic obstructive pulmonary disease using clinical notes, optimizing the accuracy of linear regression and support vector machines by determining a tolerance range. The results demonstrate an overall improvement in machine learning approaches after considering the optimal tolerance range.

IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS (2021)

Article Biochemical Research Methods

A Second Look at FAIR in Proteomic Investigations

J. Harry Caufield, John Fu, Ding Wang, Vladimir Guevara-Gonzalez, Wei Wang, Peipei Ping

Summary: Proteomics aims to study protein features in entire systems, with various resources available to make results more discoverable, accessible, interoperable, and reusable. Linking specific terms, identifiers, and texts can unify individual data points, potentially revealing new relationships and maximizing the value of datasets and methods for the proteomics community and beyond.

JOURNAL OF PROTEOME RESEARCH (2021)

Article Computer Science, Theory & Methods

SEIZE: Runtime Inspection for Parallel Dataflow Systems

Youfu Li, Matteo Interlandi, Fotis Psallidas, Wei Wang, Carlo Zaniolo

Summary: Many DISC systems provide easy-to-use APIs and efficient scheduling and execution strategies for building concise data-parallel programs. However, some crucial features and optimizations are not well-supported, requiring runtime dataflow states to achieve.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2021)

Article Biochemical Research Methods

JEDI: circular RNA prediction based on junction encoders and deep interaction among splice sites

Jyun-Yu Jiang, Chelsea J-T Ju, Junheng Hao, Muhao Chen, Wei Wang

Summary: circRNA is a novel class of long non-coding RNAs that play important roles in gene regulation and disease association. The JEDI framework, utilizing deep learning and a cross-attention layer, effectively predicts circRNAs, outperforming existing methods significantly.

BIOINFORMATICS (2021)

Editorial Material Health Care Sciences & Services

The intersection of big data and epidemiology for epidemiologic research: The impact of the COVID-19 pandemic

Chunlei Tang, Joseph M. Plasek, Suhua Zhang, Yun Xiong, Yangyong Zhu, Jing Ma, L. Zhou, David W. Bates

Summary: Big data epidemiology provides data-driven insights for pandemic response, utilizing tools different from traditional methods. Addressing issues like insufficient data and data inaccessibility requires combining techniques across disciplines.

INTERNATIONAL JOURNAL FOR QUALITY IN HEALTH CARE (2021)

Article Multidisciplinary Sciences

COVID-19 Surveiller: toward a robust and effective pandemic surveillance system based on social media mining

Jyun-Yu Jiang, Yichao Zhou, Xiusi Chen, Yan-Ru Jhou, Liqi Zhao, Sabrina Liu, Po-Chun Yang, Jule Ahmar, Wei Wang

Summary: This paper proposes a method to leverage social media users as social sensors, predicting pandemic trends while suggesting potential risk factors for public health experts. The method utilizes deep learning models to recognize important entities and their relations, establishing dynamic heterogeneous graphs to describe the observations of social media users. A web-based system is also developed to allow easy interaction for domain experts without computer science backgrounds.

PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES (2022)

Article Computer Science, Information Systems

Knowledge Source Rankings for Semi-Supervised Topic Modeling

Justin Wood, Corey Arnold, Wei Wang

Summary: Recent work suggests incorporating knowledge sources into the topic modeling process to improve topic discovery. However, existing semi-supervised topic models assume that the corpus contains topics on a subset of a domain, leading to slow inference when considering a large number of article-topics. This paper presents a ranking technique based on the PageRank algorithm to speed up the inference process and improve perplexity and interpretability. The results show significant improvements in various evaluation metrics compared to baseline methods.

INFORMATION (2022)

Article Health Care Sciences & Services

Improving Research Patient Data Repositories From a Health Data Industry Viewpoint

Chunlei Tang, Jing Ma, Li Zhou, Joseph Plasek, Yuqing He, Yun Xiong, Yangyong Zhu, Yajun Huang, David Bates

Summary: Organizational, administrative, and educational challenges hinder the efficient utilization of Research Patient Data Repositories (RPDRs) in biomedical data science infrastructures. This article explores applying data science thinking and practices from the business sector, known as the data industry viewpoint, to enhance RPDRs.

JOURNAL OF MEDICAL INTERNET RESEARCH (2022)

Article Computer Science, Information Systems

Learning from undercoded clinical records for automated International Classification of Diseases (ICD) coding

Yucheng Jin, Yun Xiong, Dan Shi, Yifei Lin, Lifang He, Yao Zhang, Joseph M. Plasek, Li Zhou, David W. Bates, Chunlei Tang

Summary: This study aims to develop an objective and unbiased method for learning automatic coding algorithms from clinical records with partial relevant codes. By using positive-unlabeled learning with reweighting and integrating supervision from an annotation tool, the performance of the algorithms is significantly improved, addressing the issues of annotation noise and imbalance.

JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Discovering Undisclosed Paid Partnership on Social Media via Aspect-Attentive Sponsored Post Learning

Seungbae Kim, Jyun-Yu Jiang, Wei Wang

Summary: In this study, the SPoD model is proposed to detect undisclosed sponsorship in social media posts by learning various aspects of the posts. The experimental results demonstrate that SPoD significantly out-performs existing baseline methods in discovering sponsored posts on social media.

WSDM '21: PROCEEDINGS OF THE 14TH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (2021)

Proceedings Paper Computer Science, Information Systems

The Biased Coin Flip Process for Nonparametric Topic Modeling

Justin Wood, Wei Wang, Corey Arnold

Summary: This paper introduces a new interpretation of nonparametric Bayesian learning called the biased coin flip process, proving its equivalence to the Dirichlet process and demonstrating improved predictive performance.

DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT II (2021)

Proceedings Paper Computer Science, Artificial Intelligence

MEDTO: Medical Data to Ontology Matching Using Hybrid Graph Neural Networks

Junheng Hao, Chuan Lei, Vasilis Efthymiou, Abdul Quamar, Fatma Ozcan, Yizhou Sun, Wei Wang

Summary: Medical ontologies and databases often have discrepancies that compromise interoperability, requiring data to ontology matching. Existing solutions focus on extracting information from ontologies for engineering, which can be labor-intensive. The proposed MEDTO framework utilizes three innovative techniques to achieve significant improvements in data to ontology matching.

KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Coupled Graph ODE for Learning Interacting System Dynamics

Zijie Huang, Yizhou Sun, Wei Wang

Summary: Many real-world systems are dynamic in nature, where coupled objects interact through graphs and exhibit complex behavior over time. The COVID-19 pandemic can be seen as a dynamic system with geographical locations as objects, influencing each other's infection rates. There is a need to explore how to accurately model and predict the complex dynamics of these systems.

KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Node Classification in Temporal Graphs Through Stochastic Sparsification and Temporal Structural Convolution

Cheng Zheng, Bo Zong, Wei Cheng, Dongjin Song, Jingchao Ni, Wenchao Yu, Haifeng Chen, Wei Wang

Summary: The proposed TSNet model jointly learns temporal and structural features for node classification from sparsified temporal graphs, effectively extracting local features and optimizing node representations to improve performance in node classification tasks.

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2020, PT III (2021)

Article Biology

Embedding, aligning and reconstructing clinical notes to explore sepsis

Xudong Zhu, Joseph M. Plasek, Chunlei Tang, Wasim Al-Assad, Zhikun Zhang, Yun Xiong, Liqin Wang, Sharmitha Yerneni, Carlos Ortega, Min-Jeoung Kang, Li Zhou, David W. Bates, Patricia C. Dykes

Summary: This study focuses on exploring and developing analysis tools for clinical notes, demonstrating how global embeddings, aligning at specific times, timeline reconstruction, and clustering can enhance representation learning and understanding of data connections in clinical documentation. The appropriate exploratory analysis tools not only improve data processing capabilities but also make data-driven medicine possible by providing keen insights into preprocessing clinical notes.

BMC RESEARCH NOTES (2021)
