4.6 Article Proceedings Paper

Empirical evaluation of scoring functions for Bayesian network model selection

Journal

BMC BIOINFORMATICS
Volume 13

Publisher

BMC
DOI: 10.1186/1471-2105-13-S15-S14

Funding

  1. Directorate for Computer & Information Science & Engineering
  2. Division of Information & Intelligent Systems [0953723] Funding Source: National Science Foundation

In this work, we empirically evaluate the capability of various scoring functions of Bayesian networks for recovering true underlying structures. Similar investigations have been carried out before, but they typically relied on approximate learning algorithms to learn the network structures. The suboptimal structures found by approximation methods are of unknown quality, which may affect the reliability of the conclusions drawn from them. Our study uses an optimal algorithm to learn Bayesian network structures from datasets generated from a set of gold-standard Bayesian networks. Because all optimal algorithms learn equivalent networks, only the choice of scoring function affects the learned networks. Another shortcoming of previous studies stems from their use of random synthetic networks as test cases; there is no guarantee that these networks reflect real-world data. We use real-world data to generate our gold-standard structures, so our experimental design more closely approximates real-world situations. A major finding of our study is that, in contrast to results reported by several prior works, the Minimum Description Length (MDL) (or equivalently, Bayesian information criterion (BIC)) consistently outperforms other scoring functions such as Akaike's information criterion (AIC), Bayesian Dirichlet equivalence score (BDeu), and factorized normalized maximum likelihood (fNML) in recovering the underlying Bayesian network structures. We believe this finding results from two design choices: using datasets generated from real-world applications rather than from the random processes used in previous studies, and using learning algorithms to select high-scoring structures rather than selecting random models. Other findings of our study support existing work, e.g., large sample sizes result in learned structures closer to the true underlying structure; the BDeu score is sensitive to its parameter settings; and fNML performs well on small datasets.
We also tested a greedy hill climbing algorithm and observed results similar to those of the optimal algorithm.
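
To make the role of the scoring function concrete, the following is a minimal sketch of how two of the scores named above (BIC/MDL and AIC) could be computed for a candidate structure over discrete data. It is not the authors' implementation. The function names, the data layout (a list of dicts), and the toy dataset are illustrative assumptions, and the score is written in the "log-likelihood minus penalty" form where higher is better. BDeu and fNML are not covered.

```python
import math
from collections import Counter

def family_log_likelihood(data, child, parents):
    """Maximized log-likelihood of `child` given `parents`, from counts."""
    # Count each (parent configuration, child value) pair.
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    # Count each parent configuration on its own.
    parent_counts = Counter()
    for (pa, _), n in joint.items():
        parent_counts[pa] += n
    return sum(n * math.log(n / parent_counts[pa]) for (pa, _), n in joint.items())

def num_free_parameters(arity, child, parents):
    """(r_i - 1) * q_i: free parameters in the child's conditional table."""
    q = math.prod(arity[p] for p in parents)  # empty product is 1
    return (arity[child] - 1) * q

def score(data, arity, structure, criterion="bic"):
    """Decomposable penalized-likelihood score (higher is better).

    `structure` maps each variable to a tuple of its parents (hypothetical layout).
    """
    if criterion == "bic":
        weight = 0.5 * math.log(len(data))  # BIC/MDL penalty per parameter
    elif criterion == "aic":
        weight = 1.0                        # AIC penalty per parameter
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    total = 0.0
    for child, parents in structure.items():
        ll = family_log_likelihood(data, child, parents)
        total += ll - weight * num_free_parameters(arity, child, parents)
    return total

# Toy data (assumed for illustration): B always equals A.
data = [{"A": 0, "B": 0}] * 4 + [{"A": 1, "B": 1}] * 4
arity = {"A": 2, "B": 2}
with_edge = {"A": (), "B": ("A",)}  # A -> B
no_edge = {"A": (), "B": ()}        # empty graph

for criterion in ("bic", "aic"):
    print(criterion, score(data, arity, with_edge, criterion),
          score(data, arity, no_edge, criterion))
```

On this toy dataset both criteria prefer the structure with the edge, because the gain in likelihood outweighs the extra parameter. The scores differ only in the per-parameter penalty, which is the kind of difference the study isolates by holding the search algorithm fixed.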

Recommended

Article Genetics & Heredity

SVSI: Fast and Powerful Set-Valued System Identification Approach to Identifying Rare Variants in Sequencing Studies for Ordered Categorical Traits

Wenjian Bi, Guolian Kang, Yanlong Zhao, Yuehua Cui, Song Yan, Yun Li, Cheng Cheng, Stanley B. Pounds, Michael J. Borowitz, Mary V. Relling, Jun J. Yang, Zhifa Liu, Ching-Hon Pui, Stephen P. Hunger, Christine M. Hartford, Wing Leung, Ji-Feng Zhang

ANNALS OF HUMAN GENETICS (2015)

Meeting Abstract Biochemical Research Methods

Dunn Index Bootstrap (DIBS): A procedure to empirically select a cluster analysis method that identifies biologically and clinically relevant molecular disease subgroups

Iwona Pawlikowska, Zhifa Liu, Lei Shi, Tong Lin, Tanja Gruber, Giles Robinson, Arzu Onar-Thomas, Stan Pounds

BMC BIOINFORMATICS (2015)

Article Dermatology

The Genomic Landscape of Childhood and Adolescent Melanoma

Charles Lu, Jinghui Zhang, Panduka Nagahawatte, John Easton, Seungjae Lee, Zhifa Liu, Li Ding, Matthew A. Wyczalkowski, Marcus Valentine, Fariba Navid, Heather Mulder, Ruth G. Tatevossian, James Dalton, James Davenport, Zhirong Yin, Michael Edmonson, Michael Rusch, Gang Wu, Yongjin Li, Matthew Parker, Erin Hedlund, Sheila Shurtleff, Susana Raimondi, Vadodaria Bhavin, Yergeau Donald, Elaine R. Mardis, Richard K. Wilson, William E. Evans, David W. Ellison, Stanley Pounds, Michael Dyer, James R. Downing, Alberto Pappo, Armita Bahrami

JOURNAL OF INVESTIGATIVE DERMATOLOGY (2015)

Article Multidisciplinary Sciences

Genomic landscape of paediatric adrenocortical tumours

Emilia M. Pinto, Xiang Chen, John Easton, David Finkelstein, Zhifa Liu, Stanley Pounds, Carlos Rodriguez-Galindo, Troy C. Lund, Elaine R. Mardis, Richard K. Wilson, Kristy Boggs, Donald Yergeau, Jinjun Cheng, Heather L. Mulder, Jayanthi Manne, Jesse Jenkins, Maria J. Mastellaro, Bonald C. Figueiredo, Michael A. Dyer, Alberto Pappo, Jinghui Zhang, James R. Downing, Raul C. Ribeiro, Gerard P. Zambetti

NATURE COMMUNICATIONS (2015)

Article Oncology

Prognostic Significance of Major Histocompatibility Complex Class II Expression in Pediatric Adrenocortical Tumors: A St. Jude and Children's Oncology Group Study

Emilia Modolo Pinto, Carlos Rodriguez-Galindo, John Kim Choi, Stanley Pounds, Zhifa Liu, Geoffrey Neale, David Finkelstein, John M. Hicks, Alberto S. Pappo, Bonald C. Figueiredo, Raul C. Ribeiro, Gerard P. Zambetti

CLINICAL CANCER RESEARCH (2016)

Article Biochemical Research Methods

A genomic random interval model for statistical analysis of genomic lesion data

Stan Pounds, Cheng Cheng, Shaoyu Li, Zhifa Liu, Jinghui Zhang, Charles Mullighan

BIOINFORMATICS (2013)

Article Biochemical Research Methods

The most informative spacing test effectively discovers biologically relevant outliers or multiple modes in expression

Iwona Pawlikowska, Gang Wu, Michael Edmonson, Zhifa Liu, Tanja Gruber, Jinghui Zhang, Stan Pounds

BIOINFORMATICS (2014)

Meeting Abstract Biochemical Research Methods

A powerful association for comorbidity analysis based on score based test

Zhifa Liu

BMC BIOINFORMATICS (2013)

Meeting Abstract Biochemical Research Methods

Our strategy to achieve and document reproducible computing

Nisrine Enyinda, Zhifa Liu, Areg Negatu, Stan Pounds

BMC BIOINFORMATICS (2013)

Meeting Abstract Biochemical Research Methods

Feature selection and prediction with a Markov blanket structure learning algorithm

Yuan Tan, Zhifa Liu

BMC BIOINFORMATICS (2013)

Article Biochemical Research Methods

An R package that automatically collects and archives details for reproducible computing

Zhifa Liu, Stan Pounds

BMC BIOINFORMATICS (2014)

Article Genetics & Heredity

Genetic Association Test for Multiple Traits at Gene Level

Xiaobo Guo, Zhifa Liu, Xueqin Wang, Heping Zhang

GENETIC EPIDEMIOLOGY (2013)

Article Multidisciplinary Sciences

NCK2 Is Significantly Associated with Opiates Addiction in African-Origin Men

Zhifa Liu, Xiaobo Guo, Yuan Jiang, Heping Zhang

SCIENTIFIC WORLD JOURNAL (2013)