Article

Inference of population structure using genetic markers and a Bayesian model averaging approach for clustering

Analyzing the structure of populations from genetic data is essential in population genetics. It is used, for instance, to study the evolution of species or to correct for population stratification in association studies. These genetic data, normally based on DNA polymorphisms, may contain irrelevant information that biases the inference of population structure. In this paper we adapt a recently proposed algorithm, named multi-start EMA, to the inference of population structure. This algorithm can deal with irrelevant information when obtaining the (probabilistic) population partition. Additionally, we present a marker selection test that identifies the markers most relevant to retrieving that population partition. The proposed algorithm is compared with the widely used STRUCTURE software on the basis of the F_ST metric and the log-likelihood score. The results show that the proposed algorithm improves the recovery of the population structure. Moreover, information about relevant markers obtained by the multi-start EMA can be used to improve the results obtained by other methods, to correct for population stratification, or even to reduce the economic cost of sequencing new samples. The software presented in this paper is available online at http://www.sc.ehu.es/ccwbayes/members/guzman.
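As a rough illustration of the fixation index F_ST named in the abstract, the sketch below computes Wright's classical heterozygosity-based F_ST for a single biallelic marker, given the allele frequency and size of each inferred subpopulation. The abstract does not say which F_ST estimator the authors used, so the formula, the function name `fst_biallelic`, and its parameters are assumptions made for illustration only, not the paper's implementation.

```python
from typing import Sequence


def fst_biallelic(freqs: Sequence[float], sizes: Sequence[int]) -> float:
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic marker.

    freqs[k]: frequency of one allele in subpopulation k (hypothetical input)
    sizes[k]: number of individuals in subpopulation k (hypothetical input)
    """
    total = sum(sizes)
    weights = [n / total for n in sizes]

    # Pooled allele frequency across all subpopulations.
    p_bar = sum(w * p for w, p in zip(weights, freqs))

    # Expected heterozygosity of the pooled population.
    h_t = 2.0 * p_bar * (1.0 - p_bar)

    # Size-weighted mean expected heterozygosity within subpopulations.
    h_s = sum(w * 2.0 * p * (1.0 - p) for w, p in zip(weights, freqs))

    # A monomorphic marker carries no differentiation signal.
    if h_t == 0.0:
        return 0.0
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Two equally sized clusters fixed for different alleles.
    print(fst_biallelic([1.0, 0.0], [10, 10]))
    # Two clusters with identical allele frequencies.
    print(fst_biallelic([0.5, 0.5], [10, 10]))
```

Under this definition, a partition that separates genetically distinct groups yields values closer to 1, while a partition whose clusters share the same allele frequencies yields values near 0. This is the sense in which F_ST can serve as a score for comparing population partitions.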

Recommended

Article Computer Science, Information Systems

Semiparametric Bayesian networks

David Atienza, Concha Bielza, Pedro Larranaga

Summary: Semiparametric Bayesian networks combine parametric and nonparametric conditional probability distributions to incorporate the advantages of both components. By considering different types of conditional probability distributions and modifying learning algorithms, the proposed approach achieves comparable performance to state-of-the-art methods.

INFORMATION SCIENCES (2022)

Article Computer Science, Artificial Intelligence

Multipartition clustering of mixed data with Bayesian networks

Fernando Rodriguez-Sanchez, Concha Bielza, Pedro Larranaga

Summary: This paper introduces a multipartition clustering method for mixed data, which efficiently handles multifaceted data with several reasonable interpretations by utilizing Bayesian network factorization and the variational Bayes framework.

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS (2022)

Article Computer Science, Artificial Intelligence

Piecewise forecasting of nonlinear time series with model tree dynamic Bayesian networks

David Quesada, Concha Bielza, Pedro Fontan, Pedro Larranaga

Summary: When modeling multivariate continuous time series, it is common to encounter nonlinear processes or drift away from the original distribution. To address this issue, we propose a hybrid model that combines a model tree with DBNs to obtain nonlinear forecasts. Experimental results demonstrate that our model outperforms standard DBN models when dealing with nonlinear processes and is competitive with state-of-the-art time series forecasting methods.

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS (2022)

Article Computer Science, Artificial Intelligence

Time series classifier recommendation by a meta-learning approach

A. Abanda, U. Mori, Jose A. Lozano

Summary: This study investigates time series classifier recommendation for the first time, considering various recommendation forms or meta-targets. The researchers design a set of quick estimators as predictors for the recommendation system. Experimental results show that the proposed method outperforms other methods in most scenarios, and a hierarchical inference method for meta-targets is also proposed.

PATTERN RECOGNITION (2022)

Article Computer Science, Artificial Intelligence

EDA++: Estimation of Distribution Algorithms With Feasibility Conserving Mechanisms for Constrained Continuous Optimization

Abolfazl Shirazi, Josu Ceberio, Jose A. Lozano

Summary: This article introduces a new algorithm (EDA++) equipped with mechanisms to handle nonlinear constraints by adopting the framework of estimation of distribution algorithms (EDAs). The study shows that the feasibility of the final solutions is guaranteed and the quality of the solutions in terms of objective values is improved by seeding an initial population of feasible solutions to the algorithm.

IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION (2022)

Article Computer Science, Information Systems

Asymmetric HMMs for Online Ball-Bearing Health Assessments

Carlos Puerto-Santana, Concha Bielza, Javier Diaz-Rozo, Guillem Ramirez-Gargallo, Filippo Mantovani, Gaizka Virumbrales, Jesus Labarta, Pedro Larranaga

Summary: This study introduces a methodology for health assessment based on online novelty detection and asymmetrical hidden Markov models for predicting the remaining useful life of ball bearings in industrial assets. The approach is designed to adapt to natural degradation of mechanical components and can be deployed in online environments. Performance analysis and validation with real datasets showcase the advantages of this methodology.

IEEE INTERNET OF THINGS JOURNAL (2022)

Article Computer Science, Artificial Intelligence

Bayesian Performance Analysis for Algorithm Ranking Comparison

Jairo Rojas-Delgado, Josu Ceberio, Borja Calvo, Jose A. Lozano

Summary: This work delves into the Bayesian statistical assessment of experimental results, proposing a framework for analyzing multiple algorithms on multiple problems/instances by transforming experimental results into rankings and estimating the posterior distribution of the parameters of probability models. Various inferences regarding algorithm rankings are examined, and a Python package and source code implementation are provided for other researchers to utilize.

IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION (2022)

Article Computer Science, Artificial Intelligence

Selective Imputation for Multivariate Time Series Datasets With Missing Values

Anehd Blazquez-Garcia, Kristoffer Wickstrom, Shujian Yu, Karl Oyvind Mikalsen, Ahcene Boubekki, Angel Conde, Usue Mori, Robert Jenssen, Jose A. Lozano

Summary: This paper proposes a selective imputation method for handling missing values in multivariate time series data. By using multi-objective optimization techniques, the method selects the time points to impute in order to reduce imputation uncertainty and accurately represent the original time series. Experimental results show that this method can improve the performance of downstream tasks while maintaining the quality of the imputations.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2023)

Article Computer Science, Interdisciplinary Applications

Learning the progression patterns of treatments using a probabilistic generative model

Onintze Zaballa, Aritz Perez, Elisa Gomez Inhiesto, Teresa Acaiturri Ayesta, Jose A. Lozano

Summary: This paper presents a probabilistic generative model for disease modeling and patient treatment based on Electronic Health Records. The model aims to identify different subtypes of treatments for a given disease and discover their development and progression. It considers the hierarchical structure of latent variables to classify and segment the treatment sequences. The model's learning procedure is efficiently solved with the Expectation-Maximization algorithm based on dynamic programming. The evaluation includes recovering the generative model underlying synthetic data and assessing the model's ability to provide treatment classification and staging information in real-world data. The model can be used for classification, simulation, data augmentation, and missing data imputation.

JOURNAL OF BIOMEDICAL INFORMATICS (2023)

Review Computer Science, Artificial Intelligence

Feature subset selection for data and feature streams: a review

Carlos Villa-Blanco, Concha Bielza, Pedro Larranaga

Summary: Real-world problems often have high feature dimensionality, making it difficult to model and analyze the data. Feature subset selection (FSS) techniques can be used to reduce irrelevant or redundant information, improving the speed and performance of building models. This review focuses on incremental FSS algorithms that can efficiently handle large volumes of data received sequentially. Different strategies, such as updating feature weights incrementally, applying information theory, or using rough set-based FSS, are discussed, along with various supervised and unsupervised learning tasks where FSS is applicable.

ARTIFICIAL INTELLIGENCE REVIEW (2023)

Article Computer Science, Artificial Intelligence

Minimum Recall-Based Loss Function for Imbalanced Time Series Classification

Josu Ircio, Aizea Lojo, Usue Mori, Simon Malinowski, Jose A. Lozano

Summary: This paper addresses imbalanced time series classification problems and proposes a method for learning time series classifiers that maximize the minimum recall rather than accuracy. By applying several smooth approximations of the minimum recall function, our approach improves the performance of state-of-the-art methods in imbalanced time series classification, with only a slight loss in accuracy.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2023)

Proceedings Paper Computer Science, Artificial Intelligence

New knowledge about the Elementary Landscape Decomposition for solving the Quadratic Assignment Problem

Xabier Benavides, Josu Ceberio, Leticia Hernando, Jose A. Lozano

Summary: Previous works have shown that studying the characteristics of the Quadratic Assignment Problem (QAP) is crucial in designing tailored meta-heuristic algorithms. This study focuses on the Elementary Landscape Decomposition (ELD) method, which is widely used but lacks a clear understanding of its measurement components. To address this issue, this work further decomposes the ELD and conducts experiments to explain the behavior of ELD-based methods, providing critical information about their potential applications.

PROCEEDINGS OF THE 2023 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, GECCO 2023 (2023)

Article Biochemical Research Methods

Learning massive interpretable gene regulatory networks of the human brain by merging Bayesian networks

Niko Bernaola, Mario Michiels, Pedro Larranaga, Concha Bielza

Summary: We present FGES-Merge, a new method for learning the structure of gene regulatory networks by merging locally learned Bayesian networks using the fast greedy equivalent search algorithm. The method is competitive in terms of accuracy and speed, scaling up to large networks and incorporating empirical knowledge of gene regulatory network topology. We also introduce a visualization tool for exploring massive networks and identifying nodes of interest. Our work contributes to predicting gene interactions on a large scale and provides a valuable resource for future biological research.

PLOS COMPUTATIONAL BIOLOGY (2023)

Article Computer Science, Artificial Intelligence

Feature Saliencies in Asymmetric Hidden Markov Models

Carlos Puerto-Santana, Pedro Larranaga, Concha Bielza

Summary: This article introduces asymmetric hidden Markov models with feature saliencies, which can simultaneously determine relevant variables/features and probabilistic relationships between variables during their learning phase. Compared with other approaches, the proposed models have better or equal fitness and provide further data insights.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2022)

Proceedings Paper Computer Science, Artificial Intelligence

An Online Feature Selection Methodology for Ball-Bearing Harmonic Frequencies Based on HMMs

Carlos Puerto-Santana, Pedro Larranaga, Javier Diaz-Rozo, Concha Bielza

Summary: This paper focuses on data streams produced by sensors in industrial environments and proposes an online feature subset selection methodology based on HMM to determine the relevant fundamental and harmonic frequencies during operation of ball-bearings.

16TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING MODELS IN INDUSTRIAL AND ENVIRONMENTAL APPLICATIONS (SOCO 2021) (2022)