Article

Inference of population structure using genetic markers and a Bayesian model averaging approach for clustering

JOURNAL OF COMPUTATIONAL BIOLOGY (2008)

Journal

JOURNAL OF COMPUTATIONAL BIOLOGY

Volume 15, Issue 2, Pages 207-220

Publisher

MARY ANN LIEBERT, INC

DOI: 10.1089/cmb.2007.0051

Keywords

algorithms; learning; probability

Categories

Biochemical Research Methods; Biotechnology & Applied Microbiology; Computer Science, Interdisciplinary Applications; Mathematical & Computational Biology; Statistics & Probability

Abstract

The analysis of the structure of populations on the basis of genetic data is essential in population genetics. It is used, for instance, to study the evolution of species or to correct for population stratification in association studies. These genetic data, normally based on DNA polymorphisms, may contain irrelevant information that biases the inference of population structure. In this paper we adapt a recently proposed algorithm, named multi-start EMA, for use in the inference of population structure. This algorithm is able to deal with irrelevant information when obtaining the (probabilistic) population partition. Additionally, we present a marker selection test able to identify the markers most relevant to retrieving that population partition. The proposed algorithm is compared with the widely used STRUCTURE software on the basis of the F_ST metric and the log-likelihood score. It is shown that the proposed algorithm improves the recovery of the population structure. Moreover, information about relevant markers obtained by the multi-start EMA can be used to improve the results obtained by other methods, to correct for population stratification, or even to reduce the economic cost of sequencing new samples. The software presented in this paper is available online at http://www.sc.ehu.es/ccwbayes/members/guzman.

Recommended

Article Computer Science, Information Systems

An Ensemble Learning Algorithm Based on Density Peaks Clustering and Fitness for Imbalanced Data

Hui Xu, Qicheng Liu

Summary: This paper proposes an algorithm based on density peaks clustering and fitness to address the low classification accuracy of the minority class in imbalanced data. Experimental results show that the algorithm outperforms other algorithms.

IEEE ACCESS (2022)

Article Computer Science, Theory & Methods

A Parallel Framework for Constraint-Based Bayesian Network Learning via Markov Blanket Discovery

Ankit Srivastava, Sriram P. Chockalingam, Srinivas Aluru

Summary: This article presents a parallel framework for scaling Bayesian network structure learning algorithms to tens of thousands of variables. The framework parallelizes three different algorithms and is able to construct large-scale networks from real data sets in less than a minute on 1024 cores, achieving significant speedup and efficiency. The scalability of the framework is also demonstrated using simulated data sets.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2023)

Article Computer Science, Artificial Intelligence

SafePredict: A Meta-Algorithm for Machine Learning That Uses Refusals to Guarantee Correctness

Mustafa A. Kocak, David Ramirez, Elza Erkip, Dennis E. Shasha

Summary: SafePredict is a novel meta-algorithm that works with any base prediction algorithm to guarantee a chosen correctness rate by allowing refusals. It does not rely on assumptions about data distribution or base predictor and adapts to changes in the base predictor's error rate without knowing when the changes occur.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2021)

Article Computer Science, Artificial Intelligence

A machine learning approach to predict the k-coverage probability of wireless multihop networks considering boundary and shadowing effects

Jaiprakash Nagar, Sanjay Kumar Chaturvedi, Sieteng Soh, Abhilash Singh

Summary: This study proposes a machine learning approach based on the generalized regression neural network (GRNN) to predict the k-coverage performance of wireless multihop networks (WMNs) placed in a rectangular region. The proposed approach achieves better prediction accuracy and lower computational time complexity compared to existing benchmark algorithms in both scenarios with and without boundary effects (BEs).

EXPERT SYSTEMS WITH APPLICATIONS (2023)

Article Computer Science, Artificial Intelligence

Maximum Joint Probability With Multiple Representations for Clustering

Rui Zhang, Hongyuan Zhang, Xuelong Li

Summary: The article proposes a new clustering framework that aims to maximize the joint probability of data and parameters, and can use a prior distribution to measure the rationality of different representations.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2022)

Article Computer Science, Artificial Intelligence

Self-Supervised Learning by Estimating Twin Class Distribution

Feng Wang, Tao Kong, Rufeng Zhang, Huaping Liu, Hang Li

Summary: This paper presents Twist, a self-supervised representation learning method that classifies large-scale unlabeled datasets in an end-to-end manner. The authors use a siamese network with a softmax operation to generate twin class distributions for augmented images. By maximizing the mutual information between input images and output class predictions, Twist avoids collapsed solutions and achieves state-of-the-art performance on various tasks. On the semi-supervised classification task, Twist outperforms previous methods by 6.2% in top-1 accuracy using 1% of ImageNet labels with a ResNet-50 backbone. Code and pre-trained models are available at https://github.com/bytedance/TWIST.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2023)

Article Automation & Control Systems

The High Separation Probability Assumption for Semi-Supervised Learning

Gao Huang, Chaoqun Du

Summary: This paper proposes a novel assumption and algorithm for semi-supervised learning, which complements the common low-density separation assumption and solves the transductive label assignment problem. Experimental results show that the proposed algorithm achieves competitive performance on multiple datasets and is almost one order of magnitude faster than existing SSL approaches.

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS (2022)

Article Automation & Control Systems

A Scalable Distributed Dynamical Systems Approach to Learn the Strongly Connected Components and Diameter of Networks

Emily A. A. Reed, Guilherme Ramos, Paul Bogdan, Sergio Pequito

Summary: In this article, a scalable distributed solution is proposed for finding strongly connected components (SCCs) and the diameter of a directed network. The solution leverages dynamical consensus-like protocols and has a time complexity of $O(N D d_{\max}^{\text{in-degree}})$, where $N$ is the number of vertices, $D$ is the network diameter, and $d_{\max}^{\text{in-degree}}$ is the maximum in-degree. It is proven that the algorithm terminates in $D + 2$ iterations, allowing the retrieval of the finite network diameter. Exhaustive simulations demonstrate the outperformance of the proposed algorithm on various random networks.

IEEE TRANSACTIONS ON AUTOMATIC CONTROL (2023)

Article Geochemistry & Geophysics

Cross-Domain Lithology Identification Using Active Learning and Source Reweighting

Ji Chang, Yu Kang, Zerui Li, Wei Xing Zheng, Wenjun Lv, De-Yong Feng

Summary: Cross-domain lithology identification is a challenging problem that aims to predict the lithology of an uninterpreted well using logging data from an interpreted well. In this study, we propose a novel framework that combines active learning and domain adaptation to address the issues of data distribution shift and expensive label acquisition. Experimental results demonstrate that our method effectively suppresses performance degradation caused by data distribution shift and requires fewer target label queries.

IEEE GEOSCIENCE AND REMOTE SENSING LETTERS (2022)

Article Computer Science, Artificial Intelligence

Cost-Sensitive Online Adaptive Kernel Learning for Large-Scale Imbalanced Classification

Yingying Chen, Zijie Hong, Xiaowei Yang

Summary: This article introduces a cost-sensitive online adaptive kernel learning algorithm to address large-scale imbalanced classification problems. It proposes a misclassification cost to balance the accuracy between the minority class and the majority class. Experimental results demonstrate that the algorithm significantly improves classification performance on most large-scale imbalanced datasets.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2023)

Article Computer Science, Artificial Intelligence

Hybrid Dynamic Contrast and Probability Distillation for Unsupervised Person Re-Id

De Cheng, Jingyu Zhou, Nannan Wang, Xinbo Gao

Summary: This paper introduces a hybrid dynamic cluster contrast and probability distillation algorithm for unsupervised person re-identification. The algorithm makes use of the self-supervised signals of both clustered and un-clustered instances, as well as informative and valuable training examples, for effective and robust training.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2022)

Article Water Resources

Investigation of cross-entropy-based streamflow forecasting through an efficient interpretable automated search process

K. L. Chong, Y. F. Huang, C. H. Koo, Mohsen Sherif, Ali Najah Ahmed, Ahmed El-Shafie

Summary: Streamflow forecasting is crucial in water resources management, and this paper explores the use of machine learning algorithms for two distinct streamflow forecasting problems. The study finds that categorical-based streamflow forecast outperforms regression-based forecast, and forest-based algorithms are superior for predicting high streamflow fluctuations with low-dimensional input. Furthermore, encoding streamflow time series as images for forecasting demands further analysis as different approaches yield varying results.

APPLIED WATER SCIENCE (2023)

Article Computer Science, Artificial Intelligence

Mining Statistically Significant Communities From Weighted Networks

Zengyou He, Wenfang Chen, Xiaoqi Wei, Yan Liu

Summary: As one of the most important topics in data mining and network science, the community detection problem has been extensively studied. However, determining the statistical significance of an individual community in a weighted network remains unsolved. In this study, a new method is proposed to calculate the analytical p-value of an individual community in weighted networks, and it is utilized as the objective function in a local search procedure to derive a new community detection algorithm. Experimental results demonstrate that the new algorithm achieves comparable performance to state-of-the-art algorithms for identifying communities in weighted networks.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2023)

Article Computer Science, Information Systems

A Survey of Ensemble Learning: Concepts, Algorithms, Applications, and Prospects

Ibomoiye Domor Mienye, Yanxia Sun

Summary: Ensemble learning techniques have achieved state-of-the-art performance by combining predictions from multiple base models, with a focus on widely used algorithms such as random forest, AdaBoost, gradient boosting, XGBoost, LightGBM, and CatBoost. This overview aims to provide concise coverage of their mathematical and algorithmic representations, lacking in existing literature, for the benefit of machine learning researchers and practitioners.

IEEE ACCESS (2022)

Article Computer Science, Artificial Intelligence

Arbitrary Shape Text Detection via Segmentation With Probability Maps

Shi-Xue Zhang, Xiaobin Zhu, Lei Chen, Jie-Bo Hou, Xu-Cheng Yin

Summary: Arbitrary shape text detection is a challenging task, but segmentation-based methods using probability maps show promising results in accurately detecting text instances. This paper proposes an innovative and robust segmentation-based detection method that uses Sigmoid Alpha Functions to transfer distances into probability maps, and a group of probability maps to cover complex probability distributions. The method achieves state-of-the-art performance in terms of detection accuracy on several benchmarks, including multi-oriented and multilingual datasets.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Computer Science, Information Systems

Semiparametric Bayesian networks

David Atienza, Concha Bielza, Pedro Larranaga

Summary: Semiparametric Bayesian networks combine parametric and nonparametric conditional probability distributions to incorporate the advantages of both components. By considering different types of conditional probability distributions and modifying learning algorithms, the proposed approach achieves comparable performance to state-of-the-art methods.

INFORMATION SCIENCES (2022)

Article Computer Science, Artificial Intelligence

Multipartition clustering of mixed data with Bayesian networks

Fernando Rodriguez-Sanchez, Concha Bielza, Pedro Larranaga

Summary: This paper introduces a multipartition clustering method for mixed data, which efficiently handles multifaceted data with several reasonable interpretations by utilizing Bayesian network factorization and the variational Bayes framework.

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS (2022)

Article Computer Science, Artificial Intelligence

Piecewise forecasting of nonlinear time series with model tree dynamic Bayesian networks

David Quesada, Concha Bielza, Pedro Fontan, Pedro Larranaga

Summary: When modeling multivariate continuous time series, it is common to encounter nonlinear processes or drift away from the original distribution. To address this issue, we propose a hybrid model that combines a model tree with DBNs to obtain nonlinear forecasts. Experimental results demonstrate that our model outperforms standard DBN models when dealing with nonlinear processes and is competitive with state-of-the-art time series forecasting methods.

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS (2022)

Article Computer Science, Artificial Intelligence

Time series classifier recommendation by a meta-learning approach

A. Abanda, U. Mori, Jose A. Lozano

Summary: This study investigates time series classifier recommendation for the first time, considering various recommendation forms or meta-targets. The researchers design a set of quick estimators as predictors for the recommendation system. Experimental results show that the proposed method outperforms other methods in most scenarios, and a hierarchical inference method for meta-targets is also proposed.

PATTERN RECOGNITION (2022)

Article Computer Science, Artificial Intelligence

EDA++: Estimation of Distribution Algorithms With Feasibility Conserving Mechanisms for Constrained Continuous Optimization

Abolfazl Shirazi, Josu Ceberio, Jose A. Lozano

Summary: This article introduces a new algorithm (EDA++) equipped with mechanisms to handle nonlinear constraints by adopting the framework of estimation of distribution algorithms (EDAs). The study shows that the feasibility of the final solutions is guaranteed and the quality of the solutions in terms of objective values is improved by seeding an initial population of feasible solutions to the algorithm.

IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION (2022)

Article Computer Science, Information Systems

Asymmetric HMMs for Online Ball-Bearing Health Assessments

Carlos Puerto-Santana, Concha Bielza, Javier Diaz-Rozo, Guillem Ramirez-Gargallo, Filippo Mantovani, Gaizka Virumbrales, Jesus Labarta, Pedro Larranaga

Summary: This study introduces a methodology for health assessment based on online novelty detection and asymmetrical hidden Markov models for predicting the remaining useful life of ball bearings in industrial assets. The approach is designed to adapt to natural degradation of mechanical components and can be deployed in online environments. Performance analysis and validation with real datasets showcase the advantages of this methodology.

IEEE INTERNET OF THINGS JOURNAL (2022)

Article Computer Science, Artificial Intelligence

Bayesian Performance Analysis for Algorithm Ranking Comparison

Jairo Rojas-Delgado, Josu Ceberio, Borja Calvo, Jose A. Lozano

Summary: This work delves into the Bayesian statistical assessment of experimental results, proposing a framework for analyzing multiple algorithms on multiple problems/instances by transforming experimental results into rankings and estimating the posterior distribution of the parameters of probability models. Various inferences regarding algorithm rankings are examined, and a Python package and source code implementation are provided for other researchers to utilize.

IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION (2022)

Article Computer Science, Artificial Intelligence

Selective Imputation for Multivariate Time Series Datasets With Missing Values

Anehd Blazquez-Garcia, Kristoffer Wickstrom, Shujian Yu, Karl Oyvind Mikalsen, Ahcene Boubekki, Angel Conde, Usue Mori, Robert Jenssen, Jose A. Lozano

Summary: This paper proposes a selective imputation method for handling missing values in multivariate time series data. By using multi-objective optimization techniques, the method selects the time points to impute in order to reduce imputation uncertainty and accurately represent the original time series. Experimental results show that this method can improve the performance of downstream tasks while maintaining the quality of the imputations.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2023)

Article Computer Science, Interdisciplinary Applications

Learning the progression patterns of treatments using a probabilistic generative model

Onintze Zaballa, Aritz Perez, Elisa Gomez Inhiesto, Teresa Acaiturri Ayesta, Jose A. Lozano

Summary: This paper presents a probabilistic generative model for disease modeling and patient treatment based on Electronic Health Records. The model aims to identify different subtypes of treatments for a given disease and discover their development and progression. It considers the hierarchical structure of latent variables to classify and segment the treatment sequences. The model's learning procedure is efficiently solved with the Expectation-Maximization algorithm based on dynamic programming. The evaluation includes recovering the generative model underlying synthetic data and assessing the model's ability to provide treatment classification and staging information in real-world data. The model can be used for classification, simulation, data augmentation, and missing data imputation.

JOURNAL OF BIOMEDICAL INFORMATICS (2023)

Review Computer Science, Artificial Intelligence

Feature subset selection for data and feature streams: a review

Carlos Villa-Blanco, Concha Bielza, Pedro Larranaga

Summary: Real-world problems often have high feature dimensionality, making it difficult to model and analyze the data. Feature subset selection (FSS) techniques can be used to reduce irrelevant or redundant information, improving the speed and performance of building models. This review focuses on incremental FSS algorithms that can efficiently handle large volumes of data received sequentially. Different strategies, such as updating feature weights incrementally, applying information theory, or using rough set-based FSS, are discussed, along with various supervised and unsupervised learning tasks where FSS is applicable.

ARTIFICIAL INTELLIGENCE REVIEW (2023)

Article Computer Science, Artificial Intelligence

Minimum Recall-Based Loss Function for Imbalanced Time Series Classification

Josu Ircio, Aizea Lojo, Usue Mori, Simon Malinowski, Jose A. Lozano

Summary: This paper addresses imbalanced time series classification problems and proposes a method for learning time series classifiers that maximize the minimum recall rather than accuracy. By applying several smooth approximations of the minimum recall function, our approach improves the performance of state-of-the-art methods in imbalanced time series classification, with only a slight loss in accuracy.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2023)

Proceedings Paper Computer Science, Artificial Intelligence

New knowledge about the Elementary Landscape Decomposition for solving the Quadratic Assignment Problem

Xabier Benavides, Josu Ceberio, Leticia Hernando, Jose A. Lozano

Summary: Previous works have shown that studying the characteristics of the Quadratic Assignment Problem (QAP) is crucial in designing tailored meta-heuristic algorithms. This study focuses on the Elementary Landscape Decomposition (ELD) method, which is widely used but lacks a clear understanding of its measurement components. To address this issue, this work further decomposes the ELD and conducts experiments to explain the behavior of ELD-based methods, providing critical information about their potential applications.

PROCEEDINGS OF THE 2023 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, GECCO 2023 (2023)

Article Biochemical Research Methods

Learning massive interpretable gene regulatory networks of the human brain by merging Bayesian networks

Niko Bernaola, Mario Michiels, Pedro Larranaga, Concha Bielza

Summary: We present FGES-Merge, a new method for learning the structure of gene regulatory networks by merging locally learned Bayesian networks using the fast greedy equivalent search algorithm. The method is competitive in terms of accuracy and speed, scaling up to large networks and incorporating empirical knowledge of gene regulatory network topology. We also introduce a visualization tool for exploring massive networks and identifying nodes of interest. Our work contributes to predicting gene interactions on a large scale and provides a valuable resource for future biological research.

PLOS COMPUTATIONAL BIOLOGY (2023)

Article Computer Science, Artificial Intelligence

Feature Saliencies in Asymmetric Hidden Markov Models

Carlos Puerto-Santana, Pedro Larranaga, Concha Bielza

Summary: This article introduces asymmetric hidden Markov models with feature saliencies, which are capable of simultaneously determining relevant variables/features and probabilistic relationships between variables during their learning phase. Compared with other approaches, the proposed models have better or equal fitness and provide further data insights.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2022)

Proceedings Paper Computer Science, Artificial Intelligence

An Online Feature Selection Methodology for Ball-Bearing Harmonic Frequencies Based on HMMs

Carlos Puerto-Santana, Pedro Larranaga, Javier Diaz-Rozo, Concha Bielza

Summary: This paper focuses on data streams produced by sensors in industrial environments and proposes an online feature subset selection methodology based on HMM to determine the relevant fundamental and harmonic frequencies during operation of ball-bearings.

16TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING MODELS IN INDUSTRIAL AND ENVIRONMENTAL APPLICATIONS (SOCO 2021) (2022)
