Article

Markov blanket-based universal feature selection for classification and regression of mixed-type data

EXPERT SYSTEMS WITH APPLICATIONS (2020)

Journal

EXPERT SYSTEMS WITH APPLICATIONS

Volume 158, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.eswa.2020.113398

Keywords

Markov blanket; Multivariate feature selection; Conditional independence test; Likelihood-ratio test; Classification; Regression

Categories

Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Operations Research & Management Science

Funding

National Research Foundation of Korea (NRF) - Korea Government (MSIT) [2018R1C1B5086611, 2020R1C1C1011063]
Korea Institute for Advancement of Technology (KIAT) - Korea Government (MOTIE) [N0008691]
National Research Foundation of Korea [2018R1C1B5086611, 2020R1C1C1011063] Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)


Abstract

Feature selection has been successfully applied to improve the quality of data analysis in various expert and intelligent systems. However, because most real-world data now contain mixed feature types, traditional feature selection approaches, which are mainly designed for single-type data, are not suitable. In addition, most existing methods apply to only one kind of problem, either classification or regression. There is therefore an urgent need for a universal feature selection method that can be applied to both classification and regression with mixed-type data. In response, this paper presents Mixed-MB, a new feature selection method based on the Markov blanket (MB). The key idea is to embed a likelihood ratio-based generalized conditional independence test into an efficient MB search algorithm, in order to find the minimal set of features that fully explains the target variable in mixed-type data. The method removes the weakness of existing MB feature selection methods, which can handle only single-type data, while retaining their strengths: theoretical soundness, simplicity, speed, and versatility. Experimental results on real-world data sets with mixed features demonstrate that the proposed method improves the accuracy of prediction models in both classification and regression, and that it yields more accurate results with fewer features than other methods. We believe that Mixed-MB will be widely used in expert and intelligent systems that draw on diverse data to create value, since it can be applied to any type of data and problem. (C) 2020 Elsevier Ltd. All rights reserved.
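
The likelihood-ratio conditional independence test mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' Mixed-MB implementation. It assumes a continuous target modeled by Gaussian linear regression, handles a categorical conditioning variable by one-hot encoding, and compares two nested models (with and without the candidate feature). A categorical target would need a logistic or multinomial model instead, which is not shown. All function names here (chi2_sf, one_hot, rss, lr_ci_test, demo) and the synthetic data are hypothetical, and only the Python standard library is used.

```python
import math
import random


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    t = x / 2.0
    odd = df % 2 == 1
    a = 0.5 if odd else 1.0
    q = math.erfc(math.sqrt(t)) if odd else math.exp(-t)
    # Recurrence Q(a + 1, t) = Q(a, t) + t^a e^{-t} / Gamma(a + 1)
    while a < df / 2.0:
        q += t ** a * math.exp(-t) / math.gamma(a + 1.0)
        a += 1.0
    return q


def one_hot(values):
    """Dummy-encode a categorical column, dropping the first (sorted) level."""
    levels = sorted(set(values))[1:]
    return [[1.0 if v == lev else 0.0 for lev in levels] for v in values]


def rss(rows, y):
    """Residual sum of squares of an ordinary least-squares fit of y on rows."""
    p = len(rows[0])
    # Augmented normal equations [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def lr_ci_test(y, x_cols, z_cols):
    """Likelihood-ratio test of H0: y is independent of x given z.

    Assumption: y is continuous with Gaussian noise, so the statistic is
    n * log(RSS_reduced / RSS_full), compared against a chi-square
    distribution whose df equals the number of columns encoding x.
    """
    n = len(y)
    reduced = [[1.0] + list(z) for z in z_cols]          # intercept + z
    full = [r + list(x) for r, x in zip(reduced, x_cols)]  # ... + x
    stat = n * math.log(rss(reduced, y) / rss(full, y))
    return stat, chi2_sf(stat, len(x_cols[0]))


def demo(seed=0, n=300):
    """Synthetic mixed-type data: z is categorical, x1 and x2 are continuous."""
    rng = random.Random(seed)
    shift = {"a": 0.0, "b": 1.0, "c": -1.0}
    z = [rng.choice("abc") for _ in range(n)]
    x1 = [rng.gauss(0, 1) for _ in range(n)]              # truly relevant
    x2 = [shift[zi] + rng.gauss(0, 1) for zi in z]        # related to y only via z
    y = [shift[zi] + 2.0 * a + rng.gauss(0, 1) for zi, a in zip(z, x1)]
    z_cols = one_hot(z)
    _, p_x1 = lr_ci_test(y, [[v] for v in x1], z_cols)
    _, p_x2 = lr_ci_test(y, [[v] for v in x2], z_cols)
    return p_x1, p_x2


if __name__ == "__main__":
    print(demo())
```

In a Markov blanket search, a test of this kind would be called repeatedly: a feature whose p-value stays below the chosen significance level given the current candidate set is kept, and one that becomes conditionally independent of the target is dropped. The specific search procedure used by Mixed-MB is not described in the abstract, so it is not sketched here.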


Information Theoretic Methods for Variable Selection-A Review

Jan Mielniczuk

Summary: This paper reviews information-theoretic tools and their application to feature selection, focusing on classification problems with discrete features. The authors discuss various ways of constructing counterparts to conditional mutual information, along with their properties and limitations. They propose a unified method based on truncating the Möbius expansion of conditional mutual information. The paper also discusses the main approaches to feature selection that use the introduced measures of conditional dependence, together with methods for assessing the quality of the resulting predictors, including recent results on asymptotic distributions of empirical criteria and advances in resampling.

ENTROPY (2022)