Article

Enhancing Binary Classification by Modeling Uncertain Boundary in Three-Way Decisions

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2017)

Journal

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING

Volume 29, Issue 7, Pages 1438-1451

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/TKDE.2017.2681671

Keywords

Uncertain decision boundary; text classification; three-way decision; rough set; decision rule

Categories

Computer Science, Artificial Intelligence; Computer Science, Information Systems; Engineering, Electrical & Electronic

Funding

Australian Research Council [DP140103157]
RGC Hong Kong [CityU 11502115]
Basic Research Program from Shenzhen Municipal R&D Funding [JCYJ20160229165300897]

Abstract

Text classification is the process of assigning documents to predefined categories using classifiers learned from labeled or unlabeled training samples. Many researchers working on binary text classification seek more effective ways to separate relevant texts from a large data set. However, current text classifiers cannot unambiguously describe the decision boundary between positive and negative objects because of uncertainties introduced by text feature selection and by the knowledge learning process. This paper proposes a three-way decision model, based on rough set techniques and a centroid solution, that deals with the uncertain boundary in order to improve binary text classification performance. To characterize the uncertain boundary, the model first partitions the training samples into three regions (the positive, boundary, and negative regions) using two main boundary vectors, $\vec{C}_P$ and $\vec{C}_N$, created from the labeled positive and negative training subsets, respectively. It then resolves the objects in the boundary region using two derived boundary vectors, $\vec{B}_P$ and $\vec{B}_N$, produced according to the structure of the boundary region. The whole classification process therefore follows an indirect strategy composed of two successive steps: 'two-way to three-way' and 'three-way to two-way'. Four decision rules are derived from the training process and applied to incoming documents for more precise classification. A large number of experiments have been conducted on the standard data sets RCV1 and Reuters-21578. The experimental results show that the boundary vectors are very effective and efficient for dealing with the uncertainties of the decision boundary, and that the proposed model significantly improves binary text classification performance in terms of the F1 measure and AUC compared with six other popular baseline models.
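The abstract does not spell out the four decision rules or the rough-set construction, so the following is only a minimal Python sketch of the general 'two-way to three-way' then 'three-way to two-way' idea, under loudly stated assumptions: documents are already dense term-weight vectors, the main vectors C_P and C_N are plain centroids compared by cosine similarity, and the boundary region is defined by a hypothetical fixed similarity margin (`margin`). The class `ThreeWaySketch`, its helper functions, and the margin rule are illustrative inventions, not the authors' method.

```python
import math
from typing import List, Sequence

Vector = List[float]


def centroid(docs: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(docs)
    return [sum(column) / n for column in zip(*docs)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def region(doc: Vector, c_p: Vector, c_n: Vector, margin: float) -> str:
    """Hypothetical 'two-way to three-way' rule: a document whose similarity
    to C_P and to C_N differs by at most `margin` falls in the boundary region."""
    gap = cosine(doc, c_p) - cosine(doc, c_n)
    if gap > margin:
        return "POS"
    if gap < -margin:
        return "NEG"
    return "BND"


class ThreeWaySketch:
    """Toy centroid-based three-way classifier (NOT the paper's four rules)."""

    def __init__(self, margin: float = 0.1) -> None:
        self.margin = margin  # hypothetical width of the boundary band

    def fit(self, pos: Sequence[Vector], neg: Sequence[Vector]) -> "ThreeWaySketch":
        # Main boundary vectors from the labeled positive/negative subsets.
        self.c_p = centroid(pos)
        self.c_n = centroid(neg)
        # Training samples of each class that land in the boundary region.
        bnd_pos = [d for d in pos if region(d, self.c_p, self.c_n, self.margin) == "BND"]
        bnd_neg = [d for d in neg if region(d, self.c_p, self.c_n, self.margin) == "BND"]
        # Derived boundary vectors; fall back to the main vectors if a side is empty.
        self.b_p = centroid(bnd_pos) if bnd_pos else self.c_p
        self.b_n = centroid(bnd_neg) if bnd_neg else self.c_n
        return self

    def predict(self, doc: Vector) -> bool:
        """'Three-way to two-way': True means positive, False means negative."""
        r = region(doc, self.c_p, self.c_n, self.margin)
        if r == "POS":
            return True
        if r == "NEG":
            return False
        # Boundary objects are resolved by the derived vectors B_P and B_N.
        return cosine(doc, self.b_p) >= cosine(doc, self.b_n)


if __name__ == "__main__":
    pos = [[1.0, 0.0], [0.9, 0.1], [0.6, 0.4]]
    neg = [[0.0, 1.0], [0.1, 0.9], [0.45, 0.55]]
    model = ThreeWaySketch(margin=0.25).fit(pos, neg)
    for doc in ([0.95, 0.05], [0.52, 0.48], [0.05, 0.95]):
        print(doc, region(doc, model.c_p, model.c_n, model.margin), model.predict(doc))
```

On this toy data with `margin=0.25`, the document `[0.52, 0.48]` falls in the boundary region. Its cosine similarity to C_P (about 0.854) is slightly higher than its similarity to C_N (about 0.823), so a plain two-centroid rule would label it positive. The derived vectors B_P and B_N, which are built only from the ambiguous training samples, resolve it as negative instead. This shows why a second, boundary-specific comparison can change decisions near the boundary. The fixed margin is just the simplest stand-in for the paper's region construction.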
