Journal
INFORMATION SYSTEMS
Volume 77, Issue -, Pages 1-21
Publisher
PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.is.2018.05.006
Keywords
Classification; Random forests; Lazy learning; Nearest neighbors
Funding
- CNPq
- CAPES
- FINEP
- FAPEMIG
- INWEB
Abstract
In this article, we propose a lazy version of the traditional random forest (RF) classifier, called LazyNN_RF, specially designed for high-dimensional noisy classification tasks. The LazyNN_RF localized training projection is composed of the examples that best resemble the examples to be classified, obtained through a nearest-neighborhood projection of the training set. Such a projection filters out irrelevant data, ultimately avoiding some of the drawbacks of traditional random forests, such as overfitting due to overly complex trees, especially in high-dimensional noisy datasets. In sum, our main contributions are: (i) the proposal and implementation of a novel lazy learner, based on the random forest classifier and a nearest-neighborhood projection of the training set, that excels in automatic text classification tasks, and (ii) a thorough and detailed experimental analysis that sheds light on the behavior, effectiveness, and feasibility of our solution. By means of an extensive experimental evaluation, performed over two text classification domains and a large set of baseline algorithms, we show that our approach is highly effective and feasible, making it a strong candidate for automatic text classification tasks when compared to state-of-the-art classifiers. (C) 2018 Elsevier Ltd. All rights reserved.
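The core idea described in the abstract can be sketched as follows: for each example to be classified, project the training set onto its nearest neighbors, then train a random forest only on that localized projection. This is a minimal illustrative sketch, not the paper's implementation; the neighborhood size `k`, the number of trees, and the use of scikit-learn components are all assumptions for illustration.

```python
# Sketch of a lazy nearest-neighbor + random forest classifier.
# Assumptions (not from the paper): k=50 neighbors, 100 trees, sklearn APIs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestNeighbors

def lazy_nn_rf_predict(X_train, y_train, X_test, k=50, n_trees=100, seed=0):
    """For each test example, fit a random forest only on its k nearest
    training neighbors (the localized training projection), then predict."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    preds = []
    for x in X_test:
        # Indices of the k training examples that best resemble x.
        idx = nn.kneighbors(x.reshape(1, -1), return_distance=False)[0]
        # Train a fresh forest on the projected (filtered) training set.
        rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        rf.fit(X_train[idx], y_train[idx])
        preds.append(rf.predict(x.reshape(1, -1))[0])
    return np.array(preds)

# Synthetic stand-in for a text classification dataset.
X, y = make_classification(n_samples=300, n_features=20, random_state=42)
preds = lazy_nn_rf_predict(X[:250], y[:250], X[250:])
print(preds.shape)  # (50,)
```

Because a new forest is trained per test example, this lazy scheme trades prediction-time cost for robustness to irrelevant training data, which is the feasibility/effectiveness trade-off the abstract's evaluation examines.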