Flexible decision tree for data stream classification in the presence of concept change, noise and missing values

DATA MINING AND KNOWLEDGE DISCOVERY (2009)

Volume 19, Issue 1, Pages 95-131

Publisher

SPRINGER

DOI: 10.1007/s10618-009-0130-9

Keywords

Classification learning; Data stream classification; Decision tree learning; Fuzzy learning

Abstract

In recent years, classification learning for data streams has become an important and active research topic. A major challenge posed by data streams is that their underlying concepts can change over time, which requires current classifiers to be revised promptly and accordingly. A common way to detect concept change is to monitor online classification accuracy: if accuracy drops below some threshold value, a concept change is deemed to have taken place. This methodology implicitly assumes that any drop in classification accuracy is a symptom of concept change. Unfortunately, this assumption is often violated in the real world, where data streams carry noise that can also cause a significant reduction in classification accuracy. To compound the problem, traditional noise-cleansing methods are ill-suited to data streams: they normally need to scan the data multiple times, whereas learning from data streams can afford only a single pass because of the data's high speed and huge volume. Another open problem in data stream classification is how to deal with missing values. When new instances containing missing values arrive, how a learning model should classify them, and how it should update itself in response, remains largely unexplored. To address these problems, this paper proposes a novel classification algorithm, the flexible decision tree (FlexDT), which extends fuzzy logic to data stream classification. Its advantages are threefold. First, FlexDT offers a flexible structure that handles concept change effectively and efficiently. Second, FlexDT is robust to noise, so it prevents noise from interfering with classification accuracy, and a drop in accuracy can be safely attributed to concept change. Third, it deals with missing values in an elegant way.
Extensive evaluations compare FlexDT with representative existing data stream classification algorithms using a large suite of data streams and various statistical tests. Experimental results suggest that FlexDT offers a significant benefit to data stream classification in real-world scenarios where concept change, noise and missing values coexist.
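
The accuracy-threshold methodology described in the abstract can be sketched as follows. This is a minimal illustration of that general idea only; it is not FlexDT and not code from the paper. The class name `AccuracyDriftMonitor`, the helper `first_alarm`, the window size and the threshold value are all assumptions chosen for the example.

```python
from collections import deque


class AccuracyDriftMonitor:
    """Hypothetical monitor: flags a concept change when windowed accuracy is low."""

    def __init__(self, window_size=100, threshold=0.7):
        # Keep only the most recent `window_size` outcomes (True = correct).
        self.window = deque(maxlen=window_size)
        self.threshold = threshold  # assumed value, not taken from the paper

    def update(self, predicted, actual):
        """Record one prediction; return True if a change is signalled."""
        self.window.append(predicted == actual)
        if len(self.window) < self.window.maxlen:
            return False  # not enough evidence yet
        accuracy = sum(self.window) / len(self.window)
        return accuracy < self.threshold


def first_alarm(outcomes, window_size=100, threshold=0.7):
    """Return the index of the first alarm for a sequence of booleans
    (True = the classifier was correct), or None if no alarm is raised."""
    monitor = AccuracyDriftMonitor(window_size, threshold)
    for i, correct in enumerate(outcomes):
        # Encode a correct outcome as matching labels, an error as a mismatch.
        if monitor.update(1 if correct else 0, 1):
            return i
    return None


if __name__ == "__main__":
    # A classifier that is always right for 200 instances, then always wrong.
    stream = [True] * 200 + [False] * 100
    print(first_alarm(stream))
```

Note that the monitor sees only whether each prediction was correct. A burst of label noise lowers the windowed accuracy in the same way a genuine concept change does, which is the weakness of this methodology that the abstract points out.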

Binary multi-layer classifier

Huanze Zeng, Argon Chen

Summary: This study proposes a simple multi-layer classifier (MLC) model with binary splits and tests it on 40 datasets. The results show that binary MLC models are easier to interpret and perform significantly better than other models.

INFORMATION SCIENCES (2021)