Editorial Material

Classifying text streams by keywords using classifier ensemble

DATA & KNOWLEDGE ENGINEERING (2011)

Journal

DATA & KNOWLEDGE ENGINEERING

Volume 70, Issue 9, Pages 775-793

Publisher

ELSEVIER

DOI: 10.1016/j.datak.2011.05.002

Keywords

Text stream classification; Concept drift; Classifier ensemble; Knowledge acquisition

Categories

Computer Science, Artificial Intelligence; Computer Science, Information Systems

Abstract

Traditional approaches to text stream classification usually require manually labeling a number of documents, which is expensive and time-consuming. To overcome this limitation, we propose classifying text streams by keywords, without labeled documents, to reduce the burden of manual labeling. We build base text classifiers from keywords and unlabeled documents, and use classifier ensemble algorithms to cope with concept drift in text data streams. Experimental results demonstrate that the proposed method can build good classifiers from keywords without manual labeling, and that the ensemble-based algorithm detects and adapts to concept drift well, outperforming the single-window algorithm. (c) 2011 Elsevier B.V. All rights reserved.
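
The idea in the abstract (keyword-derived pseudo-labels, one base classifier per stream chunk, and an ensemble that re-weights its members as the stream changes) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not specify the base classifier, the weighting rule, or the ensemble size, so the Naive Bayes model, the agreement-based weights, and the names `keyword_label`, `ChunkClassifier`, and `KeywordEnsemble` are hypothetical and are not the authors' method.

```python
import math
from collections import Counter, defaultdict


def keyword_label(tokens, keywords):
    """Pseudo-label a document by keyword hits; None when no keyword matches."""
    hits = {label: sum(t in kws for t in tokens) for label, kws in keywords.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


class ChunkClassifier:
    """Multinomial Naive Bayes with add-one smoothing (assumed base learner)."""

    def __init__(self, docs, labels):
        self.term_counts = defaultdict(Counter)
        self.doc_counts = Counter(labels)
        for tokens, label in zip(docs, labels):
            self.term_counts[label].update(tokens)
        self.vocab = {t for counts in self.term_counts.values() for t in counts}

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.term_counts[label]
            # +1 reserves probability mass for terms outside the vocabulary.
            denom = sum(counts.values()) + len(self.vocab) + 1
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class KeywordEnsemble:
    """Keeps the best few chunk classifiers, weighted on the newest chunk."""

    def __init__(self, keywords, max_members=3):
        self.keywords = keywords
        self.max_members = max_members
        self.members = []  # list of (classifier, weight) pairs

    def update(self, chunk):
        # Only documents that match at least one keyword get a pseudo-label.
        labeled = [(d, keyword_label(d, self.keywords)) for d in chunk]
        labeled = [(d, y) for d, y in labeled if y is not None]
        if not labeled:
            return
        docs, labels = zip(*labeled)
        # Assumed weighting rule: agreement with the newest pseudo-labels,
        # so members trained on an outdated concept lose influence.
        scored = [
            (clf, sum(clf.predict(d) == y for d, y in labeled) / len(labeled))
            for clf, _ in self.members
        ]
        scored.append((ChunkClassifier(docs, labels), 1.0))
        scored.sort(key=lambda member: member[1], reverse=True)
        self.members = scored[: self.max_members]

    def predict(self, tokens):
        votes = Counter()
        for clf, weight in self.members:
            votes[clf.predict(tokens)] += weight
        return votes.most_common(1)[0][0] if votes else None


keywords = {"sports": {"match", "goal"}, "tech": {"chip", "software"}}
ensemble = KeywordEnsemble(keywords)
ensemble.update([
    ["match", "goal", "team", "coach"],
    ["chip", "software", "laptop", "vendor"],
    ["team", "coach"],  # no keyword hit, so it is left out of training
])
print(ensemble.predict(["team", "coach"]))
```

In this sketch a document without any keyword can still be classified, because the chunk classifier learns co-occurring terms from the pseudo-labeled documents. Weighting members against pseudo-labels rather than true labels is a design choice forced by the no-labels setting, and it inherits any noise in the keyword lists.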

Recommended

Article Computer Science, Information Systems

Active Weighted Aging Ensemble for drifted data stream classification

Michal Wozniak, Pawel Zyblewski, Pawel Ksieniewicz

Summary: Concept drift is a significant problem in data stream classification, causing performance degradation. This paper proposes a novel algorithm, AWAE, which utilizes ensemble learning and active learning to address concept drift effectively. Experimental results demonstrate its high quality compared to state-of-the-art methods.

INFORMATION SCIENCES (2023)

Article Computer Science, Information Systems

Dynamically Adjusting Diversity in Ensembles for the Classification of Data Streams with Concept Drift

Juan I. G. Hidalgo, Silas G. T. C. Santos, Roberto S. M. Barros

Summary: This study proposes a dynamic parameter adjustment strategy for dealing with concept drifts in data streams. Experimental results show that the dynamic estimation of the diversity parameter (lambda) produces good results in various scenarios, improving accuracy in both artificial and real-world datasets.

ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA (2022)

Article Computer Science, Information Systems

A hybrid block-based ensemble framework for the multi-class problem to react to different types of drifts

Osama A. Mahdi, Eric Pardede, Nawfal Ali

Summary: Data stream mining is an important research topic with increasing attention in various applications. Challenges of concept drift and multiple classes in data streams have motivated the proposal of a hybrid block-based ensemble approach, which outperforms other algorithms in experimental evaluations.

CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS (2021)

Article Computer Science, Artificial Intelligence

Deterministic Sampling Classifier with weighted Bagging for drifted imbalanced data stream classification

Jakub Klikowski, Michal Wozniak

Summary: Streaming data classification is a critical task that must deal with concept drift and imbalanced data. This paper proposes a novel algorithm that uses data preprocessing and a weighted bagging technique to address these challenges, and experimental results demonstrate its effectiveness in various scenarios.

APPLIED SOFT COMPUTING (2022)

Article Computer Science, Artificial Intelligence

Semi-supervised classification on data streams with recurring concept drift and concept evolution

Xiulin Zheng, Peipei Li, Xuegang Hu, Kui Yu

Summary: Mining non-stationary streams poses challenges due to their infinite length, dynamic characteristics, concept drift, concept evolution, and limited labeled data. Existing supervised methods may result in poor performance and efficiency in the presence of scarce labeled data. This paper proposes a semi-supervised framework ESCR to detect recurring concept drifts and concept evolution in data streams with partially labeled data. The framework utilizes clustering-based classifiers, Jensen-Shannon divergence for change detection, and outlier monitoring for concept evolution, while also improving efficiency through recursive function and dynamic programming. Extensive experiments show the effectiveness and efficiency of ESCR compared to other semi-supervised methods.

KNOWLEDGE-BASED SYSTEMS (2021)

Article Computer Science, Information Systems

Concept drift detection with quadtree-based spatial mapping of streaming data

Rodrigo Amador Coelho, Luiz Carlos Bambirra Torres, Cristiano Leite de Castro

Summary: Online learning faces challenges in monitoring and detecting changes in data distribution over time, which affect the performance of the learning algorithm. This study proposes a novel detection method that analyzes the space occupied by the data and detects drifts by checking the relevance of data assigned to different classes. The evaluation on benchmark problems demonstrates that the method competes effectively with existing drift detectors on synthetic and real-world datasets.

INFORMATION SCIENCES (2023)

Article Computer Science, Artificial Intelligence

Novel hybrid pair recommendations based on a large-scale comparative study of concept drift detection

Elif Selen Baburoglu, Alptekin Durmusoglu, Turkay Dereli

Summary: This study focuses on addressing concept drift during online learning through a large-scale comparison of drift detectors and classifiers to determine the most efficient matched pairs for improving model accuracy. The results indicate that the most effective pairs primarily include the HDDMA, RDDM, WSTD, and FHDDM detectors, which vary depending on the dataset type and size.

EXPERT SYSTEMS WITH APPLICATIONS (2021)

Article Computer Science, Artificial Intelligence

Analyzing and repairing concept drift adaptation in data stream classification

Ben Halstead, Yun Sing Koh, Patricia Riddle, Russel Pears, Mykola Pechenizkiy, Albert Bifet, Gustavo Olivares, Guy Coulson

Summary: Data collected over time may show changes in distribution, and data stream based methods can effectively detect concept drift. However, existing methods may not robustly handle real-world tasks, leading to adaptation errors.

MACHINE LEARNING (2022)

Article Computer Science, Artificial Intelligence

Elastic gradient boosting decision tree with adaptive iterations for concept drift adaptation

Kun Wang, Jie Lu, Anjin Liu, Yiliao Song, Li Xiong, Guangquan Zhang

Summary: This paper proposes a novel adaptive iterations (AdIter) method that automatically selects the number of iterations based on the severity of concept drift, in order to improve the prediction accuracy of data streams under concept drift.

NEUROCOMPUTING (2022)

Article Biochemical Research Methods

Concept drift detection in toxicology datasets using discriminative subgraph-based drift detector

Vandana Bharti, Shabari S. Nair, Akshat Jain, Kaushal Kumar Shukla, Bhaskar Biswas

Summary: This paper addresses concept drift detection in graph streams, specifically in the field of toxicology. The authors apply the discriminative subgraph-based drift detector (DSDD) to real-world toxicology datasets and compare its performance with other drift detection methods. The results and analysis provide insights into concept drift detection in the toxicology domain and aid in the application of DSDD in various scenarios.

BRIEFINGS IN BIOINFORMATICS (2023)

Article Business

Concept-drift detection index based on fuzzy formal concept analysis for fake news classifiers

Giuseppe Fenza, Mariacristina Gallo, Vincenzo Loia, Alessandra Petrone, Claudio Stanzione

Summary: Concept drift refers to the unpredictable changes in the underlying distribution of streaming data over time. Detecting, interpreting, and adapting to concept drift is crucial in concept drift research. It is found that machine learning in a concept drift environment produces poor results without handling drift. This study proposes a concept drift detection index based on Fuzzy Formal Concept Analysis theory to predict when the performance of a machine learning model for text-stream classifiers is low. Experimental results show a significant correlation between the proposed index and the accuracy of Random Forest, Naive Bayes, and Passive Aggressive models, suggesting that the index can prevent incorrect classifications and aid in retraining decisions.

TECHNOLOGICAL FORECASTING AND SOCIAL CHANGE (2023)

Article Computer Science, Artificial Intelligence

Unsupervised concept drift detection for multi-label data streams

Ege Berkay Gulcan, Fazli Can

Summary: Many real-world applications adopt multi-label data streams as the need for algorithms to deal with rapidly changing data increases. We propose a novel algorithm called Label Dependency Drift Detector (LD3) for unsupervised concept drift detection in multi-label data streams. Our study shows that LD3 provides better predictive performance than other detectors on both real-world and synthetic data streams.

ARTIFICIAL INTELLIGENCE REVIEW (2023)

Article Computer Science, Artificial Intelligence

Two-level pruning based ensemble with abstained learners for concept drift in data streams

Kanu Goel, Shalini Batra

Summary: The paper introduces a novel concept drift handling approach named TLP-EnAbLe, which maintains suitable learners for current concept by adding diversity-based pruning to traditional accuracy-based pruning. This approach effectively handles concept drift by deferring similarity-based pruning and monitoring the performance of learners in real time.

EXPERT SYSTEMS (2021)

Article Computer Science, Artificial Intelligence

Diverse Instance-Weighting Ensemble Based on Region Drift Disagreement for Concept Drift Adaptation

Anjin Liu, Jie Lu, Guangquan Zhang

Summary: This study proposes a method for handling concept drift based on measuring diversity by determining the extent to which ensemble members agree on regional distribution changes. Different sets of regions are constructed to maximize diversity, and an instance-based ensemble learning algorithm called DiwE is developed for data stream classification problems. Evaluation results on various synthetic and real-world data stream benchmarks demonstrate the effectiveness and advantages of the proposed algorithm.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2021)

Article Computer Science, Information Systems

DME: An Adaptive and Just-in-Time Weighted Ensemble Learning Method for Classifying Block-Based Concept Drift Steam

Baoquan Feng, Yan Gu, Hualong Yu, Xibei Yang, Shang Gao

Summary: This study proposes a novel incremental learning algorithm called distribution matching ensemble (DME) for adaptive weighted ensemble learning. DME estimates the distribution of each data block and maintains a group of classifiers in a buffer. When a new data block is received, the similarity between its distribution and reserved distributions is calculated to guide weight assignment for adaptive ensemble decision. Experiments show that DME can track and adapt to various types of concept drift, outperforming state-of-the-art algorithms.

IEEE ACCESS (2022)

Article Computer Science, Artificial Intelligence

Temporal tree representation for similarity computation between medical patients

Suresh Pokharel, Guido Zuccon, Xue Li, Chandra Prasetyo Utomo, Yu Li

ARTIFICIAL INTELLIGENCE IN MEDICINE (2020)

Editorial Material Computer Science, Information Systems

Editorial for application-driven knowledge acquisition

Xue Li, Sen Wang, Bohan Li

WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS (2020)

Article Computer Science, Artificial Intelligence

Source data-free domain adaptation of object detector through domain-specific perturbation

Lin Xiong, Mao Ye, Dan Zhang, Yan Gan, Xue Li, Yingying Zhu

Summary: This study proposes a source data-free domain adaptation method called SOAP, which eliminates domain perturbation from the target domain using a noise perturbation method and learns the correct alignment direction through image-level, instance-level, and category consistency regularizations based on the Mean Teacher structure. Experiments show that SOAP outperforms the baseline and other domain adaptation methods in multiple domain adaptation scenarios.

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS (2021)

Article Computer Science, Artificial Intelligence

Suicidal ideation and mental disorder detection with attentive relation networks

Shaoxiong Ji, Xue Li, Zi Huang, Erik Cambria

Summary: Mental health is a critical issue in modern society, and early detection of mental disorders and suicidal ideation from social content is a potential way for effective social intervention. However, classifying suicidal ideation and other mental disorders is challenging due to their similar patterns in language usage and sentimental polarity.

NEURAL COMPUTING & APPLICATIONS (2022)

Article Engineering, Electrical & Electronic

Coarse-to-Fine Spatio-Temporal Information Fusion for Compressed Video Quality Enhancement

Dengyan Luo, Mao Ye, Shuai Li, Xue Li

Summary: In this paper, a new network called CF-STIF is proposed for compressed video quality enhancement by predicting better offsets. It utilizes 3D convolution and multi-scale strategy to increase the receptive field, thus efficiently aggregating information from neighboring frames. Experimental results show that CF-STIF outperforms state-of-the-art approaches.

IEEE SIGNAL PROCESSING LETTERS (2022)

Article Computer Science, Artificial Intelligence

Dynamic Sampling and Selective Masking for Communication-Efficient Federated Learning

Shaoxiong Ji, Wenqi Jiang, Anwar Walid, Xue Li

Summary: This article introduces the concept and challenges of federated learning, as well as two methods for improving communication efficiency: dynamic sampling and top-k selective masking.

IEEE INTELLIGENT SYSTEMS (2022)

Article Engineering, Electrical & Electronic

OVQE: Omniscient Network for Compressed Video Quality Enhancement

Liuhan Peng, Askar Hamdulla, Mao Ye, Shuai Li, Zengbin Wang, Xue Li

Summary: This paper proposes an omniscient network that learns video spatiotemporal and omni-frequency information more effectively. It includes a Spatio-Temporal Feature Fusion (STFF) module and an Omni-Frequency Adaptive Enhancement (OFAE) block to restore texture details of compressed videos.

IEEE TRANSACTIONS ON BROADCASTING (2023)

Article Computer Science, Hardware & Architecture

Preserving Privacy for Distributed Genome-Wide Analysis Against Identity Tracing Attacks

Yanjun Zhang, Guangdong Bai, Xue Li, Surya Nepal, Marthie Grobler, Chen Chen, Ryan K. L. Ko

Summary: Genome-wide analysis has health and social benefits, but sharing such data may risk revealing sensitive information. Identity tracing attack exploits correlations among genomic data to reveal the identity of DNA samples. This paper proposes a framework called "F-RAG" to enable privacy-preserving data sharing and computation in genome-wide analysis, mitigating privacy risks without compromising utility.

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING (2023)

Article Computer Science, Information Systems

Dual-core mutual learning between scoring systems and clinical features for ICU mortality prediction

Zhenkun Shi, Sen Wang, Lin Yue, Yijia Zhang, Binod Kumar Adhikari, Shuai Xue, Wanli Zuo, Xue Li

Summary: Perpetually improving mortality prediction in intensive care units (ICUs) through eHealth evaluation approaches has become a major research focus in medical data mining, with the goal of saving lives. However, existing methods face challenges in capturing comprehensive patient statuses, using extendable features, and incorporating traditional ICU scoring systems and deep learning methods.

INFORMATION SCIENCES (2023)

Article Engineering, Electrical & Electronic

Multi-Frame Compressed Video Quality Enhancement by Spatio-Temporal Information Balance

Zeyang Wang, Mao Ye, Shuai Li, Xue Li

Summary: In recent years, the performance of multi-frame quality enhancement algorithms for compressed videos has been greatly improved compared with single-frame based algorithms. However, the existing methods mainly focus on mining the temporal information of multiple frames. To address this problem, we propose a plug-and-play module called Spatio-temporal Information Balance (STIB) to adaptively balance the spatial and temporal information. Experiments show that our module can significantly improve the performance of the existing multi-frame based enhancement algorithms.

IEEE SIGNAL PROCESSING LETTERS (2023)

Article Computer Science, Information Systems

Spatio-Temporal Detail Information Retrieval for Compressed Video Quality Enhancement

Dengyan Luo, Mao Ye, Shuai Li, Ce Zhu, Xue Li

Summary: In the past few years, multi-frame quality enhancement has achieved great success in the field of compressed video. However, the recovery of detail information has not received enough attention. To address this, we propose a Spatio-Temporal Detail Retrieval (STDR) method which utilizes multi-path deformable alignment and residual dense blocks to improve the recovery of detail information in videos.

IEEE TRANSACTIONS ON MULTIMEDIA (2023)

Article Computer Science, Artificial Intelligence

Disentanglement then reconstruction: Unsupervised domain adaptation by twice distribution alignments

Lihua Zhou, Mao Ye, Xinpeng Li, Ce Zhu, Yiguang Liu, Xue Li

Summary: Unsupervised domain adaptation transfers knowledge from a labeled source domain to an unlabeled target domain. We propose a disentanglement and reconstruction process that aligns the distributions twice. Experimental results confirm the effectiveness of our method.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Proceedings Paper Computer Science, Information Systems

Causality Discovery Based on Combined Causes and Multiple Causes in Drug-Drug Interaction

Sitthichoke Subpaiboonkit, Xue Li, Xin Zhao, Guido Zuccon

Summary: This article focuses on the problem of automatically detecting drug-drug interactions and proposes a novel approach to identify the specific causes and causal relationship direction. The method is validated on a real-world dataset for adverse effects and compared with current automated methods.

ADVANCED DATA MINING AND APPLICATIONS (ADMA 2022), PT I (2022)

Proceedings Paper Acoustics

INTEGRATING DEPENDENCY TREE INTO SELF-ATTENTION FOR SENTENCE REPRESENTATION

Junhua Ma, Jiajun Li, Yuxuan Liu, Shangbo Zhou, Xue Li

Summary: Recent progress on parse tree encoder for sentence representation learning is remarkable. However, current works lack parallelization due to the recursive encoding of tree structures and fail to consider arc labels in dependency trees. To address these issues, we propose Dependency-Transformer, which incorporates a relation-attention mechanism to encode the dependency and spatial positional relations in sentence dependency trees. By a score-based method, our model successfully injects syntax information without affecting parallelizability and outperforms or aligns with state-of-the-art methods in sentence representation tasks while maintaining computational efficiency.

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) (2022)

Article Engineering, Electrical & Electronic

Self-Alignment for Black-Box Domain Adaptation of Image Classification

Chang Liu, Lihua Zhou, Mao Ye, Xue Li

Summary: This paper proposes a self-alignment approach to realize black-box domain adaptation, and improves the classification performance on high-confidence and low-confidence samples by matching data distributions and self-supervised learning.

IEEE SIGNAL PROCESSING LETTERS (2022)

Article Computer Science, Artificial Intelligence

Hierarchical framework for interpretable and specialized deep reinforcement learning-based predictive maintenance

Ammar N. Abbas, Georgios C. Chasparis, John D. Kelleher

Summary: Deep reinforcement learning has significant potential in industrial decision-making, but its lack of interpretability poses challenges for safety-critical systems. This paper introduces a novel approach that combines probabilistic modeling and reinforcement learning, addressing these challenges and achieving excellent results in predictive maintenance for turbofan engines.

DATA & KNOWLEDGE ENGINEERING (2024)

Article Computer Science, Artificial Intelligence

Global and item-by-item reasoning fusion-based multi-hop KGQA

Tongzhao Xu, Turdi Tohti, Askar Hamdulla

Summary: This paper proposes a multi-hop KGQA model that combines global and item-by-item reasoning fusion. It introduces a convolutional attention reasoning mechanism and serial prediction of relations to form reasoning paths, effectively addressing the issues of ignoring intermediate path reasoning and information interaction. The proposed model achieves significant accuracy improvement on three datasets.

DATA & KNOWLEDGE ENGINEERING (2024)

Article Computer Science, Artificial Intelligence

CALEB: A Conditional Adversarial Learning Framework to enhance bot detection

Ilias Dimitriadis, George Dialektakis, Athena Vakali

Summary: The high growth of Online Social Networks (OSNs) has led to the emergence of social bots, which pose high-level security threats. This paper proposes an adaptive bot detection framework called CALEB based on CGAN and AC-GAN, which can simulate bot evolution and enhance detection performance. Experimental results show that the proposed approach outperforms previous methods in detecting new unseen bots.

DATA & KNOWLEDGE ENGINEERING (2024)
