Article

Online active multi-field learning for efficient email spam filtering

KNOWLEDGE AND INFORMATION SYSTEMS (2012)

Journal

KNOWLEDGE AND INFORMATION SYSTEMS

Volume 33, Issue 1, Pages 117-136

Publisher

SPRINGER LONDON LTD

DOI: 10.1007/s10115-011-0461-x

Keywords

Online learning; Multi-field learning; Active learning; Email spam filtering; TREC spam track

Categories

Computer Science, Artificial Intelligence; Computer Science, Information Systems

Funding

National Natural Science Foundation of China [60873097, 60933005]
Program for New Century Excellent Talents in University [NCET-06-0926]
Fund of Innovation of NUDT [B080605]

Abstract

Email spam causes a serious waste of time and resources. This paper addresses the email spam filtering problem and proposes an online active multi-field learning approach based on three ideas: (1) email spam filtering is an online application, which suggests online learning; (2) an email document has a multi-field text structure, which suggests multi-field learning; and (3) obtaining a label is costly for a real-world email spam filter, which suggests active learning. The online learner treats email spam filtering as incremental, supervised, binary classification of streaming text. The multi-field learner combines the results predicted by the field classifiers using a novel compound weight schema, and each field classifier computes the arithmetic mean of multiple conditional probabilities calculated from feature strings using a string-frequency index data structure. By comparing the current variance of the field classification results with the historical variance, the active learner evaluates classification confidence and treats the more uncertain emails as the more informative samples for which to request labels. The experimental results show that the proposed approach achieves state-of-the-art performance with greatly reduced label requirements and very low space-time costs. The performance of the proposed online active multi-field learning, measured by the standard (1-ROCA)% metric, even exceeds the full-feedback performance of some advanced individual text classification algorithms.
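The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the feature strings, the compound weight schema, or the exact variance rule, so the character 4-grams, Laplace smoothing, unweighted mean over fields, and "query when the current variance is at least the historical mean variance" rule below are all assumptions chosen for illustration. The class and method names (`FieldClassifier`, `ActiveMultiFieldFilter`, `process`) are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance


class FieldClassifier:
    """Scores one email field using a string-frequency index (illustrative)."""

    def __init__(self, ngram=4):
        self.ngram = ngram  # assumed feature-string length
        # feature string -> [ham count, spam count]
        self.index = defaultdict(lambda: [0, 0])

    def features(self, text):
        # Overlapping character n-grams; short text yields a single feature.
        n = self.ngram
        return {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}

    def predict(self, text):
        # Arithmetic mean of smoothed conditional probabilities P(spam | feature).
        probs = []
        for f in self.features(text):
            ham, spam = self.index.get(f, (0, 0))
            probs.append((spam + 1) / (ham + spam + 2))  # assumed Laplace smoothing
        return mean(probs)

    def update(self, text, is_spam):
        # Online update: increment the count for the observed label.
        for f in self.features(text):
            self.index[f][int(is_spam)] += 1


class ActiveMultiFieldFilter:
    """Combines field classifiers and requests labels for uncertain emails."""

    def __init__(self, fields):
        self.classifiers = {name: FieldClassifier() for name in fields}
        self.variances = []  # variance history of field scores

    def score(self, email):
        scores = [clf.predict(email.get(name, ""))
                  for name, clf in self.classifiers.items()]
        # Unweighted mean stands in for the paper's compound weight schema.
        return mean(scores), pvariance(scores)

    def process(self, email, oracle):
        spam_score, var = self.score(email)
        history = mean(self.variances) if self.variances else 0.0
        self.variances.append(var)
        # Assumption: more field disagreement than usual means low confidence.
        queried = var >= history
        if queried:
            label = oracle(email)  # costly label request
            for name, clf in self.classifiers.items():
                clf.update(email.get(name, ""), label)
        return spam_score, queried
```

In this sketch an email is a dictionary of field texts, for example `{"subject": "...", "body": "..."}`, and `oracle` is any callable that returns `True` for spam. Emails that are not queried leave the index unchanged, which is how the label requirement is reduced.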

A Collaborative Abstraction Based Email Spam Filtering with Fingerprints

P. Rajendran, A. Tamilarasi, R. Mynavathi

Summary: This paper proposes a hybrid approach to collaborative spam detection that abstracts the entire email layout and extracts layout fingerprints in order to match spam effectively despite its rapidly proliferating nature. The system builds a spam database from other users' recommendations, calculates cumulative weights to reduce the false positive and false negative rates, and progressively updates the fingerprints of newly classified spam to keep detection up to date. The system is evaluated on the SpamAssassin dataset and shows comparatively better performance.

WIRELESS PERSONAL COMMUNICATIONS (2022)