4.5 Article

Online active multi-field learning for efficient email spam filtering

Journal

KNOWLEDGE AND INFORMATION SYSTEMS
Volume 33, Issue 1, Pages 117-136

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s10115-011-0461-x

Keywords

Online learning; Multi-field learning; Active learning; Email spam filtering; TREC spam track

Funding

  1. National Natural Science Foundation of China [60873097, 60933005]
  2. Program for New Century Excellent Talents in University [NCET-06-0926]
  3. Fund of Innovation of NUDT [B080605]


Email spam wastes considerable time and resources. This paper addresses email spam filtering and proposes an online active multi-field learning approach built on three observations: (1) email spam filtering is an online application, which motivates online learning; (2) an email document has a multi-field text structure, which motivates multi-field learning; and (3) obtaining labels for a real-world email spam filter is costly, which motivates active learning. The online learner treats email spam filtering as incremental, supervised, binary classification of a text stream. The multi-field learner combines the predictions of per-field classifiers using a novel compound weight schema; each field classifier computes the arithmetic mean of the conditional probabilities of the field's feature strings, which are looked up in a string-frequency index data structure. The active learner estimates classification confidence by comparing the current variance of the field classifiers' results with the historical variance, and treats the more uncertain emails as the more informative samples for which to request a label. Experimental results show that the proposed approach achieves state-of-the-art performance with greatly reduced label requirements and very low space and time costs. Measured by the standard (1-ROCA)% metric, its performance even exceeds the full-feedback performance of some advanced individual text classification algorithms.
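
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the feature strings, the smoothing, the compound weight schema, or the exact variance test, so the sketch assumes character 4-grams, Laplace smoothing, uniform field weights in place of the compound weight schema, and a label request whenever the current variance across fields is at least the running mean of past variances. The names `ngrams`, `FieldClassifier`, and `OnlineActiveFilter` are hypothetical.

```python
from collections import defaultdict
from statistics import pvariance


def ngrams(text, n=4):
    """Overlapping character n-grams; the feature type is an assumption."""
    return [text[i:i + n] for i in range(max(len(text) - n + 1, 1))]


class FieldClassifier:
    """Scores one email field from a string-frequency index."""

    def __init__(self):
        # feature string -> [ham count, spam count]
        self.index = defaultdict(lambda: [0, 0])

    def score(self, text):
        """Arithmetic mean of smoothed P(spam | feature) over the field."""
        probs = []
        for f in ngrams(text):
            ham, spam = self.index.get(f, (0, 0))
            probs.append((spam + 1) / (ham + spam + 2))  # Laplace smoothing (assumed)
        return sum(probs) / len(probs)

    def update(self, text, is_spam):
        """Incremental training: bump the counts for every feature string."""
        for f in ngrams(text):
            self.index[f][int(is_spam)] += 1


class OnlineActiveFilter:
    """One classifier per field, plus a variance-based label request."""

    def __init__(self, fields=("subject", "body")):
        self.classifiers = {name: FieldClassifier() for name in fields}
        self.var_sum = 0.0  # running sum of past per-email variances
        self.seen = 0

    def predict(self, email):
        """Return (spam score in [0, 1], whether to request a label)."""
        scores = [clf.score(email.get(name, ""))
                  for name, clf in self.classifiers.items()]
        # Uniform weights stand in for the paper's compound weight schema.
        combined = sum(scores) / len(scores)
        current = pvariance(scores)
        historical = self.var_sum / self.seen if self.seen else 0.0
        # Assumed rule: fields disagreeing at least as much as usual
        # means low confidence, so ask for the label.
        ask = current >= historical
        self.var_sum += current
        self.seen += 1
        return combined, ask

    def learn(self, email, is_spam):
        """Feed a labeled email back to every field classifier."""
        for name, clf in self.classifiers.items():
            clf.update(email.get(name, ""), is_spam)


spam_mail = {"subject": "cheap pills", "body": "buy now"}
ham_mail = {"subject": "meeting notes", "body": "see agenda attached"}

flt = OnlineActiveFilter()
first_score, first_ask = flt.predict(spam_mail)  # untrained filter
flt.learn(spam_mail, True)
flt.learn(ham_mail, False)
spam_score, _ = flt.predict(spam_mail)
ham_score, _ = flt.predict(ham_mail)
```

In a streaming deployment, `learn` would be called only for emails where `predict` asked for a label, which is how the label requirement is reduced relative to full feedback.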
