Using phrases as features in email classification

JOURNAL OF SYSTEMS AND SOFTWARE (2009)

Journal

JOURNAL OF SYSTEMS AND SOFTWARE

Volume 82, Issue 6, Pages 1036-1045

Publisher

ELSEVIER SCIENCE INC

DOI: 10.1016/j.jss.2009.01.013

Keywords

Document classification; Email; Resemblance; Nearest-neighbour; Naive Bayes

Funding

Council of the Hong Kong SAR, China [1198/03E]

Abstract

In this paper, we report our experience with using phrases as basic features in the email classification problem. We performed an extensive empirical evaluation on our large email collections and tested three text classification algorithms: a naive Bayes classifier and two k-NN classifiers using TF-IDF weighting and resemblance, respectively. The investigation covers the effect of phrase size, the size of local and global sampling, the neighbourhood size, and various methods for improving classification accuracy. We determined suitable settings for the classifiers' parameters and compared the classifiers at their best settings. Our results show that no classifier dominates the others in terms of classification accuracy. We also made a number of observations on the special characteristics of emails; in particular, we observed that public emails are easier to classify than private ones. (c) 2009 Elsevier Inc. All rights reserved.
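To make the idea of phrase features with a resemblance-based k-NN classifier concrete, here is a minimal sketch. It is an illustration under assumptions, not the authors' implementation: the abstract does not define resemblance, so the sketch assumes the common set-overlap (Jaccard) form over word n-grams, and the function names `phrases`, `resemblance`, and `knn_classify`, as well as the default phrase size and neighbourhood size, are hypothetical.

```python
from collections import Counter


def phrases(text, size=2):
    """Return the set of word n-grams ("phrases") of the given size."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def resemblance(a, b):
    """Assumed definition: shared phrases divided by all distinct phrases."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def knn_classify(message, labelled, k=3, size=2):
    """Majority vote among the k labelled messages most resembling `message`.

    `labelled` is an iterable of (text, label) pairs.
    """
    target = phrases(message, size)
    scored = sorted(
        ((resemblance(target, phrases(text, size)), label)
         for text, label in labelled),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]
```

Calling `knn_classify(new_message, training_pairs, k=3, size=2)` returns the majority label among the three most similar training messages. Varying `size` and `k` corresponds loosely to the phrase-size and neighbourhood-size parameters the abstract says were studied.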
