Article

Intelligent classification of web pages using contextual and visual features

Journal

APPLIED SOFT COMPUTING
Volume 11, Issue 2, Pages 1638-1647

Publisher

ELSEVIER
DOI: 10.1016/j.asoc.2010.05.003

Keywords

Web page classification; Content-based filtering; Porn image detection; Skin color detection; Adult image detection

Funding

  1. Iranian Telecommunication Research Center (ITRC)


In this paper we address the classification of Web content and, in particular, its application to the detection of pornographic Web pages. Filtering of undesirable Web content is mainly achieved either by blocking specific Web addresses found in a reference list of blacklisted URLs or by a plain contextual analysis that searches the page text for special keywords. The main problems with current filtering methods are the need for constant updating of the URL list and the high rate of over-blocking ordinary pages. In this paper, we propose an intelligent approach that uses textual, profile, and visual features in a classifier with a hierarchical structure. Textual features contain information about keywords, black-words, etc., and profile features contain structural information such as the number of links, meta-tags, pictures, etc. For the visual features, we employ a set of global and local indicative features, including topological and shape-based characteristics, extracted from the skin region. The algorithm was applied to a training set of 1295 Web pages, including 700 porn pages (containing text, images, or both) in English and Persian and 595 non-porn pages on topics such as medicine, health, and sports. On a test set of 290 Web pages, an accuracy rate of 95% was obtained. (C) 2010 Elsevier B.V. All rights reserved.
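The idea of combining textual, profile, and visual features in a hierarchical classifier can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration, not the paper's method: the black-word list, the two thresholds, the two-stage ordering, and the skin rule (a commonly used Cb/Cr box on the JPEG/BT.601 YCbCr conversion). The sketch also models only a single global skin-pixel ratio, not the local topological and shape-based skin-region features described above.

```python
import re
from html.parser import HTMLParser

# Hypothetical black-word list for illustration only; the paper's English
# and Persian keyword lists are not reproduced here.
BLACK_WORDS = {"porn", "xxx", "nude"}

# Hypothetical decision thresholds, not taken from the paper.
TEXT_THRESHOLD = 0.05   # fraction of tokens that are black-words
SKIN_THRESHOLD = 0.40   # fraction of image pixels classified as skin


class ProfileParser(HTMLParser):
    """Collects profile features (counts of links, meta tags, images) and page text."""

    def __init__(self):
        super().__init__()
        self.counts = {"links": 0, "meta_tags": 0, "images": 0}
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> are also routed here by default.
        if tag == "a":
            self.counts["links"] += 1
        elif tag == "meta":
            self.counts["meta_tags"] += 1
        elif tag == "img":
            self.counts["images"] += 1

    def handle_data(self, data):
        # Note: this also collects script/style text; acceptable for a sketch.
        self.text_parts.append(data)


def textual_features(text):
    """Counts black-words; the regex is Unicode-aware, so Persian words are tokenized too."""
    tokens = re.findall(r"\w+", text.lower())
    black = sum(1 for t in tokens if t in BLACK_WORDS)
    ratio = black / len(tokens) if tokens else 0.0
    return {"tokens": len(tokens), "black_count": black, "black_ratio": ratio}


def is_skin(r, g, b):
    """Assumed skin rule: a fixed Cb/Cr box after RGB-to-YCbCr conversion."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return 77 <= cb <= 127 and 133 <= cr <= 173


def skin_ratio(pixels):
    """Fraction of (R, G, B) pixels classified as skin."""
    if not pixels:
        return 0.0
    return sum(is_skin(*p) for p in pixels) / len(pixels)


def classify_page(html, images):
    """Two-stage cascade: textual evidence first, then profile and visual evidence.

    images: one list of (R, G, B) tuples per decoded image (hypothetical input format).
    """
    parser = ProfileParser()
    parser.feed(html)
    parser.close()
    text_feats = textual_features(" ".join(parser.text_parts))

    # Stage 1: a high black-word ratio is treated as decisive.
    if text_feats["black_ratio"] >= TEXT_THRESHOLD:
        return "porn"
    # Profile shortcut: a page with no black-words and no images is accepted.
    if text_feats["black_count"] == 0 and parser.counts["images"] == 0:
        return "non-porn"
    # Stage 2: fall back to visual evidence (here only a global skin ratio).
    if any(skin_ratio(img) >= SKIN_THRESHOLD for img in images):
        return "porn"
    return "non-porn"
```

For example, a page containing only `<img src='a.jpg'/>` and innocuous text reaches the second stage, where the decision rests on the skin ratio of the decoded image. Ordering the stages this way means the comparatively expensive image analysis runs only when the cheaper textual and profile features are inconclusive.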


