Article

Sentimental Spidering: Leveraging Opinion Information in Focused Crawlers

Journal

ACM TRANSACTIONS ON INFORMATION SYSTEMS
Volume 30, Issue 4

Publisher

ASSOC COMPUTING MACHINERY
DOI: 10.1145/2382438.2382443

Keywords

Algorithms; Experimentation; Design; Performance; Web crawlers; focused crawlers; sentiment analysis; opinion mining; classification; graph similarities; random walk path

Funding

  1. National Natural Science Foundation of China [60921061, 70890084, 71025001, 91024030, 90924302]
  2. Chinese Academy of Sciences [2F07C01]
  3. Chinese Ministry of Health [2012ZX10004801]
  4. DOD Defense Threat Reduction Agency [HDTRA-09-0058]
  5. NSF [CNS-0709338, CBET-0730908, IIS-1236970]
  6. National Science Foundation, Directorate for Computer & Information Science & Engineering, Division of Information & Intelligent Systems [1236970]

Despite the increased prevalence of sentiment-related information on the Web, there has been limited work on focused crawlers capable of effectively collecting not only topic-relevant but also sentiment-relevant content. In this article, we propose a novel focused crawler that incorporates topic and sentiment information as well as a graph-based tunneling mechanism for enhanced collection of opinion-rich Web content regarding a particular topic. The graph-based sentiment (GBS) crawler uses a text classifier that employs both topic and sentiment categorization modules to assess the relevance of candidate pages. This information is also used to label nodes in web graphs that are employed by the tunneling mechanism to improve collection recall. Experimental results on two test beds revealed that GBS was able to provide better precision and recall than seven comparison crawlers. Moreover, GBS was able to collect a large proportion of the relevant content after traversing far fewer pages than comparison methods. GBS outperformed comparison methods on various categories of Web pages in the test beds, including collection of blogs, Web forums, and social networking Web site content. Further analysis revealed that both the sentiment classification module and graph-based tunneling mechanism played an integral role in the overall effectiveness of the GBS crawler.
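
The crawling idea described in the abstract can be illustrated with a small sketch. This is not the GBS crawler itself: the keyword sets stand in for the paper's topic and sentiment classifiers, and a simple budget of consecutive irrelevant pages (`max_tunnel`) is swapped in for the graph-based tunneling mechanism. The toy web, the term lists, and all function and parameter names are hypothetical and chosen only for illustration.

```python
import heapq

# Toy in-memory "web": url -> (page text, outlinks). Entirely hypothetical data.
TOY_WEB = {
    "seed": ("camera review great lens", ["a", "b"]),
    "a": ("site navigation index", ["c"]),
    "b": ("weather forecast today", ["d"]),
    "c": ("camera terrible battery awful", []),
    "d": ("sports scores", []),
}

# Keyword stand-ins for trained topic and sentiment classifiers (assumption).
TOPIC_TERMS = {"camera", "lens", "battery"}
SENTIMENT_TERMS = {"great", "terrible", "awful", "love", "hate"}


def relevance(text):
    """Return (on_topic, opinionated) for a page's text."""
    words = set(text.split())
    return bool(words & TOPIC_TERMS), bool(words & SENTIMENT_TERMS)


def crawl(web, seeds, max_tunnel=1, max_pages=100):
    """Best-first crawl that tunnels through limited runs of irrelevant pages.

    A page is collected only if it is both on-topic and opinionated.
    Links are still followed from an irrelevant page as long as the run of
    consecutive irrelevant pages on that path does not exceed max_tunnel.
    """
    # Heap entries: (irrelevant_run, insertion_order, url); shorter runs first.
    frontier = [(0, i, url) for i, url in enumerate(seeds)]
    heapq.heapify(frontier)
    seen = set(seeds)
    collected, visited = [], []
    counter = len(seeds)
    while frontier and len(visited) < max_pages:
        run, _, url = heapq.heappop(frontier)
        text, links = web[url]
        visited.append(url)
        on_topic, opinionated = relevance(text)
        if on_topic and opinionated:
            collected.append(url)
            run = 0  # a relevant page resets the tunneling budget
        else:
            run += 1
            if run > max_tunnel:
                continue  # budget exhausted: do not expand this page
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (run, counter, link))
                counter += 1
    return collected, visited
```

On the toy web, `crawl(TOY_WEB, ["seed"], max_tunnel=0)` collects only `seed`, because the opinion-rich page `c` sits behind the irrelevant page `a`. With `max_tunnel=1` the crawler passes through `a` and also collects `c`, at the cost of visiting the dead-end pages `b` and `d`. That trade-off between recall and pages traversed is the one the abstract's evaluation is concerned with.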
