4.7 Article

An integrated framework of learning and evidential reasoning for user profiling using short texts

Journal

INFORMATION FUSION
Volume 70, Pages 27-42

Publisher

ELSEVIER
DOI: 10.1016/j.inffus.2020.12.004

Keywords

User profiling; Short texts; Mass functions; Information fusion; Dempster-Shafer theory

Funding

  1. US Office of Naval Research Global [N62909-19-1-2031]

This study proposes an integrated framework for inferring user profiles, based on the Dempster-Shafer theory of evidence, word embedding, and k-means clustering, that can handle both short texts and the uncertainty in user texts. The framework consists of three phases: learning abstract concepts, evidential inference and combination, and user profile extraction. It can process data from multiple modes and sources, and the resulting profiles are interpretable and visualizable in practical applications. Experimental studies on datasets from Twitter and Facebook validate the effectiveness of the proposed framework.
Inferring user profiles from texts that users create on social networks has a variety of applications in recommender systems, such as job offering, item recommendation, and targeted advertising. The problem becomes more challenging with short texts such as tweets on Twitter or posts on Facebook. This work proposes an integrated framework based on the Dempster-Shafer theory of evidence, word embedding, and k-means clustering for the user profiling problem, which not only works well with short texts but also deals with the uncertainty inherent in user texts. The proposed framework is composed of three phases: (1) learning abstract concepts at multiple levels of abstraction from user corpora; (2) evidential inference and combination for user modeling; and (3) user profile extraction. In the first phase, a word embedding technique converts preprocessed texts into vectors that capture the semantics of words in the user corpus, and k-means clustering is then used to learn abstract concepts at multiple levels of abstraction, each of which reflects appropriate semantics of user profiles. In the second phase, each document in the user corpus is treated as an evidential source carrying partial information for inferring the user profile: a mass function is first inferred for each user document by maximum a posteriori estimation, and Dempster's rule of combination then fuses all documents' mass functions into an overall one for the user corpus. Finally, in the third phase, the pignistic probability principle is applied to extract the top-n keywords from the user's overall mass function to define the user profile. Because it can combine pieces of information from many documents, the proposed framework is flexible enough to scale when input data come not only from multiple modes but also from different sources in web environments. In addition, the resulting profiles are interpretable, visualizable, and compatible with practical applications. The effectiveness of the proposed framework is validated by experimental studies conducted on datasets crawled from Twitter and Facebook.
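
The second and third phases rest on two standard operations from Dempster-Shafer theory: Dempster's rule of combination and the pignistic transformation. The following is a minimal sketch of both in Python, not the authors' implementation. The function names (`dempster_combine`, `pignistic`, `top_n`), the three-keyword frame, and the mass values for the two documents are all hypothetical and chosen only for illustration. In particular, the sketch assumes the per-document mass functions are already given, whereas the paper infers them by maximum a posteriori estimation.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Fuse two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence: the product mass goes to the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint focal sets: the product mass counts as conflict.
            conflict += wa * wb
    norm = 1.0 - conflict
    if norm <= 1e-12:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalise so that the combined masses sum to 1.
    return {s: w / norm for s, w in combined.items()}


def pignistic(m):
    """Pignistic transformation: split each focal set's mass evenly over its elements."""
    betp = {}
    for focal, w in m.items():
        share = w / len(focal)
        for x in focal:
            betp[x] = betp.get(x, 0.0) + share
    return betp


def top_n(m, n):
    """Return the n elements with the highest pignistic probability."""
    betp = pignistic(m)
    return sorted(betp, key=lambda x: (-betp[x], x))[:n]


# Hypothetical frame of discernment and document-level mass functions.
FRAME = frozenset({"sports", "music", "tech"})
doc1 = {frozenset({"sports"}): 0.6, FRAME: 0.4}
doc2 = {frozenset({"sports", "music"}): 0.5, frozenset({"tech"}): 0.2, FRAME: 0.3}

overall = dempster_combine(doc1, doc2)
print(top_n(overall, 2))
```

Mass assigned to the whole frame represents ignorance, which lets a short, uninformative document contribute little without contradicting the others. Because Dempster's rule is associative and commutative, a corpus of many documents can be fused by folding `dempster_combine` over the list of mass functions in any order.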
