Article

An integrated retrieval framework for similar questions: Word-semantic embedded label clustering - LDA with question life cycle

Journal

INFORMATION SCIENCES
Volume 537, Pages 227-245

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.ins.2020.05.014

Keywords

CQA; Question retrieval; Product life cycle; Semantic representation

Funding

  1. State Key Program of National Natural Science Foundation of China [61936001]

Question retrieval is an extremely important research field in Community Question Answering (CQA). Most existing question retrieval methods depend on semantic analysis of questions, and their effectiveness suffers from the short length of question texts and from noise words in the question corpus. To recommend questions that carry more advanced knowledge to users, the popularity of questions should also be considered during retrieval. To retrieve questions that are both semantically similar and highly popular, we propose an Integrated Retrieval Framework for Similar Questions named Word-semantic Embedded Label Clustering - LDA with Question Life Cycle (WELQLC-QR), consisting of Word-semantic Embedded Label Clustering - LDA (WEL) and the Question Life Cycle Optimization Similar Question List Strategy (QLC). First, WEL is proposed for question retrieval from the perspective of semantic matching. It not only overcomes the over-generalization of the semantic information that topic models extract from short questions with multi-level labels, but also avoids the influence of noise words when extracting the semantics of questions. Then, based on the internal factors (i.e., the number of comments and answers to a question) and external factors (i.e., programming language ranking information) of questions, QLC constructs a popularity prediction model to optimize the set of similar questions retrieved by WEL, making the final retrieval results both semantically similar and popular. Finally, experiments are conducted on the CQADupStack dataset, and the results show that the MRR@N of the WELQLC-QR model increases on average by 8.99%, 8.3%, 4.74% and 3.56% compared with L-LDA, LC-LDA, BM25 and Word2vec, respectively. (C) 2020 Elsevier Inc. All rights reserved.
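The abstract describes a two-stage pipeline: WEL produces a semantically ranked list of similar questions, QLC then reorders that list using predicted popularity, and the result is scored with MRR@N. The abstract does not state QLC's actual fusion rule or popularity model, so the Python sketch below assumes a hypothetical linear blend (`rerank_by_popularity`, weight `alpha`) over popularity scores already normalized to [0, 1]; the WEL similarity scores are likewise taken as given inputs rather than computed. `mrr_at_n` follows the standard definition of Mean Reciprocal Rank truncated at rank N, which may differ in detail from the paper's evaluation code.

```python
from typing import Dict, List, Sequence, Set, Tuple


def rerank_by_popularity(
    candidates: List[Tuple[str, float]],
    popularity: Dict[str, float],
    alpha: float = 0.7,
) -> List[str]:
    """Re-order a semantically retrieved candidate list by blending in popularity.

    HYPOTHETICAL fusion rule (not taken from the paper):
        score = alpha * similarity + (1 - alpha) * popularity
    Similarity and popularity are assumed to be pre-normalized to [0, 1];
    questions missing from `popularity` are treated as having popularity 0.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    scored = [
        (qid, alpha * sim + (1.0 - alpha) * popularity.get(qid, 0.0))
        for qid, sim in candidates
    ]
    # Python's sort is stable, so ties keep their original (semantic) order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [qid for qid, _ in scored]


def mrr_at_n(
    rankings: Sequence[Sequence[str]],
    relevant: Sequence[Set[str]],
    n: int,
) -> float:
    """Mean Reciprocal Rank with cutoff N (MRR@N).

    For each query, add 1/rank of the first relevant question within the top n
    results (ranks are 1-based); a query with no relevant hit in the top n adds 0.
    """
    if len(rankings) != len(relevant):
        raise ValueError("rankings and relevant must have the same length")
    if not rankings:
        return 0.0
    total = 0.0
    for ranking, rel in zip(rankings, relevant):
        for rank, qid in enumerate(ranking[:n], start=1):
            if qid in rel:
                total += 1.0 / rank
                break
    return total / len(rankings)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    candidates = [("q1", 0.90), ("q2", 0.85), ("q3", 0.80)]
    popularity = {"q1": 0.10, "q2": 0.90, "q3": 0.50}
    reranked = rerank_by_popularity(candidates, popularity, alpha=0.7)
    print(reranked)  # ['q2', 'q3', 'q1']
    print(mrr_at_n([reranked, ["a", "b"]], [{"q3"}, {"x"}], n=3))  # 0.25
```

Setting `alpha` to 1 falls back to pure semantic ranking, while smaller values let popularity pull frequently discussed questions upward; in this toy run the semantically top-ranked `q1` drops to last because its assumed popularity is low.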
