An integrated retrieval framework for similar questions: Word-semantic embedded label clustering - LDA with question life cycle

INFORMATION SCIENCES (2020)

Journal

INFORMATION SCIENCES

Volume 537, Pages 227-245

Publisher

ELSEVIER SCIENCE INC

DOI: 10.1016/j.ins.2020.05.014

Keywords

CQA; Question retrieval; Product life cycle; Semantic representation

Funding

State Key Program of National Natural Science Foundation of China [61936001]

Abstract

Question retrieval is an important research field in Community Question Answering (CQA). Most existing question retrieval methods depend on semantic analysis of questions, and their effectiveness suffers from the short texts and noise words in the question corpus. To recommend questions carrying more advanced knowledge to users, the popularity of questions should also be considered during retrieval. To retrieve questions that are both semantically similar and highly popular, we propose an Integrated Retrieval Framework for Similar Questions named Word-semantic Embedded Label Clustering - LDA with Question Life Cycle (WELQLC-QR), consisting of Word-semantic Embedded Label Clustering - LDA (WEL) and Question Life Cycle Optimization Similar Question List Strategy (QLC). First, WEL is proposed for question retrieval from the perspective of semantic matching. It not only overcomes the over-generalization of the semantic information that topic models extract from short questions with multi-level labels, but also avoids the influence of noisy vocabulary during semantic extraction. Then, based on internal factors (i.e., the number of comments and answers to a question) and external factors (i.e., programming language ranking information), QLC constructs a popularity prediction model to optimize the similar question set retrieved by WEL, making the final retrieval results both semantically similar and popular. Finally, experiments are conducted on the CQADupStack dataset, and the results show that the MRR@N of the WELQLC-QR model increases on average by 8.99%, 8.3%, 4.74% and 3.56% compared with that of L-LDA, LC-LDA, BM25 and Word2vec, respectively. (C) 2020 Elsevier Inc. All rights reserved.
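The abstract reports results as MRR@N but does not define it. The sketch below uses the standard definition of mean reciprocal rank truncated at the top N results, which is assumed (not confirmed by the abstract) to match the paper's evaluation. The function name `mrr_at_n` and the toy question IDs are hypothetical and chosen only for illustration.

```python
from typing import Hashable, Sequence, Set


def mrr_at_n(
    ranked_lists: Sequence[Sequence[Hashable]],
    relevant_sets: Sequence[Set[Hashable]],
    n: int,
) -> float:
    """Mean reciprocal rank over queries, looking only at the top-n results.

    ranked_lists[i] is the retrieved list for query i (best first);
    relevant_sets[i] is the set of questions judged similar to query i.
    A query with no relevant question in its top n contributes 0.
    """
    if len(ranked_lists) != len(relevant_sets):
        raise ValueError("one relevant set is required per ranked list")
    if not ranked_lists:
        return 0.0

    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        # Find the rank (1-based) of the first relevant result within the top n.
        for rank, question_id in enumerate(ranked[:n], start=1):
            if question_id in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)


if __name__ == "__main__":
    # Toy data (hypothetical IDs): first hit at rank 2, rank 1, and no hit.
    ranked = [["q3", "q1", "q7"], ["q2", "q5", "q9"], ["q4", "q6", "q8"]]
    relevant = [{"q1"}, {"q2"}, {"q0"}]
    print(mrr_at_n(ranked, relevant, n=3))
```

Under this definition, a reported gain in MRR@N means that the first truly similar question tends to appear closer to the top of the returned list.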
