Article

Set-CNN: A text convolutional neural network based on semantic extension for short text classification

KNOWLEDGE-BASED SYSTEMS (2022)

Journal

KNOWLEDGE-BASED SYSTEMS

Volume 257, Issue -, Pages -

Publisher

ELSEVIER

DOI: 10.1016/j.knosys.2022.109948

Keywords

Atrous convolution; Multichannel text-CNN; Semantic extension; Short text classification

Category

Computer Science, Artificial Intelligence

Funding

National Key Research and Development Program of China [2018YFC0830801]
111 Project [B21049]

Smart Summary

This paper proposes Set-CNN, a semantic extension-based classification algorithm for short texts. Set-CNN enriches the features of short texts using a semantic extension mechanism and captures semantic features at different levels through a multiple-channel convolutional framework. Experimental results show that Set-CNN outperforms state-of-the-art alternatives and performs well as a lightweight text classifier.

Abstract

This paper proposes Set-CNN, a semantic extension-based classification algorithm for short texts. Set-CNN has three main features. First, a semantic extension mechanism based on the fast clustering algorithm enriches the features of short texts. Second, a multiple-channel convolutional framework captures semantic features at different levels. More specifically, both ordinary 1D convolution and atrous convolution are performed on the original texts to capture local and global semantic information. Ordinary 1D convolution convolves words one by one to capture the original semantic information at the word level, which can be considered local semantic information. Atrous convolution convolves an entire short text to capture the context-level information of the original text, i.e., the global semantic information; this information can offset the noise incurred by semantic extension. The convolution channel equipped with an evolved GLU takes the extended short texts as its input to capture semantic information at the extended context level, and it also mitigates the vanishing gradient problem. Third, a multiple-channel version of Text-CNN generates different feature maps, which capture semantic features on different scales and provide useful information to improve the classification performance of short texts.

Finally, the performance of Set-CNN is assessed extensively on four datasets: Subj, TREC, SST-2 and the Sogou corpus. The experimental results show that Set-CNN is more effective than state-of-the-art alternatives, including CNN-VE, multichannel CNN and BERT, among others. In particular, Set-CNN performs well as a lightweight text classifier, with lower computational complexity than BERT-base. (c) 2022 Elsevier B.V. All rights reserved.
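
The contrast the abstract draws between ordinary 1D convolution (local, word-level) and atrous convolution (wider context) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes a scalar value per token instead of an embedding vector, a single hand-picked kernel, and the standard gated linear unit (a * sigmoid(b)) as a stand-in for the paper's "evolved GLU", whose definition is not given here. All function names (conv1d, receptive_field, glu, multichannel_features) are hypothetical.

```python
import math
from typing import List, Sequence


def conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int = 1) -> List[float]:
    """Valid (unpadded) 1D cross-correlation with an optional dilation rate.

    dilation=1 is an ordinary convolution over adjacent positions.
    dilation>1 is an atrous convolution: kernel taps are spaced `dilation`
    positions apart, so one output sees a wider span with the same weights.
    """
    span = (len(kernel) - 1) * dilation + 1  # positions covered by one output
    if len(x) < span:
        return []
    return [
        sum(w * x[start + k * dilation] for k, w in enumerate(kernel))
        for start in range(len(x) - span + 1)
    ]


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Number of input positions spanned by a single output value."""
    return (kernel_size - 1) * dilation + 1


def glu(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Standard GLU, a * sigmoid(b), elementwise (assumed stand-in only)."""
    return [ai / (1.0 + math.exp(-bi)) for ai, bi in zip(a, b)]


def multichannel_features(x: Sequence[float], kernel: Sequence[float],
                          dilations: Sequence[int]) -> List[float]:
    """One max-over-time pooled feature per channel (one channel per dilation)."""
    features = []
    for d in dilations:
        fmap = conv1d(x, kernel, d)
        features.append(max(fmap) if fmap else 0.0)
    return features


if __name__ == "__main__":
    tokens = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # toy per-token scalars
    kernel = [1.0, 0.0, -1.0]

    print(conv1d(tokens, kernel))              # ordinary: span of 3 tokens
    print(conv1d(tokens, kernel, dilation=3))  # atrous: span of all 7 tokens
    print(receptive_field(3, 3))               # 7
    print(multichannel_features(tokens, kernel, [1, 2, 3]))
```

With a kernel of size 3, raising the dilation from 1 to 3 widens the span seen by each output from 3 tokens to 7 without adding weights, which is the sense in which a dilated channel can cover an entire short text. Concatenating pooled features from several channels, as multichannel_features does, mirrors the general multichannel Text-CNN idea only at a very coarse level.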
