4.7 Article

Set-CNN: A text convolutional neural network based on semantic extension for short text classification

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 257, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2022.109948

Keywords

Atrous convolution; Multichannel text-CNN; Semantic extension; Short text classification

Funding

  1. National Key Research and Development Program of China [2018YFC0830801]
  2. 111 Project [B21049]

In this paper, a semantic extension-based classification algorithm for short texts called Set-CNN is proposed. Set-CNN enriches the features of short texts using a semantic extension mechanism and captures semantic features at different levels through a multiple-channel convolutional framework. Experimental results show that Set-CNN outperforms state-of-the-art alternatives and exhibits excellent performance as a lightweight text classifier.
A semantic extension-based classification algorithm for short texts, Set-CNN, is proposed in this paper. Set-CNN has three main features. First, a semantic extension mechanism based on the fast clustering algorithm is applied to enrich the features of short texts. Second, a multiple-channel convolutional framework is proposed to capture semantic features at different levels. More specifically, both ordinary 1D convolution and atrous convolution are performed on the original texts to capture local and global semantic information. Ordinary 1D convolution convolves words one by one to capture the original semantic information at the word level, which can be considered local semantic information. Atrous convolution convolves over an entire short text to capture the context-level information of the original text, i.e., the global semantic information. This information can offset the noise introduced by semantic extension. The convolution channel equipped with an evolved GLU takes the extended short texts as its input to capture semantic information at the extended context level; it also helps mitigate vanishing gradients. Third, we design a multiple-channel version of Text-CNN to generate different feature maps, which capture semantic features at different scales and provide useful information for improving classification performance on short texts. Finally, the performance of Set-CNN is assessed extensively on four datasets, namely Subj, TREC, SST-2 and the Sogou corpus. The experimental results show that Set-CNN is more effective than state-of-the-art alternatives, including CNN-VE, multichannel CNN and BERT. In particular, Set-CNN exhibits excellent performance as a lightweight text classifier, with lower computational complexity than BERT-base. (c) 2022 Elsevier B.V. All rights reserved.
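
The contrast the abstract draws between ordinary 1D convolution (local, word-level) and atrous convolution (wider, context-level) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `conv1d`, `receptive_field` and `glu` are hypothetical, the input is a sequence of scalars rather than word embeddings, and the gate shown is the standard gated linear unit, not the "evolved GLU" of the paper, whose exact form the abstract does not give.

```python
import math


def receptive_field(kernel_size, dilation):
    """Number of input positions spanned by one output of a dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def conv1d(seq, kernel, dilation=1):
    """'Valid' 1D convolution (cross-correlation form) over a list of scalars.

    dilation=1 is ordinary convolution over adjacent positions;
    dilation>1 is atrous convolution, which skips positions so the same
    kernel covers a wider stretch of the text.
    """
    span = receptive_field(len(kernel), dilation)
    out = []
    for start in range(len(seq) - span + 1):
        out.append(sum(w * seq[start + k * dilation] for k, w in enumerate(kernel)))
    return out


def glu(values, gates):
    """Standard gated linear unit: values * sigmoid(gates), elementwise."""
    return [v * (1.0 / (1.0 + math.exp(-g))) for v, g in zip(values, gates)]


tokens = [1, 2, 3, 4, 5, 6, 7]   # stand-in for a 7-word short text
kernel = [1, 1, 1]

local_features = conv1d(tokens, kernel)               # each output sees 3 adjacent words
global_features = conv1d(tokens, kernel, dilation=3)  # one output spans all 7 words
gated = glu(local_features, [0.0] * len(local_features))  # gate of 0 halves each value
```

With the same three-weight kernel, a dilation of 3 stretches the receptive field from 3 positions to 7, which is how a dilated channel can cover an entire short text without adding parameters.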
