Article
Automation & Control Systems
Hui Zeng, Xiaohui Cui
Summary: In this study, we propose a simple framework called SimCLRT for rumor tracking, which uses contrastive learning to alleviate the problem of tweet coverage. The experimental results show that SimCLRT has a good detection performance in different types of events, especially for events with a small number of tweets.
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
(2022)
Article
Computer Science, Artificial Intelligence
Bing Ma, Hai Zhuge
Summary: This paper proposes a method to represent texts using common words and measure the similarity of text classes. It also introduces a bottom-up text clustering approach to construct class trees. Experimental results show that this method outperforms other algorithms in terms of classification accuracy and class tree structure. Additionally, a document summarization approach based on this method achieves good performance.
EXPERT SYSTEMS WITH APPLICATIONS
(2024)
Article
Chemistry, Multidisciplinary
Ana Laura Lezama-Sanchez, Mireya Tovar Vidal, Jose A. Reyes-Ortiz
Summary: Topic discovery is the process of identifying main ideas in large volumes of text data. Current models often involve text preprocessing and can generate general topics. However, integrating automatic text classification before the topic discovery process can generate specific topics with coherent semantic relationships. This paper presents an approach that combines text classification and topic discovery on English textual data.
APPLIED SCIENCES-BASEL
(2023)
Article
Chemistry, Multidisciplinary
Adam Wawrzynski, Julian Szymanski
Summary: Various algorithms for text representation were studied, with statistical methods and neural networks compared. The performance of different approaches was evaluated on five datasets, revealing the strengths and weaknesses of each method.
APPLIED SCIENCES-BASEL
(2021)
Article
Computer Science, Artificial Intelligence
Dongliang Zhang, Mingchao Li, Dan Tian, Lingguang Song, Yang Shen
Summary: This research applies text mining to extract hidden information from unstructured quality records. It improves the integration and classification of those records with an enhanced CNN model, using BERT and Word2vec to quantify the text. The proposed model achieves high precision while requiring less manual intervention.
ADVANCED ENGINEERING INFORMATICS
(2022)
Article
Biochemical Research Methods
Kazi Zainab, Gautam Srivastava, Vijay Mago
Summary: This study presents a novel method of detecting the occupations of Twitter users engaged in the medical domain by combining word embedding with state-of-the-art neural networks. The results demonstrate that our approach outperforms traditional machine learning techniques in detecting medical occupations among users.
BMC BIOINFORMATICS
(2022)
Article
Computer Science, Artificial Intelligence
Zhongju Wang, Long Wang, Chao Huang, Shutong Sun, Xiong Luo
Summary: This paper proposes an automatic Chinese text categorization method that uses the BERT model to extract features from emergency event reports. A novel loss function is introduced to address the data imbalance problem. The proposed method is validated on various datasets and compared with benchmark models, showing superior performance in accuracy, weighted average precision, recall, and F1 values. Hence, it holds promise for real applications in smart emergency management systems.
APPLIED INTELLIGENCE
(2023)
Article
Construction & Building Technology
Xixi Luo, Xinchun Li, Xuefeng Song, Quanlong Liu
Summary: This paper proposes a text self-classification model based on deep learning natural language processing (NLP) technology for automated classification of construction site accident cases by accident type. The model utilizes pretrained Word2Vec word embeddings and a convolutional neural network (CNN) model to achieve excellent feature extraction and learning abilities. This research provides a useful method for obtaining reliable accident prevention knowledge from textual descriptions.
JOURNAL OF CONSTRUCTION ENGINEERING AND MANAGEMENT
(2023)
Article
Computer Science, Information Systems
Jingjing Gong, Hang Yan, Yining Zheng, Qipeng Guo, Xipeng Qiu, Xuanjing Huang
Summary: In this study, a new self-attention mechanism called eigen-centrality self-attention is proposed to incorporate higher-order relationships among words in text sequence encoding, leading to better results in multiple tasks compared to baseline models. The power method is adopted to compute the dominant eigenvector of the graph, and an iterative approach is derived to reduce memory consumption and computation requirements during the process.
SCIENCE CHINA-INFORMATION SCIENCES
(2021)
Article
Construction & Building Technology
Fahad ul Hassan, Tuyen Le
Summary: This study developed a machine learning model for classifying design-build (DB) requirements into three predefined categories. Of the training methods compared, the best model, trained on a large dataset, achieved an accuracy of 93.20%. The research is expected to reduce the time and effort required for extracting subcontractor scopes and to minimize the possibility of errors.
AUTOMATION IN CONSTRUCTION
(2021)
Article
Automation & Control Systems
Md. Rajib Hossain, Mohammed Moshiul Hoque, Nazmul Siddique
Summary: This paper proposes an intelligent text classification framework called AVG-M+CNN for a resource-constrained language like Bengali. The framework includes an average meta-embedding feature fusion module and a convolutional neural network module. It also introduces an automatic hyperparameter tuning and selection algorithm to enhance performance. The proposed models are evaluated using intrinsic and extrinsic evaluators, and the AVG-M+CNN model achieves high accuracy rates on multiple Bengali corpora.
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
(2023)
Article
Computer Science, Information Systems
Mohammad Bani-Almarjeh, Mohamad-Bassam Kurdy
Summary: Recent research has shown that the Transformer architecture and pre-trained Transformer-based language models perform well in natural language understanding and text generation. However, there is limited research on using these models for text generation in Arabic. This study evaluates and compares different model architectures and pre-trained language models for Arabic abstractive summarization. Results show that Transformer-based models significantly outperform traditional RNN-based models while using less data. AraT5, an encoder-decoder pre-trained Transformer, is found to be more suitable for summarizing Arabic text than the AraBERT-initialized BERT2BERT model. Additionally, both AraT5 and AraGPT2 perform better than AraBERT in summarizing out-of-domain text.
INFORMATION PROCESSING & MANAGEMENT
(2023)
Article
Computer Science, Information Systems
M. L. Tlachac, Avantika Shrestha, Mahum Shah, Benjamin Litterer, Elke A. A. Rundensteiner
Summary: Given the prevalence of depression, it is important to develop effective and unobtrusive diagnosis tools. Recent research on depression screening using text messages has been limited by the formal nature of lexical category features. To address this limitation, we propose a strategy to automatically construct alternative lexicons containing more colloquial terms. Through machine learning models, we compare the screening capabilities of these lexicons and confirm that less formal lexicons can improve the performance of classification models for depression screening using text messages.
IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS
(2023)
Article
Computer Science, Artificial Intelligence
Ruidong Zhang, Zelin Guo, Hai Huan
Summary: This article introduces a new text classification method using positional and edge graph convolutional networks. By adding positional encoding to the input representation and extracting multi-dimensional edge features, the method addresses the insufficient use of position information and edge features in existing methods, and achieves good classification results on multiple datasets.
Article
Computer Science, Artificial Intelligence
Minqian Liu, Lizhao Liu, Junyi Cao, Qing Du
Summary: Most existing methods for text classification focus on extracting discriminative text representations, but are computationally inefficient. To improve efficiency, label embedding frameworks have been proposed. This paper makes further use of label information by constructing text-attended label representations. Experimental results show competitive performance on multiple classification benchmarks.