Article
Mathematical & Computational Biology
Oezge Kart, Alexandre Mestiashvili, Kurt Lachmann, Richard Kwasnicki, Michael Schroeder
Summary: This study develops Emati, a web-based article recommender service built on a content-based approach and supervised machine learning models. Two approaches are implemented: TF-IDF with a naive Bayes model, and fine-tuning of the BERT language model. Emati provides updated article recommendations to users and also offers personalized search functionality.
DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION
(2022)
Article
Computer Science, Information Systems
Mazhar Ali Dootio, Asim Imdad Wagan
Summary: Developing and analyzing Sindhi text corpora is challenging because resources are scarce. This work develops text resources for computational linguists and researchers.
JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES
(2021)
Article
Chemistry, Analytical
Beakcheol Jang, Myeonghwi Kim, Inhwan Kim, Jong Wook Kim
Summary: Diseases are spreading rapidly due to globalization, and internet data can be leveraged to provide accurate and timely disease information. This study develops an infectious disease surveillance system that uses a deep learning algorithm and various visualization techniques to present disease data.
Article
Computer Science, Information Systems
Barbara Cardone, Ferdinando Di Martino, Sabrina Senatore
Summary: This paper proposes an alternative approach for document classification by leveraging the distribution of data in multi-dimensional feature space to assess feature importance. It builds a relevance measure and expresses the values in natural language using fuzzy variables and linguistic labels for human comprehension.
INFORMATION SCIENCES
(2022)
Article
Computer Science, Artificial Intelligence
Qifeng Wan, Xuanhua Xu, Jing Han
Summary: In this study, we propose an innovative approach for dimensionality reduction in large-scale group decision-making scenarios that targets linguistic preferences. The method combines TF-IDF feature similarity and information loss entropy to address challenges in decision-making with a large number of decision makers.
APPLIED SOFT COMPUTING
(2024)
Article
Computer Science, Information Systems
Nur Aqilah Paskhal Rostam, Nurul Hashimah Ahamed Hassain Malim
Summary: The Quran and Al-Hadith complement each other in interpreting Islamic teachings. This research proposes a text categorisation method to classify selected categories and finds that a Support Vector Machine (SVM) achieves better accuracy in addressing the interrelationship for single- and multi-label classifications.
JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES
(2021)
Article
Biochemical Research Methods
Yongxian Fan, Wanru Wang
Summary: The research team conducted studies on the origins of DNA replication in seven eukaryotic species, proposed unique prediction and feature extraction methods, and demonstrated superior performance in experiments. After multiple cross validations, the prediction accuracy reached over 90% for all species, showing that the models of different species could predict each other with high accuracy and share common motifs.
BMC BIOINFORMATICS
(2021)
Article
Computer Science, Artificial Intelligence
Jingjing Xu
Summary: This article builds a sentiment analysis model based on natural language processing technology and proposes an improved TF-IDF algorithm. The emotional value of each emotional word is determined using a weighted average method. Inspirational words are used to obtain the emotional tendency and emotional value of the English corpus. The results show that the model has high classification accuracy and operation efficiency when selecting feature words. The improved TF-IDF algorithm significantly improves the efficiency of college English learning by adding necessary information weight processing and word density weight processing.
PEERJ COMPUTER SCIENCE
(2023)
Article
Computer Science, Information Systems
Harald Vranken, Hassan Alizadeh
Summary: This paper addresses the detection of domain names generated by domain name generation algorithms (DGAs) using machine learning and deep learning. The authors propose the use of TF-IDF to measure the frequencies of relevant n-grams in domain names and utilize them as features in learning algorithms. Experimental results show that a deep MLP model achieves the best performance, with an AUC of 0.995 and an average F1-score of 0.891.
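The n-gram TF-IDF features described in this summary can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the bigram size, the example domains, and the plain `tf * log(N / df)` weighting are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def char_ngrams(domain, n=2):
    """Return the overlapping character n-grams of a domain label."""
    return [domain[i:i + n] for i in range(len(domain) - n + 1)]


def tfidf_features(domains, n=2):
    """Compute a TF-IDF weight for each character n-gram of each domain.

    Assumed weighting: relative term frequency times log(N / df).
    """
    docs = [Counter(char_ngrams(d, n)) for d in domains]
    # Document frequency: number of domains that contain each n-gram.
    df = Counter(gram for doc in docs for gram in doc)
    total = len(docs)
    features = []
    for doc in docs:
        length = sum(doc.values())
        features.append({
            gram: (count / length) * math.log(total / df[gram])
            for gram, count in doc.items()
        })
    return features


# Hypothetical example domains (labels only, without the TLD).
domains = ["google", "goggle", "xkqzvw"]
features = tfidf_features(domains)
```

Each resulting dictionary could then be turned into a fixed-length vector and passed to a classifier. N-grams shared across many domains receive lower weights than rare ones, which is the property such features rely on.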
Article
Chemistry, Multidisciplinary
Min-Young Seo, Se-Yun Hwang, Jang-Hyun Lee, Jae-Gon Kim, Hong-Bae Jun
Summary: There are two types of maintenance policies for equipment: breakdown maintenance and preventive maintenance. With the development of ICT and IoT technology, the use of Condition-Based Maintenance (CBM) to diagnose equipment conditions is increasing. This study introduces an approach to diagnose equipment conditions by extracting specific data features related to equipment failures, and provides experimental validation on a centrifugal pump.
APPLIED SCIENCES-BASEL
(2022)
Article
Construction & Building Technology
Yipeng Liu, Junwu Wang, Shanrong Tang, Jiaji Zhang, Jinyingjun Wan
Summary: Construction accident investigation reports are difficult to analyze due to the voluminous Chinese text. To overcome this problem, a novel approach combining text mining techniques and LDA models is proposed to identify the key factors leading to safety accidents in the Chinese construction industry.
Article
Environmental Sciences
Kun Wang, Kai Wu, Chenlong Wang, Yali Tong, Jiajia Gao, Penglai Zuo, Xiaoxi Zhang, Tao Yue
Summary: Satellite-based measures of NO2 have enabled more detailed features and hotspot identification, while a proposed method using TF-IDF has successfully identified major source types in Central and Eastern China based on oversampled TROPOMI NO2 column data. Identifying hotspot grids can indicate a higher probability of local high-intensity NOx pollution, with key source types distinguished through semantic analysis.
SCIENCE OF THE TOTAL ENVIRONMENT
(2022)
Article
Computer Science, Theory & Methods
Ran Huang
Summary: This paper addresses the deficiency of traditional content-based recommendation technology in semantic analysis and proposes an improved recommendation algorithm that integrates semantic information with the TF-IDF vector space model. Experimental results demonstrate the effectiveness and stability of the proposed method.
JOURNAL OF BIG DATA
(2023)
Article
Computer Science, Information Systems
Wenjuan Bu, Hui Shu, Fei Kang, Qian Hu, Yuntian Zhao
Summary: This study proposes a customized BERTopic model to achieve automatic tagging and updating of application software based on topic clustering and subject word extraction. Additionally, a data enhancement method based on the c-TF-IDF algorithm is introduced to address the issue of imbalanced datasets. Experimental results demonstrate that the proposed method achieves satisfactory performance in terms of accuracy, recall rate, and F1 value.
Article
Computer Science, Information Systems
Maibam Debina Devi, Navanath Saharia
Summary: This experiment uses statistical and semantic features to cluster Tweets as representative of social media and user-generated content. A combination of tf-idf and a synonym-based weighting scheme is employed, adding semantic importance to the clusters.
MULTIMEDIA TOOLS AND APPLICATIONS
(2023)