4.7 Article

Syntactic N-grams as machine learning features for natural language processing

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 41, Issue 3, Pages 853-860

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2013.08.015

Keywords

Syntactic n-grams; sn-Grams; Parsing; Classification features; Syntactic paths; Authorship attribution; SVM; NB; J48

Funding

  1. Mexican government [CONACYT 50206-H, 83270]
  2. Instituto Politécnico Nacional, Mexico [SIP 20111146, 20113295, 20120418]
  3. Mexico City government (ICYT-DF project) [PICCO10-120]
  4. FP7-PEOPLE-2010-IRSES: Web Information Quality - Evaluation Initiative (WIQ-EI) European Commission [269180]

In this paper we introduce and discuss the concept of syntactic n-grams (sn-grams). Sn-grams differ from traditional n-grams in how they are constructed, i.e., in which elements are considered neighbors. In sn-grams, neighbors are determined by following syntactic relations in syntactic trees rather than by the order in which words appear in the text; that is, sn-grams are constructed by following paths in syntactic trees. In this way, sn-grams bring syntactic knowledge into machine learning methods, although the text must be parsed beforehand to construct them. Sn-grams can be applied in any natural language processing (NLP) task where traditional n-grams are used. We describe how sn-grams were applied to authorship attribution. As a baseline we used traditional n-grams of words, part-of-speech (POS) tags, and characters; three classifiers were applied: support vector machines (SVM), naive Bayes (NB), and the J48 tree classifier. Sn-grams give better results with the SVM classifier. (C) 2013 Elsevier Ltd. All rights reserved.
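
The difference between the two kinds of n-grams can be illustrated with a minimal sketch. It is not the authors' implementation: the toy sentence, its hand-written dependency tree, and the function names `syntactic_ngrams` and `linear_ngrams` are all assumptions made for illustration, and the sketch assumes every word in the sentence is unique so that word forms can serve as node identifiers. A real pipeline would obtain the tree from a parser.

```python
from typing import Dict, List, Tuple

Tree = Dict[str, List[str]]  # head word -> list of its dependents (hypothetical format)


def syntactic_ngrams(children: Tree, n: int) -> List[Tuple[str, ...]]:
    """Collect every head-to-dependent path that contains exactly n words."""

    def paths_from(node: str, length: int) -> List[Tuple[str, ...]]:
        if length == 1:
            return [(node,)]
        # Extend the path through each dependent of the current node.
        return [
            (node,) + tail
            for child in children.get(node, [])
            for tail in paths_from(child, length - 1)
        ]

    # Gather every word in the tree in a stable order (heads first).
    nodes = list(children)
    for dependents in children.values():
        for word in dependents:
            if word not in nodes:
                nodes.append(word)

    return [path for node in nodes for path in paths_from(node, n)]


def linear_ngrams(words: List[str], n: int) -> List[Tuple[str, ...]]:
    """Traditional n-grams: windows over the words in surface order."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# Toy sentence with an assumed (hand-written) dependency tree.
sentence = ["the", "quick", "dog", "chased", "cats"]
tree: Tree = {"chased": ["dog", "cats"], "dog": ["the", "quick"]}

print(linear_ngrams(sentence, 2))
print(syntactic_ngrams(tree, 2))
print(syntactic_ngrams(tree, 3))
```

In this toy example the pair (dog, the) is a syntactic bigram even though the two words are not adjacent in the text, while the surface bigram (the, quick) has no counterpart in the tree because neither word governs the other.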

Recommended

Article Computer Science, Artificial Intelligence

Improving aspect-level sentiment analysis with aspect extraction

Navonil Majumder, Rishabh Bhardwaj, Soujanya Poria, Alexander Gelbukh, Amir Hussain

Summary: Aspect-based sentiment analysis consists of aspect extraction and labelling aspect sentiment polarity. This study shows that transferring knowledge from a pre-trained model can improve the performance of sentiment analysis models and the improvement can be applied across different domains.

NEURAL COMPUTING & APPLICATIONS (2022)

Editorial Material Computer Science, Hardware & Architecture

Sensors and wearable-based intelligent systems (VSI-swis)

Imran Sarwar Bajwa, Patrick Seeling, Alexander Gelbukh

COMPUTERS & ELECTRICAL ENGINEERING (2021)

Article Computer Science, Artificial Intelligence

Multi-label emotion classification of Urdu tweets

Noman Ashraf, Lal Khan, Sabur Butt, Hsien-Tsung Chang, Grigori Sidorov, Alexander Gelbukh

Summary: This study created the first multi-label emotion dataset in Urdu and adopted a multi-label classification approach for emotion detection. Due to the morphological and syntactic structure of Urdu, emotion detection posed a challenging problem. The study experimented with various baseline classifiers and different text representation methods, and presented the best results obtained.

PEERJ COMPUTER SCIENCE (2022)

Article Computer Science, Artificial Intelligence

Reaching for upper bound ROUGE score of extractive summarization methods

Iskander Akhmetov, Rustam Mussabayev, Alexander Gelbukh

Summary: This article introduces the extractive text summarization (ETS) method and compares different approaches based on their ROUGE-1 scores. The experimental results show that a genetic algorithm initialized with the results of the greedy algorithm achieves the best performance, with scores higher than those of current state-of-the-art text summarization models.

PEERJ COMPUTER SCIENCE (2022)

Article Chemistry, Multidisciplinary

Low-Resource Neural Machine Translation Improvement Using Source-Side Monolingual Data

Atnafu Lambebo Tonja, Olga Kolesnikova, Alexander Gelbukh, Grigori Sidorov

Summary: This paper discusses the feasibility of using source-side monolingual data from low-resource languages to improve NMT systems. Experiments show that both self-learning and fine-tuning approaches can enhance translation quality for low-resource Wolaytta-English translation.

APPLIED SCIENCES-BASEL (2023)

Article Computer Science, Artificial Intelligence

Multi-label emotion classification in texts using transfer learning

Iqra Ameer, Necva Bolucu, Muhammad Hammad Fahim Siddiqui, Burcu Can, Grigori Sidorov, Alexander Gelbukh

Summary: Social media is a valuable platform for understanding people's emotions. The use of multiple attention mechanisms and Transformer networks has been shown to improve accuracy in emotion classification.

EXPERT SYSTEMS WITH APPLICATIONS (2023)

Article Chemistry, Multidisciplinary

Regret and Hope on Transformers: An Analysis of Transformers on Regret and Hope Speech Detection Datasets

Grigori Sidorov, Fazlourrahman Balouchzahi, Sabur Butt, Alexander Gelbukh

Summary: In this paper, the performance of different transformer models for regret and hope speech detection on two novel datasets was analyzed. The transformer models were found to outperform previous approaches in regret detection, with the roberta-based model achieving the highest macro F1-score of 0.83. For hope speech detection, the bert-based, uncased model achieved the highest macro F1-score of 0.72. However, the performance of each model varied slightly depending on the task and dataset. These findings emphasize the effectiveness of transformer models for hope speech and regret detection tasks, and the importance of considering context, specific transformer architectures, and pre-training on their performance.

APPLIED SCIENCES-BASEL (2023)

Article Computer Science, Artificial Intelligence

PolyHope: Two-level hope speech detection from tweets

Fazlourrahman Balouchzahi, Grigori Sidorov, Alexander Gelbukh

Summary: Hope is a significant factor that influences a person's state of mind, emotions, behaviors, and decisions. However, it has rarely been studied as a social media analysis task. This paper introduces a hope speech dataset that categorizes tweets into different types of hope and discusses the challenges of classifying hope. Several baseline models based on different learning approaches are evaluated using the dataset, and the results show that contextual embedding models outperform simple machine learning classifiers.

EXPERT SYSTEMS WITH APPLICATIONS (2023)

Article Computer Science, Artificial Intelligence

ReDDIT: Regret detection and domain identification from text

Fazlourrahman Balouchzahi, Sabur Butt, Grigori Sidorov, Alexander Gelbukh

Summary: Regret is a common emotion that arises from sadness, disappointment, or remorse about past events or actions. This paper presents a study of regret and its expression on social media, using a novel dataset of Reddit texts classified into three categories. The study finds that Reddit users most frequently express regret for past actions, particularly in the domain of relationships.

EXPERT SYSTEMS WITH APPLICATIONS (2023)

Article Computer Science, Information Systems

Exemplars-Guided Empathetic Response Generation Controlled by the Elements of Human Communication

Navonil Majumder, Deepanway Ghosal, Devamanyu Hazarika, Alexander Gelbukh, Rada Mihalcea, Soujanya Poria

Summary: Empathy is crucial for human interactions and plays a role in the cohesion of societies. This paper proposes an approach that uses exemplars and synthetic labels to generate empathetic responses, leading to significant improvements in response quality.

IEEE ACCESS (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Overview of PAN 2022: Authorship Verification, Profiling Irony and Stereotype Spreaders, Style Change Detection, and Trigger Detection Extended Abstract

Janek Bevendorff, Berta Chulvi, Elisabetta Fersini, Annina Heini, Mike Kestemont, Krzysztof Kredens, Maximilian Mayerl, Reyner Ortega-Bueno, Piotr Pezik, Martin Potthast, Francisco Rangel, Paolo Rosso, Efstathios Stamatatos, Benno Stein, Matti Wiegmann, Magdalena Wolska, Eva Zangerle

Summary: The paper provides a concise overview of the four shared tasks to be organized at the PAN 2022 lab on digital text forensics and stylometry during the CLEF 2022 conference. These tasks aim to advance the technology development in text forensics and stylometry and ensure objective evaluation on newly developed benchmark datasets.

ADVANCES IN INFORMATION RETRIEVAL, PT II (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Virality Prediction for News Tweets Using RoBERTa

Christian E. Maldonado-Sifuentes, Jason Angel, Grigori Sidorov, Olga Kolesnikova, Alexander Gelbukh

Summary: The virality of a tweet is crucial for news outlets to transition to online formats and attract new audiences from social media platforms like Twitter, with effective tweet writing playing a key role in maximizing impact. The proposed method utilizing RoBERTa for tweet classification shows significant improvement in predicting influential tweets compared to traditional approaches.

ADVANCES IN SOFT COMPUTING (MICAI 2021), PT II (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Plagiarism Detection in Students' Answers Using FP-Growth Algorithm

Sabina Nurlybayeva, Iskander Akhmetov, Alexander Gelbukh, Rustam Mussabayev

Summary: In recent years, the quality of education has declined, and the percentage of academic plagiarism has increased. The new plagiarism detection algorithm classifies student assignments based on the degree of plagiarism, using natural language processing methods.

ADVANCES IN SOFT COMPUTING (MICAI 2021), PT II (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Determining the Relationship Between the Letters in the Voynich Manuscript Splitting the Text into Parts

Esbolat Sapargali, Iskander Akhmetov, Alexandr Pak, Alexander Gelbukh

Summary: The Voynich Manuscript is an illustrated manuscript whose writing structure and relationship to other languages have not yet been determined. The study compares the effectiveness of examining individual details with that of examining the full picture all at once in a single study. It suggests that a narrowly directed systematic approach may help unravel the text of the manuscript progressively.

ADVANCES IN SOFT COMPUTING (MICAI 2021), PT II (2021)

Article Computer Science, Information Systems

Greedy Optimization Method for Extractive Summarization of Scientific Articles

Iskander Akhmetov, Alexander Gelbukh, Rustam Mussabayev

Summary: This study introduces a method for summarizing scientific articles from the arXiv and PubMed datasets using a greedy extractive summarization algorithm. By selecting sentences with high TF-IDF values and tuning the minimum document frequency parameter, the method achieves ROUGE scores competitive with state-of-the-art models.

IEEE ACCESS (2021)

Review Computer Science, Artificial Intelligence

A comprehensive review of slope stability analysis based on artificial intelligence methods

Wei Gao, Shuangshuang Ge

Summary: This study provides a comprehensive review of slope stability research based on artificial intelligence methods, focusing on slope stability computation and evaluation. The review covers studies using quasi-physical intelligence methods, simulated evolutionary methods, swarm intelligence methods, hybrid intelligence methods, artificial neural network methods, vector machine methods, and other intelligence methods. The merits, demerits, and state-of-the-art research advancement of these studies are analyzed, and possible research directions for slope stability investigation based on artificial intelligence methods are suggested.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

Machine learning approaches for lateral strength estimation in squat shear walls: A comparative study and practical implications

Khuong Le Nguyen, Hoa Thi Trinh, Saeed Banihashemi, Thong M. Pham

Summary: This study investigated the influence of input parameters on the shear strength of RC squat walls and found that ensemble learning models, particularly XGBoost, can effectively predict the shear strength. The axial load had a greater influence than reinforcement ratio, and longitudinal reinforcement had a more significant impact compared to horizontal and vertical reinforcement. The XGBoost model outperforms traditional design models, and it still yields reliable predictions when the number of input features is reduced.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

DHESN: A deep hierarchical echo state network approach for algal bloom prediction

Bo Hu, Huiyan Zhang, Xiaoyi Wang, Li Wang, Jiping Xu, Qian Sun, Zhiyao Zhao, Lei Zhang

Summary: A deep hierarchical echo state network (DHESN) is proposed to address the limitations of shallow coupled structures. By using transfer entropy, candidate variables with strong causal relationships are selected and a hierarchical reservoir structure is established to improve prediction accuracy. Simulation results demonstrate that DHESN performs well in predicting algal bloom.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

Learning high-dependence Bayesian network classifier with robust topology

Limin Wang, Lingling Li, Qilong Li, Kuo Li

Summary: This paper discusses the urgency of learning complex multivariate probability distributions due to the increase in data variability and quantity. It introduces a highly scalable classifier called TAN, which utilizes maximum weighted spanning tree (MWST) for graphical modeling. The paper theoretically proves the feasibility of extending one-dependence MWST to model high-dependence relationships and proposes a heuristic search strategy to improve the fitness of the extended topology to data. Experimental results demonstrate that this algorithm achieves a good bias-variance tradeoff and competitive classification performance compared to other high-dependence or ensemble learning algorithms.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

Make a song curative: A spatio-temporal therapeutic music transfer model for anxiety reduction

Zhejing Hu, Gong Chen, Yan Liu, Xiao Ma, Nianhong Guan, Xiaoying Wang

Summary: Anxiety is a prevalent issue and music therapy has been found effective in reducing anxiety. To meet the diverse needs of individuals, a novel model called the spatio-temporal therapeutic music transfer model (StTMTM) is proposed.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

A modified reverse-based analysis logic mining model with Weighted Random 2 Satisfiability logic in Discrete Hopfield Neural Network and multi-objective training of Modified Niched Genetic Algorithm

Nur Ezlin Zamri, Mohd. Asyraf Mansor, Mohd Shareduwan Mohd Kasihmuddin, Siti Syatirah Sidik, Alyaa Alway, Nurul Atiqah Romli, Yueling Guo, Siti Zulaikha Mohd Jamaludin

Summary: In this study, a hybrid logic mining model was proposed by combining the logic mining approach with the Modified Niche Genetic Algorithm. This model improves the generalizability and storage capacity of the retrieved induced logic. Various modifications were made to address other issues. Experimental results demonstrate that the proposed model outperforms baseline methods in terms of accuracy, precision, specificity, and correlation coefficient.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

On taking advantage of opportunistic meta-knowledge to reduce configuration spaces for automated machine learning

David Jacob Kedziora, Tien-Dung Nguyen, Katarzyna Musial, Bogdan Gabrys

Summary: The paper addresses the problem of efficiently optimizing machine learning solutions by reducing the configuration space of ML pipelines and leveraging historical performance. The experiments conducted show that opportunistic/systematic meta-knowledge can improve ML outcomes, and configuration-space culling is optimal when balanced. The utility and impact of meta-knowledge depend on various factors and are crucial for generating informative meta-knowledge bases.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

Optimal location for an EVPL and capacitors in grid for voltage profile and power loss: FHO-SNN approach

G. Sophia Jasmine, Rajasekaran Stanislaus, N. Manoj Kumar, Thangamuthu Logeswaran

Summary: In the context of a rapidly expanding electric vehicle market, this research investigates the ideal locations for EV charging stations and capacitors in power grids to enhance voltage stability and reduce power losses. A hybrid approach combining the Fire Hawk Optimizer and Spiking Neural Network is proposed, which shows promising results in improving system performance. The optimization approach has the potential to enhance the stability and efficiency of electric grids.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

NLP-based approach for automated safety requirements information retrieval from project documents

Zhijiang Wu, Guofeng Ma

Summary: This study proposes a natural language processing-based framework for requirement retrieval and document association, which can help to mine and retrieve documents related to project managers' requirements. The framework analyzes the ontology relevance and emotional preference of requirements. The results show that the framework performs well in terms of iterations and threshold, and there is a significant matching between the retrieved documents and the requirements, which has significant managerial implications for construction safety management.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

Dog nose-print recognition based on the shape and spatial features of scales

Yung-Kuan Chan, Chuen-Horng Lin, Yuan-Rong Ben, Ching-Lin Wang, Shu-Chun Yang, Meng-Hsiun Tsai, Shyr-Shen Yu

Summary: This study proposes a novel method for dog identification using nose-print recognition, which can be applied to controlling stray dogs, locating lost pets, and pet insurance verification. The method achieves high recognition accuracy through two-stage segmentation and feature extraction using a genetic algorithm.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

Fostering supply chain resilience for omni-channel retailers: A two-phase approach for supplier selection and demand allocation under disruption risks

Shaohua Song, Elena Tappia, Guang Song, Xianliang Shi, T. C. E. Cheng

Summary: This study aims to optimize supplier selection and demand allocation decisions for omni-channel retailers in order to achieve supply chain resilience. It proposes a two-phase approach that takes into account various factors such as supplier evaluation and demand allocation.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

Accelerating Benders decomposition approach for shared parking spaces allocation considering parking unpunctuality and no-shows

Jinyan Hu, Yanping Jiang

Summary: This paper examines the allocation problem of shared parking spaces considering parking unpunctuality and no-shows. It proposes an effective approach using sample average approximation (SAA) combined with an accelerating Benders decomposition (ABD) algorithm to solve the problem. The numerical experiments demonstrate the significance of supply-demand balance for the operation and user satisfaction of the shared parking system.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Review Computer Science, Artificial Intelligence

Financial fraud detection using graph neural networks: A systematic review

Soroor Motie, Bijan Raahemi

Summary: Financial fraud is a persistent problem in the finance industry, but Graph Neural Networks (GNNs) have emerged as a powerful tool for detecting fraudulent activities. This systematic review provides a comprehensive overview of the current state-of-the-art technologies in using GNNs for financial fraud detection, identifies gaps and limitations in existing research, and suggests potential directions for future research.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Review Computer Science, Artificial Intelligence

Occluded person re-identification with deep learning: A survey and perspectives

Enhao Ning, Changshuo Wang, Huang Zhang, Xin Ning, Prayag Tiwari

Summary: This review provides a detailed overview of occluded person re-identification methods and conducts a systematic analysis and comparison of existing deep learning-based approaches. It offers important theoretical and practical references for future research in the field.

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Article Computer Science, Artificial Intelligence

A hierarchical attention detector for bearing surface defect detection

Jiajun Ma, Songyu Hu, Jianzhong Fu, Gui Chen

Summary: The article presents a novel visual hierarchical attention detector for multi-scale defect location and classification, utilizing texture, semantic, and instance features of defects through a hierarchical attention mechanism, achieving multi-scale defect detection in bearing images with complex backgrounds.

EXPERT SYSTEMS WITH APPLICATIONS (2024)