4.7 Article

Exploiting syntactic and neighbourhood attributes to address cold start in tag recommendation

Journal

INFORMATION PROCESSING & MANAGEMENT
Volume 56, Issue 3, Pages 771-790

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.ipm.2018.12.009

Keywords

Tag recommendation; Syntactic patterns; NLP; Nearest neighbors

Funding

  1. Google
  2. Brazilian National Institute of Science and Technology for Web Research (MCT/CNPq/INCT Web Grant) [573871/2008-6]
  3. FAPEMIG-PRONEX-MASWeb project - Models, Algorithms and Systems for the Web [APQ-01400-14]
  4. CNPq
  5. CAPES
  6. FAPEMIG

Many state-of-the-art tag recommendation methods were designed on the assumption that an initial set of tags is available in the target object. However, the effectiveness of these methods suffers greatly in a cold start scenario in which those initial tags are absent (although other features of the target object, such as title and description, may be present). To tackle this problem, previous work extracts candidate terms directly from the text associated with the target object or from similar/related objects, and uses statistical properties of word occurrence, such as term frequency (TF) and inverse document frequency (IDF), to rank the candidate tags for recommendation. Yet these properties, in isolation, may not be enough to rank candidate tags effectively, especially when they are extracted from the typically small and possibly low-quality texts associated with Web 2.0 objects. In this work, we analyze various syntactic patterns (e.g., syntactic dependencies between words in a sentence) of the text associated with Web 2.0 objects that can be exploited to identify and recommend tags. We also propose new tag quality attributes based on these patterns, including them as new evidence to be exploited by state-of-the-art Learning-to-Rank (L2R) based tag recommenders. We evaluate our tag recommendation methods using real data from four Web 2.0 applications, finding that, for three out of our four datasets, the inclusion of our newly proposed syntactic tag quality attributes brings improvements to two L2R-based tag recommenders, with gains of up to 17% in precision. Furthermore, we find that the recommendations provided by these methods can be further expanded by exploiting the target object's neighbourhood (i.e., similar objects). Our characterization and feature importance analysis results show that our syntactic attributes can indeed help discriminate relevant from non-relevant tags, being complementary to other, more traditional, tag quality attributes, particularly for datasets in which the textual features are short and/or of low quality.
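
As a rough illustration of the TF and IDF ranking that the abstract attributes to previous work, the sketch below scores the terms of a target object's text by term frequency times a smoothed inverse document frequency. It is not the authors' method and does not include their syntactic attributes or any L2R model; the function names `tokenize` and `rank_candidate_tags`, the whitespace tokenizer, and the particular IDF smoothing are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on whitespace, keep alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]


def rank_candidate_tags(target_text, corpus_texts, k=3):
    """Return the k terms of target_text with the highest TF * IDF score.

    IDF is estimated from corpus_texts (e.g. texts of other objects).
    The smoothing log((N + 1) / (df + 1)) + 1 is an assumption of this sketch.
    """
    docs = [set(tokenize(t)) for t in corpus_texts]
    n_docs = len(docs)
    tf = Counter(tokenize(target_text))

    scores = {}
    for term, freq in tf.items():
        df = sum(1 for d in docs if term in d)  # document frequency
        idf = math.log((n_docs + 1) / (df + 1)) + 1.0
        scores[term] = freq * idf

    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


corpus = ["the cat sat", "the dog ran", "the bird flew"]
print(rank_candidate_tags("the jazz concert jazz live", corpus, k=3))
```

In this toy input, a word that appears in every corpus text ("the") receives the lowest IDF and drops out of the top candidates, while a repeated, rare term ("jazz") rises to the top. On very short texts most terms occur once, so scores of this kind separate candidates poorly, which is the limitation the abstract points to.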

Recommended

Article Computer Science, Information Systems

Bag of Textual Graphs (BoTG): A General Graph-Based Text Representation Model

Icaro Cavalcante Dourado, Renata Galante, Marcos Andre Goncalves, Ricardo da Silva Torres

JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY (2019)

Article Computer Science, Information Systems

10SENT: A stable sentiment analysis method based on the combination of off-the-shelf approaches

Philipe F. Melo, Daniel H. Dalip, Manoel M. Junior, Marcos A. Goncalves, Fabricio Benevenuto

JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY (2019)

Article Computer Science, Information Systems

Risk-Sensitive Learning to Rank with Evolutionary Multi-Objective Feature Selection

Daniel Xavier Sousa, Sergio Canuto, Marcos Andre Goncalves, Thierson Couto Rosa, Wellington Santos Martins

ACM TRANSACTIONS ON INFORMATION SYSTEMS (2019)

Article Computer Science, Information Systems

Fine-grained tourism prediction: Impact of social and environmental features

Amir Khatibi, Fabiano Belem, Ana Paula Couto da Silva, Jussara M. Almeida, Marcos A. Goncalves

INFORMATION PROCESSING & MANAGEMENT (2020)

Article Computer Science, Information Systems

Extended pre-processing pipeline for text classification: On the role of meta-feature representations, sparsification and selective sampling

Washington Cunha, Sergio Canuto, Felipe Viegas, Thiago Salles, Christian Gomes, Vitor Mangaravite, Elaine Resende, Thierson Rosa, Marcos Andre Goncalves, Leonardo Rocha

INFORMATION PROCESSING & MANAGEMENT (2020)

Article Statistics & Probability

A bias-variance analysis of state-of-the-art random forest text classifiers

Thiago Salles, Leonardo Rocha, Marcos Goncalves

Summary: The study analyzed variants of random forest (RF) classifiers in the case of noisy data, exploring the bias-variance decomposition of error rate and showing significant improvements in variance and bias stability for lazy and boosted RF variants. The research provides promising directions for further enhancements in RF-based learners.

ADVANCES IN DATA ANALYSIS AND CLASSIFICATION (2021)

Article Computer Science, Information Systems

Fixing the curse of the bad product descriptions - Search-boosted tag recommendation for E-commerce products

Fabiano M. Belem, Rodrigo M. Silva, Claudio M. de Andrade, Gabriel Person, Felipe Mingote, Raphael Ballet, Helton Alponti, Henrique P. de Oliveira, Jussara M. Almeida, Marcos A. Goncalves

INFORMATION PROCESSING & MANAGEMENT (2020)

Article Computer Science, Information Systems

Exploiting semantic relationships for unsupervised expansion of sentiment lexicons

Felipe Viegas, Mario S. Alvim, Sergio Canuto, Thierson Rosa, Marcos Andre Goncalves, Leonardo Rocha

INFORMATION SYSTEMS (2020)

Article Computer Science, Information Systems

On the cost-effectiveness of neural and non-neural approaches and representations for text classification: A comprehensive comparative study

Washington Cunha, Vitor Mangaravite, Christian Gomes, Sergio Canuto, Elaine Resende, Cecilia Nascimento, Felipe Viegas, Celso Franca, Wellington Santos Martins, Jussara M. Almeida, Thierson Rosa, Leonardo Rocha, Marcos Andre Goncalves

Summary: This article brings two major contributions. Firstly, it critically analyses recent scientific articles about different approaches for automatic text classification, revealing potential issues related to experimental procedures. Secondly, it provides a comparison between neural and non-neural ATC solutions, showing that simpler non-neural methods perform well in smaller datasets, while neural Transformers are better in larger datasets. However, the gains in effectiveness of neural methods are not significant compared to properly tuned non-neural solutions.

INFORMATION PROCESSING & MANAGEMENT (2021)

Review Health Care Sciences & Services

Impact of Big Data Analytics on People's Health: Overview of Systematic Reviews and Recommendations for Future Studies

Israel Junior Borges do Nascimento, Milena Soriano Marcolino, Hebatullah Mohamed Abdulazeem, Ishanka Weerasekara, Natasha Azzopardi-Muscat, Marcos Andre Goncalves, David Novillo-Ortiz

Summary: The study aimed to assess the impact of big data analytics on people's health, focusing on improving the accuracy of diagnosis for certain diseases, managing chronic diseases, and supporting real-time analysis of large, varied data inputs for disease prediction and diagnosis.

JOURNAL OF MEDICAL INTERNET RESEARCH (2021)

Article Computer Science, Information Systems

Individualized extreme dominance (IndED): A new preference-based method for multi-objective recommender systems

Reinaldo Silva Fortes, Daniel Xavier de Sousa, Dayanne G. Coelho, Anisio M. Lacerda, Marcos A. Goncalves

Summary: The study introduces a new preference-based multi-objective recommendation method, IndED, which better satisfies individual user preferences and balances objectives more effectively. By utilizing the concepts of extreme dominance and statistical significance tests, IndED defines a new Pareto-based dominance relation to guide optimization search based on user preferences.

INFORMATION SCIENCES (2021)

Article Multidisciplinary Sciences

FISETIO: A FIne-grained, Structured and Enriched Tourism Dataset for Indoor and Outdoor attractions

Amir Khatibi, Ana Paula Couto da Silva, Jussara M. Almeida, Marcos A. Goncalves

DATA IN BRIEF (2020)

Article Information Science & Library Science

A pragmatic approach to hierarchical categorization of research expertise in the presence of scarce information

Gustavo Oliveira de Siqueira, Sergio Canuto, Marcos Andre Goncalves, Alberto H. F. Laender

INTERNATIONAL JOURNAL ON DIGITAL LIBRARIES (2020)

Proceedings Paper Computer Science, Information Systems

Automatic Generation of Initial Reading Lists: Requirements and Solutions

Pablo Figueira, Fabiano Belem, Jussara M. Almeida, Marcos A. Goncalves

2019 ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES (JCDL 2019) (2019)

Proceedings Paper Computer Science, Artificial Intelligence

CluWords: Exploiting Semantic Word Clustering Representation for Enhanced Topic Modeling

Felipe Viegas, Sergio Canuto, Christian Gomes, Washington Luiz, Thierson Rosa, Sabir Ribas, Leonardo Rocha, Marcos Andre Goncalves

PROCEEDINGS OF THE TWELFTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM'19) (2019)

Article Computer Science, Information Systems

The social-technological ways to develop digital entrepreneurship: Targeting value creation and value capture

Sang-Bing Tsai, Xusen Cheng, Yanwu Yang, Jason Xiong, Alex Zarifis

Summary: This article structurally concludes the methods proposed and evidenced to develop digital entrepreneurship from a socio-technical perspective. The technology itself and the process of utilization should be carefully considered. From a social perspective, fulfilling the needs of customers in social interaction and nurturing characteristics and social skills for the digital work environment are crucial.

INFORMATION PROCESSING & MANAGEMENT (2024)

Article Computer Science, Information Systems

NSEP: Early fake news detection via news semantic environment perception

Xiaochang Fang, Hongchen Wu, Jing Jing, Yihong Meng, Bing Yu, Hongzhu Yu, Huaxiang Zhang

Summary: This study proposes a novel fake news detection framework, utilizing news semantic environment perception (NSEP) to identify fake news content. The framework consists of steps such as dividing the semantic environment into macro and micro levels, applying graph convolutional networks, and utilizing multihead attention. Empirical experiments show that the NSEP framework achieves high accuracy in detecting Chinese fake news, outperforming other baseline methods and highlighting the importance of both micro and macro semantic environments in early detection of fake news.

INFORMATION PROCESSING & MANAGEMENT (2024)

Article Computer Science, Information Systems

A scalable and flexible basket analysis system for big transaction data in Spark

Xudong Sun, Alladoumbaye Ngueilbaye, Kaijing Luo, Yongda Cai, Dingming Wu, Joshua Zhexue Huang

Summary: This paper proposes a scalable distributed frequent itemset mining (ScaDistFIM) algorithm to address the data scalability and flexibility issues in basket analysis in the big data era. Experiment results demonstrate that the ScaDistFIM algorithm is more efficient compared to the Spark FP-Growth algorithm.

INFORMATION PROCESSING & MANAGEMENT (2024)

Article Computer Science, Information Systems

A T5-based interpretable reading comprehension model with more accurate evidence training

Boxu Guan, Xinhua Zhu, Shangbo Yuan

Summary: This paper aims to improve the interpretability of machine reading comprehension models by utilizing the pre-trained T5 model for evidence inference. They propose an interpretable reading comprehension model based on T5, which is trained on a more accurate evidence corpus and can infer precise interpretations for answers. Experimental results show that their model outperforms the baseline BERT model on the SQuAD1.1 task.

INFORMATION PROCESSING & MANAGEMENT (2024)

Article Computer Science, Information Systems

STMAP: A novel semantic text matching model augmented with embedding perturbations

Yanhao Wang, Baohua Zhang, Weikang Liu, Jiahao Cai, Huaping Zhang

Summary: In this study, we propose a data augmentation-based semantic text matching model called STMAP. By using Gaussian noise and noise mask signal for data augmentation, as well as employing an adaptive optimization network for training target optimization, our model achieves good performance in few-shot learning and semantic deviation problems.

INFORMATION PROCESSING & MANAGEMENT (2024)

Article Computer Science, Information Systems

An efficient loss function and deep learning approach for ranking stock returns in the absence of prior knowledge

Jiahao Yang, Shuo Feng, Wenkai Zhang, Ming Zhang, Jun Zhou, Pengyuan Zhang

Summary: To pursue profit from stock markets, researchers utilize deep learning methods to forecast asset price movements. However, there are two issues in current research, the discrepancy between forecasting results and profits, and heavy reliance on prior knowledge. To address these issues, researchers propose a novel optimization objective and modeling method, and conduct experiments to validate their approach.

INFORMATION PROCESSING & MANAGEMENT (2024)

Article Computer Science, Information Systems

Revealing the technology development of natural language processing: A Scientific entity-centric perspective

Heng Zhang, Chengzhi Zhang, Yuzhuo Wang

Summary: This study provides an accurate analysis of technology development in the field of Natural Language Processing (NLP) from an entity-centric perspective. The findings indicate an increase in the average number of entities per paper, with pre-trained language models becoming mainstream and the impact of Wikipedia dataset and BLEU metric continuing to rise. There has been a surge in popularity for new high-impact technologies in recent years, with researchers accepting them at an unprecedented speed.

INFORMATION PROCESSING & MANAGEMENT (2024)

Article Computer Science, Information Systems

Citation prediction by leveraging transformers and natural language heuristics

Davide Buscaldi, Danilo Dessi, Enrico Motta, Marco Murgia, Francesco Osborne, Diego Reforgiato Recupero

Summary: In scientific papers, citing other articles is a common practice to support claims and provide evidence. This paper proposes two automatic methods using Transformer models to address citation placement, and achieves significant improvements in experiments.

INFORMATION PROCESSING & MANAGEMENT (2024)

Article Computer Science, Information Systems

Data-driven analysis of digital entrepreneurship in medical supply resilience confronting the COVID-19 epidemic

Baozhuang Niu, Lingfeng Wang, Xinhu Yu, Beibei Feng

Summary: This paper examines whether the incumbent brand should adopt digital technology to forecast demand and adjust order decisions in the face of soaring demand for medical supply caused by frequent outbreaks of regional COVID-19 epidemic. The study finds that digital transformation can lead to a triple-win situation among the incumbent brand, social welfare, and consumer surplus, as well as bring benefits to the manufacturer. Furthermore, the research provides insights for firms' digital entrepreneurship decisions through theoretical optimization and data processing/policy simulation.

INFORMATION PROCESSING & MANAGEMENT (2024)

Article Computer Science, Information Systems

Multi-level knowledge-driven feature representation and triplet loss optimization network for image-text retrieval

Xueyang Qin, Lishang Li, Fei Hao, Meiling Ge, Guangyao Pang

Summary: Image-text retrieval is important in connecting vision and language. This paper proposes a method that utilizes prior knowledge to enhance feature representations and optimize network training for better retrieval results.

INFORMATION PROCESSING & MANAGEMENT (2024)

Review Computer Science, Information Systems

A co-attention based multi-modal fusion network for review helpfulness prediction

Gang Ren, Lei Diao, Fanjia Guo, Taeho Hong

Summary: This paper proposes a novel approach for predicting the helpfulness of reviews by utilizing both textual and image features. The proposed method considers the correlation between features through self-attention and co-attention mechanisms, and fuses multi-modal features for prediction. Experimental results demonstrate the superior performance of the proposed method compared to benchmark methods.

INFORMATION PROCESSING & MANAGEMENT (2024)

Article Computer Science, Information Systems

Retrieval Contrastive Learning for Aspect-Level Sentiment Classification

Zhongquan Jian, Jiajian Li, Qingqiang Wu, Junfeng Yao

Summary: Aspect-Level Sentiment Classification (ALSC) is a crucial challenge in Natural Language Processing (NLP). Most existing methods fail to consider the correlations between different instances, leading to a lack of global viewpoint. To address this issue, we propose a Retrieval Contrastive Learning (RCL) framework that extracts intrinsic knowledge across instances for improved instance representation. Experimental results demonstrate that training ALSC models with RCL leads to substantial performance improvements.

INFORMATION PROCESSING & MANAGEMENT (2024)

Article Computer Science, Information Systems

A hierarchical convolutional model for biomedical relation extraction

Ying Hu, Yanping Chen, Ruizhang Huang, Yongbin Qin, Qinghua Zheng

Summary: Biomedical relation extraction aims to extract the interactive relations between biomedical entities in a sentence. This study proposes a hierarchical convolutional model to address the semantic overlapping and data imbalance problems. The model encodes both local contextual features and global semantic dependencies, enhancing the discriminability of the neural network for biomedical relation extraction.

INFORMATION PROCESSING & MANAGEMENT (2024)

Article Computer Science, Information Systems

Topic Audiolization: A Model for Rumor Detection Inspired by Lie Detection Technology

Zhou Yang, Yucai Pang, Xuehong Li, Qian Li, Shihong Wei, Rong Wang, Yunpeng Xiao

Summary: This study proposes a rumor detection model based on topic audiolization, which transforms the topic space into audio-like signals. Experimental results show that the model achieves significant performance improvements in rumor identification.

INFORMATION PROCESSING & MANAGEMENT (2024)

Article Computer Science, Information Systems

User-oriented metrics for search engine deterministic sort orders

Alistair Moffat

Summary: This paper proposes the buying power metric for assessing the quality of product rankings on e-commerce sites. It discusses the relationship between the buying power metric and user reactions, and introduces an alternative product ranking effectiveness metric.

INFORMATION PROCESSING & MANAGEMENT (2024)