Article

Offline evaluation options for recommender systems

INFORMATION RETRIEVAL JOURNAL (2020)

Journal

INFORMATION RETRIEVAL JOURNAL

Volume 23, Issue 4, Pages 387-410

Publisher

SPRINGER

DOI: 10.1007/s10791-020-09371-3

Keywords

Recommender systems; Evaluation; Effectiveness metric; Experimental design

Category

Computer Science, Information Systems

Funding

Spanish Ministry of Science, Innovation and Universities [TIN2016-80630-P]

Abstract

We undertake a detailed examination of the steps that make up offline experiments for recommender system evaluation, including the manner in which the available ratings are filtered and split into training and test; the selection of a subset of the available users for the evaluation; the choice of strategy to handle the background effects that arise when the system is unable to provide scores for some items or users; the use of either full or condensed output lists for the purposes of scoring; scoring methods themselves, including alternative top-weighted mechanisms for condensed rankings; and the application of statistical testing on a weighted-by-user or weighted-by-volume basis as a mechanism for providing confidence in measured outcomes. We carry out experiments that illustrate the impact that each of these choice points can have on the usefulness of an end-to-end system evaluation, and provide examples of possible pitfalls. In particular, we show that varying the split between training and test data, or changing the evaluation metric, or how target items are selected, or how empty recommendations are dealt with, can give rise to comparisons that are vulnerable to misinterpretation, and may lead to different or even opposite outcomes, depending on the exact combination of settings used.
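To make the distinction between full and condensed output lists concrete, the following minimal Python sketch scores one user's ranking both ways using precision at a cutoff. It assumes that a condensed list is formed by dropping recommended items for which the user has no test rating, which is the usual sense in the information retrieval literature; the paper's exact procedure may differ. The function names, item identifiers, ratings, and relevance threshold are all hypothetical.

```python
from typing import Dict, List, Set


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k positions holding a relevant item (always divides by k)."""
    top = ranking[:k]
    return sum(1 for item in top if item in relevant) / k


def condense(ranking: List[str], judged: Set[str]) -> List[str]:
    """Keep only items that have a known test rating, preserving rank order."""
    return [item for item in ranking if item in judged]


# Hypothetical test ratings for a single user on a 1-5 scale.
test_ratings: Dict[str, int] = {"a": 5, "c": 2, "e": 4, "f": 1}
# Assumed relevance threshold: a rating of 4 or more counts as relevant.
relevant = {item for item, rating in test_ratings.items() if rating >= 4}
# Hypothetical system output, best first; "b" and "d" have no test rating.
ranking = ["b", "a", "d", "e", "c", "f"]

full_p = precision_at_k(ranking, relevant, k=3)
condensed_p = precision_at_k(condense(ranking, set(test_ratings)), relevant, k=3)
print(f"full P@3 = {full_p:.3f}, condensed P@3 = {condensed_p:.3f}")
```

With this toy data the same ranking scores 0.333 on the full list and 0.667 on the condensed list, because the unrated items no longer occupy top positions. This is one small instance of how a single configuration choice can change the measured outcome.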

Recommended

Article Computer Science, Artificial Intelligence

A novel evaluation framework for recommender systems in big data environments

Roberto Henriques, Luis Pinto

Summary: Recommender systems were initially introduced to address information overload in enterprises and have since been widely applied in major websites such as e-commerce, music and video streaming, travel and movie sites, social media, and mobile app stores. However, there is limited research on evaluating recommender systems. This study proposes a novel evaluation metric that incorporates user behavior in online platforms, aiming to improve customer usage beyond the baseline level. An empirical application in a real-world mobile app store demonstrates the correlation between this metric and existing ones, as well as its ability to integrate cost information.

EXPERT SYSTEMS WITH APPLICATIONS (2023)

Article Computer Science, Artificial Intelligence

A challenge for rounded evaluation of recommender systems

Jacopo Tagliabue, Federico Bianchi, Tobias Schnabel, Giuseppe Attanasio, Ciro Greco, Gabriel de Souza Moreira, Patrick John Chia

Summary: The organizers of the EvalRS recommender systems competition argue for the consideration of robustness and fairness in addition to accuracy.

NATURE MACHINE INTELLIGENCE (2023)

Article Computer Science, Information Systems

Popularity Bias in False-positive Metrics for Recommender Systems Evaluation

Elisa Mena-Maldonado, Rocio Canamares, Pablo Castells, Yongli Ren, Mark Sanderson

Summary: The study reveals that false-positive metrics tend to penalize popular items, opposite to true-positive metrics, resulting in an inconsistent trend between the two types of metrics under popularity biases. Through theoretical analysis, the reasons for the disagreement between metrics are identified, as well as rare situations where they might agree. Empirical study results confirm the analytical findings, guiding researchers on when to use true-positive or false-positive metrics in offline evaluation of recommender systems.

ACM TRANSACTIONS ON INFORMATION SYSTEMS (2021)

Article Computer Science, Artificial Intelligence

Statistically Robust Evaluation of Stream-Based Recommender Systems

Joao Vinagre, Alipio Mario Jorge, Conceicao Rocha, Joao Gama

Summary: The article proposes a statistical validation method for evaluating online recommendation algorithms, and experiments show that the method is effective in streaming data environments.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2021)

Review Health Care Sciences & Services

Development and Evaluation of Health Recommender Systems: Systematic Scoping Review and Evidence Mapping

Yue Sun, Jia Zhou, Mengmeng Ji, Lusi Pei, Zhiwen Wang

Summary: This study aimed to identify and evaluate the development of Health Recommender Systems (HRSs) and create an evidence map. A total of 51 studies were included for data extraction. The findings showed that only 19.6% of the systems considered the personal preferences of end users in the design stage. The evaluation methods varied, with 62.7% of the studies using offline evaluations and 33.3% including end users in the evaluation process. More user-centered evaluation studies are needed in the future.

JOURNAL OF MEDICAL INTERNET RESEARCH (2023)

Article Computer Science, Information Systems

Effective and efficient negative sampling in metric learning based recommendation

Junha Park, Yeon-Chang Lee, Sang-Wook Kim

Summary: This paper addresses the issue of negative sampling strategy in metric learning-based recommendation methods, introducing the cage-based NS strategy (CNS) and its improved version CNS+. Through experiments, it is demonstrated that both strategies significantly improve accuracy and reduce computation overhead, with linear scalability as the number of ratings increases. Applying CNS strategy to existing ML recommendation methods consistently enhances their accuracy, while CNS+ significantly reduces execution times without sacrificing accuracy.

INFORMATION SCIENCES (2022)

Article Computer Science, Artificial Intelligence

A Framework for Evaluating Personalized Ranking Systems by Fusing Different Evaluation Measures

Tome Eftimov, Bibek Paudel, Gorjan Popovski, Dragi Kocev

Summary: Personalized ranking systems, also known as recommender systems, employ various big data methods, but existing performance measures do not effectively assist end-users in selecting suitable algorithms. To address this issue, we introduce a novel benchmarking framework that combines different evaluation measures to rank recommender systems on individual benchmark datasets.

BIG DATA RESEARCH (2021)

Article Computer Science, Artificial Intelligence

Performance Evaluation of Aggregation-based Group Recommender Systems for Ephemeral Groups

Edgar Ceh-Varela, Huiping Cao, Hady W. Lauw

Summary: Recommender systems play an important role in decision-making processes, and there is a need for recommendations for groups in various real-world activities. However, evaluating the performance of Group Recommender Systems (GRecSys) proves challenging due to different types of groups and items. This study experimentally compares eight representative GRecSys and finds that those using Singular Value Decomposition or Neural Collaborative Filtering methods perform better, with the Average aggregation function yielding better results.

ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY (2022)

Article Computer Science, Artificial Intelligence

Detect Professional Malicious User With Metric Learning in Recommender Systems

Yuanbo Xu, Yongjian Yang, En Wang, Fuzhen Zhuang, Hui Xiong

Summary: This paper addresses the issue of professional malicious users in e-commerce and proposes an unsupervised multi-modal learning model to detect these users. The model considers both ratings and reviews, and utilizes metric learning and attention mechanism to improve performance. Experimental results demonstrate its effectiveness, and it also enhances the performance of recommender models.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2022)

Review Computer Science, Information Systems

Similarity measures for Collaborative Filtering-based Recommender Systems: Review and experimental comparison

Fethi Fkih

Summary: This paper provides an in-depth review of similarity measures used in collaborative filtering-based recommender systems. Through experimental studies, the performance of different measures is compared, and important conclusions are drawn. Evaluation results show that different similarity measures have different suitability in user-based and item-based recommendations.

JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES (2022)

Article Computer Science, Artificial Intelligence

A Unified Collaborative Representation Learning for Neural-Network Based Recommender Systems

Yuanbo Xu, En Wang, Yongjian Yang, Yi Chang

Summary: The study introduces a novel supervised collaborative representation learning model called Magnetic Metric Learning (MML), which leverages dual triplets to handle both observed and underlying relationships between users and items, addressing some of the drawbacks in traditional recommender system approaches.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2022)

Review Computer Science, Hardware & Architecture

Serendipity in Recommender Systems: A Systematic Literature Review

Reza Jafari Ziarani, Reza Ravanmehr

Summary: A recommender system is used to accurately recommend items to attract users, but focusing too much on accuracy can lead to boring and predictable recommendations. Novelty and diversity are helpful, but serendipity, unexpectedness, and relevance are also important criteria for appealing and useful recommendations. Recent studies have shown progress in the quality and quantity of articles on serendipity-oriented recommender systems.

JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY (2021)

Article Information Science & Library Science

When Variety Seeking Meets Unexpectedness: Incorporating Variety-Seeking Behaviors into Design of Unexpected Recommender Systems

Pan Li, Alexander Tuzhilin

Summary: This paper proposes a variety-seeking framework to measure the level of variety-seeking behavior of customers in recommendations based on their consumption records. The effectiveness of the framework is validated through user questionnaire studies conducted at Alibaba. Furthermore, a recommendation framework that combines the identified variety-seeking levels with unexpected recommender systems is presented. Experimental results demonstrate the effectiveness and economic impact of the recommendation framework, providing important managerial implications.

INFORMATION SYSTEMS RESEARCH (2023)

Article Computer Science, Theory & Methods

Survey on the Objectives of Recommender Systems: Measures, Solutions, Evaluation Methodology, and New Perspectives

Bushra Alhijawi, Arafat Awajan, Salam Fraihat

Summary: This article provides a comprehensive review of recent research efforts on recommender systems. It focuses on objectives beyond accuracy and relevance, such as diversity, novelty, coverage, and serendipity. The article also explores the definitions and measures associated with these objectives, as well as the evaluation methodology and new applications in the field.

ACM COMPUTING SURVEYS (2023)

Article Mathematics

Design of Confidence-Integrated Denoising Auto-Encoder for Personalized Top-N Recommender Systems

Zeshan Aslam Khan, Naveed Ishtiaq Chaudhary, Waqar Ali Abbasi, Sai Ho Ling, Muhammad Asif Zahoor Raja

Summary: A recommender system aims to gain users' confidence and reduce their time and effort. In this study, an improved, confidence-integrated denoising auto-encoder (DAE) is proposed to enhance the performance of recommender systems. The proposed model achieves improved scores in various evaluation metrics and proves to be efficient and accurate in generating recommendations.

MATHEMATICS (2023)

Article Computer Science, Information Systems

Large-Alphabet Semi-Static Entropy Coding Via Asymmetric Numeral Systems

Alistair Moffat, Matthias Petri

ACM TRANSACTIONS ON INFORMATION SYSTEMS (2020)

Editorial Material Computer Science, Information Systems

Guest editorial: special issue on ECIR 2020

Joemon M. Jose, Emine Yilmaz, Joao Magalhaes, Pablo Castells

INFORMATION RETRIEVAL JOURNAL (2021)

Article Computer Science, Information Systems

Modeling search and session effectiveness

Alfan Farizki Wicaksono, Alistair Moffat

Summary: This paper introduces a session-based offline evaluation framework for measuring the overall usefulness of search sessions. By modeling data from two commercial search engines, the user conditional continuation probability and user conditional reformulation probability are proposed to develop new metrics that show greater correlation with observed user behavior during search sessions.

INFORMATION PROCESSING & MANAGEMENT (2021)

Article Computer Science, Information Systems

Anytime Ranking on Document-Ordered Indexes

Joel Mackenzie, Matthias Petri, Alistair Moffat

Summary: Inverted indexes are crucial for efficient querying of large document collections in text search engines. Document-ordered indexes are common and allow for various query types, but they may scatter high-scoring documents. Impact-ordered indexes, on the other hand, support anytime query processing and improve search quality gradually.

ACM TRANSACTIONS ON INFORMATION SYSTEMS (2022)

Article Computer Science, Information Systems

Entropic relevance: A mechanism for measuring stochastic process models discovered from event data

Hanan Alkhammash, Artem Polyvyanyy, Alistair Moffat, Luciano Garcia-Banuelos

Summary: Access to large volumes of data is crucial for developing precise models in various fields of computing. This article introduces the entropic relevance measure for conformance checking of stochastic process models, which provides information about the likelihood and patterns of process events.

INFORMATION SYSTEMS (2022)

Article Computer Science, Artificial Intelligence

Offline recommender system evaluation: Challenges and new directions

Pablo Castells, Alistair Moffat

Summary: This article reviews and reflects on the development and current status of recommender system evaluation, with a focus on offline evaluation. It discusses the challenges posed by adaptation and specific needs, as well as important choices in experiment configuration and broader perspectives of evaluation.

AI MAGAZINE (2022)

Article Computer Science, Information Systems

Efficient immediate-access dynamic indexing

Alistair Moffat, Joel Mackenzie

Summary: This paper introduces an index structure and processing regime that allows immediate access and efficient querying in a dynamic retrieval system. A new compression operation and extensible lists approach are described to achieve incremental document-level indexing and fast document insertion. The mechanism supports various types of queries and facilitates conversion to a compressed inverted index structure.

INFORMATION PROCESSING & MANAGEMENT (2023)

Proceedings Paper Computer Science, Information Systems

Bootstrapping Generalization of Process Models Discovered from Event Data

Artem Polyvyanyy, Alistair Moffat, Luciano Garcia-Banuelos

Summary: This paper proposes a bootstrap-based estimator for process mining generalization, which reduces errors when the quality of the log improves and supports industry-scale data-driven systems engineering.

ADVANCED INFORMATION SYSTEMS ENGINEERING (CAISE 2022) (2022)

Proceedings Paper Computer Science, Information Systems

Evaluation of Herd Behavior Caused by Population-scale Concept Drift in Collaborative Filtering

Chenglong Ma, Yongli Ren, Pablo Castells, Mark Sanderson

Summary: This article discusses the problem of concept drift in stream data, particularly the temporal dynamics of user behavior observed in recommender systems. The authors find that irrational behavior by users can impair the knowledge learned by recommender algorithms and exacerbate preference biases. However, existing research often focuses on individual concept drift and overlooks the synergistic effect among users in the same social group. The authors conduct a study on user behavior to detect collaborative concept drift among users and empirically show that increasing individual experience can weaken herd effects.

PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22) (2022)

Proceedings Paper Computer Science, Information Systems

RELISON: A Framework for Link Recommendation in Social Networks

Javier Sanz-Cruzado, Pablo Castells

Summary: Link recommendation is a significant problem at the crossroad of recommender systems and online social networks. We introduce RELISON, an extensible framework for conducting link recommendation experiments, which includes various algorithms and evaluation tools.

PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22) (2022)

Proceedings Paper Computer Science, Information Systems

Human Preferences as Dueling Bandits

Xinyi Yan, Chengxi Luo, Charles L. A. Clarke, Nick Craswell, Ellen M. Voorhees, Pablo Castells

Summary: The dramatic improvements in core information retrieval tasks engendered by neural rankers create a need for novel evaluation methods. Several recent papers explore pairwise preference judgments as an alternative to traditional graded relevance assessments. This method allows for fine-grained distinctions by comparing items side-by-side.

PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

SimuRec: Workshop on Synthetic Data and Simulation Methods for Recommender Systems Research

Michael D. Ekstrand, Allison Chaney, Pablo Castells, Robin Burke, David Rohde, Manel Slokom

Summary: There is a growing interest in using synthetic data and simulation infrastructures for recommender systems research, but there are currently no clear best practices in this area. A workshop was proposed to discuss the state of the art in this research and address methodological questions, resulting in a report documenting currently-known best practices and outlining an agenda for further research.

15TH ACM CONFERENCE ON RECOMMENDER SYSTEMS (RECSYS 2021) (2021)
