Article

Offline evaluation options for recommender systems

Journal

INFORMATION RETRIEVAL JOURNAL
Volume 23, Issue 4, Pages 387-410

Publisher

SPRINGER
DOI: 10.1007/s10791-020-09371-3

Keywords

Recommender systems; Evaluation; Effectiveness metric; Experimental design

Funding

  1. Spanish Ministry of Science, Innovation and Universities [TIN2016-80630-P]

We undertake a detailed examination of the steps that make up offline experiments for recommender system evaluation, including the manner in which the available ratings are filtered and split into training and test; the selection of a subset of the available users for the evaluation; the choice of strategy to handle the background effects that arise when the system is unable to provide scores for some items or users; the use of either full or condensed output lists for the purposes of scoring; scoring methods themselves, including alternative top-weighted mechanisms for condensed rankings; and the application of statistical testing on a weighted-by-user or weighted-by-volume basis as a mechanism for providing confidence in measured outcomes. We carry out experiments that illustrate the impact that each of these choice points can have on the usefulness of an end-to-end system evaluation, and provide examples of possible pitfalls. In particular, we show that varying the split between training and test data, or changing the evaluation metric, or how target items are selected, or how empty recommendations are dealt with, can give rise to comparisons that are vulnerable to misinterpretation, and may lead to different or even opposite outcomes, depending on the exact combination of settings used.
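One of the choice points named above, scoring full versus condensed output lists, can be illustrated with a small sketch. It assumes the usual information retrieval sense of "condensed", in which items with no test rating are dropped from the ranking before scoring; the abstract itself does not define the term. The item identifiers, the ratings, and the helper names `precision_at_k` and `condense` are hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Set


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k positions occupied by relevant items."""
    return sum(1 for item in ranking[:k] if item in relevant) / k


def condense(ranking: List[str], judged: Iterable[str]) -> List[str]:
    """Keep only items that have a test rating (assumed meaning of 'condensed')."""
    judged_set = set(judged)
    return [item for item in ranking if item in judged_set]


# Hypothetical test ratings for one user: item -> liked or not.
test_ratings = {"a": True, "b": False, "c": True}
relevant = {item for item, liked in test_ratings.items() if liked}

# Hypothetical system output; "x", "y", "z" have no test rating.
ranking = ["x", "a", "y", "b", "c", "z"]

# Full list: the top 3 are x, a, y, so unrated items count as misses.
full_score = precision_at_k(ranking, relevant, 3)

# Condensed list: the top 3 are a, b, c, so unrated items are ignored.
condensed_score = precision_at_k(condense(ranking, test_ratings), relevant, 3)

print(f"full P@3 = {full_score:.3f}, condensed P@3 = {condensed_score:.3f}")
```

On this toy input the same ranking scores 1/3 on the full list and 2/3 on the condensed list, which is the kind of setting-dependent difference the abstract warns can change how two systems compare.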

Recommended

Article Computer Science, Information Systems

Large-Alphabet Semi-Static Entropy Coding Via Asymmetric Numeral Systems

Alistair Moffat, Matthias Petri

ACM TRANSACTIONS ON INFORMATION SYSTEMS (2020)

Editorial Material Computer Science, Information Systems

Guest editorial: special issue on ECIR 2020

Joemon M. Jose, Emine Yilmaz, Joao Magalhaes, Pablo Castells

INFORMATION RETRIEVAL JOURNAL (2021)

Article Computer Science, Information Systems

Popularity Bias in False-positive Metrics for Recommender Systems Evaluation

Elisa Mena-Maldonado, Rocio Canamares, Pablo Castells, Yongli Ren, Mark Sanderson

Summary: The study reveals that false-positive metrics tend to penalize popular items, opposite to true-positive metrics, resulting in an inconsistent trend between the two types of metrics under popularity biases. Through theoretical analysis, the reasons for the disagreement between metrics are identified, as well as rare situations where they might agree. Empirical study results confirm the analytical findings, guiding researchers on when to use true-positive or false-positive metrics in offline evaluation of recommender systems.

ACM TRANSACTIONS ON INFORMATION SYSTEMS (2021)

Article Computer Science, Information Systems

Modeling search and session effectiveness

Alfan Farizki Wicaksono, Alistair Moffat

Summary: This paper introduces a session-based offline evaluation framework for measuring the overall usefulness of search sessions. By modeling data from two commercial search engines, the user conditional continuation probability and user conditional reformulation probability are proposed to develop new metrics that show greater correlation with observed user behavior during search sessions.

INFORMATION PROCESSING & MANAGEMENT (2021)

Article Computer Science, Information Systems

Anytime Ranking on Document-Ordered Indexes

Joel Mackenzie, Matthias Petri, Alistair Moffat

Summary: Inverted indexes are crucial for efficient querying of large document collections in text search engines. Document-ordered indexes are common and allow for various query types, but they may scatter high-scoring documents. Impact-ordered indexes, on the other hand, support anytime query processing and improve search quality gradually.

ACM TRANSACTIONS ON INFORMATION SYSTEMS (2022)

Article Computer Science, Information Systems

Entropic relevance: A mechanism for measuring stochastic process models discovered from event data

Hanan Alkhammash, Artem Polyvyanyy, Alistair Moffat, Luciano Garcia-Banuelos

Summary: Access to large volumes of data is crucial for developing precise models in various fields of computing. This article introduces the entropic relevance measure for conformance checking of stochastic process models, which provides information about the likelihood and patterns of process events.

INFORMATION SYSTEMS (2022)

Article Computer Science, Artificial Intelligence

Offline recommender system evaluation: Challenges and new directions

Pablo Castells, Alistair Moffat

Summary: This article reviews and reflects on the development and current status of recommender system evaluation, with a focus on offline evaluation. It discusses the challenges posed by adaptation and specific needs, as well as important choices in experiment configuration and broader perspectives of evaluation.

AI MAGAZINE (2022)

Article Computer Science, Information Systems

Efficient immediate-access dynamic indexing

Alistair Moffat, Joel Mackenzie

Summary: This paper introduces an index structure and processing regime that allows immediate access and efficient querying in a dynamic retrieval system. A new compression operation and extensible lists approach are described to achieve incremental document-level indexing and fast document insertion. The mechanism supports various types of queries and facilitates conversion to a compressed inverted index structure.

INFORMATION PROCESSING & MANAGEMENT (2023)

Proceedings Paper Computer Science, Information Systems

Bootstrapping Generalization of Process Models Discovered from Event Data

Artem Polyvyanyy, Alistair Moffat, Luciano Garcia-Banuelos

Summary: This paper proposes a bootstrap-based estimator for process mining generalization, which reduces errors when the quality of the log improves and supports industry-scale data-driven systems engineering.

ADVANCED INFORMATION SYSTEMS ENGINEERING (CAISE 2022) (2022)

Proceedings Paper Computer Science, Information Systems

Evaluation of Herd Behavior Caused by Population-scale Concept Drift in Collaborative Filtering

Chenglong Ma, Yongli Ren, Pablo Castells, Mark Sanderson

Summary: This article discusses the problem of concept drift in stream data, particularly the temporal dynamics of user behavior observed in recommender systems. The authors find that irrational behavior by users can impair the knowledge learned by recommender algorithms and exacerbate preference biases. However, existing research often focuses on individual concept drift and overlooks the synergistic effect among users in the same social group. The authors conduct a study on user behavior to detect collaborative concept drift among users and empirically show that increasing individual experience can weaken herd effects.

PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22) (2022)

Proceedings Paper Computer Science, Information Systems

RELISON: A Framework for Link Recommendation in Social Networks

Javier Sanz-Cruzado, Pablo Castells

Summary: Link recommendation is a significant problem at the crossroads of recommender systems and online social networks. We introduce RELISON, an extensible framework for conducting link recommendation experiments, which includes various algorithms and evaluation tools.

PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22) (2022)

Proceedings Paper Computer Science, Information Systems

Human Preferences as Dueling Bandits

Xinyi Yan, Chengxi Luo, Charles L. A. Clarke, Nick Craswell, Ellen M. Voorhees, Pablo Castells

Summary: The dramatic improvements in core information retrieval tasks engendered by neural rankers create a need for novel evaluation methods. Several recent papers explore pairwise preference judgments as an alternative to traditional graded relevance assessments. This method allows for fine-grained distinctions by comparing items side-by-side.

PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

SimuRec: Workshop on Synthetic Data and Simulation Methods for Recommender Systems Research

Michael D. Ekstrand, Allison Chaney, Pablo Castells, Robin Burke, David Rohde, Manel Slokom

Summary: There is a growing interest in using synthetic data and simulation infrastructures for recommender systems research, but there are currently no clear best practices in this area. A workshop was proposed to discuss the state of the art in this research and address methodological questions, resulting in a report documenting currently-known best practices and outlining an agenda for further research.

15TH ACM CONFERENCE ON RECOMMENDER SYSTEMS (RECSYS 2021) (2021)
