4.5 Article

Quality-aware similarity assessment for entity matching in Web data

Journal

INFORMATION SYSTEMS
Volume 37, Issue 4, Pages 336-351

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.is.2011.09.007

Keywords

Entity matching; Web; Similarity functions; Person name disambiguation; Twitter message classification

Funding

  1. European Commission [FP7-ICT-256955]

One of the key challenges in realizing automated processing of information on the Web, which is the central goal of the Semantic Web, is the entity matching problem. A number of tools reliably recognize named entities, such as persons, companies, and geographic locations, in Web documents. The names of these extracted entities are, however, not unique; the same name on different Web pages might or might not refer to the same entity. The entity matching problem consists of identifying which extracted entities refer to the same real-world entity. This problem is very similar to the entity resolution problem studied in relational databases, but there are also several differences. Most importantly, Web pages often contain only partial or incomplete information about the entities. Similarity functions try to capture the degree of belief in the equivalence of two entities, and thus play a crucial role in entity matching. The accuracy of similarity functions depends heavily on the assessment techniques applied, but also on specific features of the entities. We propose systematic design strategies for combined similarity functions in this context. Our method combines multiple pieces of evidence, using the estimated quality of the individual similarity values and paying particular attention to the missing information that is common in the Web context. We study the effectiveness of our method on two specific instances of the general entity matching problem, namely person name disambiguation and Twitter message classification. In both cases, using our techniques in a very simple algorithmic framework, we obtained better results than state-of-the-art methods. (C) 2011 Elsevier Ltd. All rights reserved.
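
The abstract does not give the paper's actual formulas, so the following is only an illustrative sketch of the general idea of combining several similarity values by their estimated quality while tolerating missing evidence. The function name `combine_similarities`, the use of `None` to mark a missing value, and the quality-weighted mean are all assumptions made for this example, not the authors' method.

```python
from typing import Optional, Sequence


def combine_similarities(
    values: Sequence[Optional[float]],
    qualities: Sequence[float],
) -> Optional[float]:
    """Quality-weighted mean of the similarity values that are present.

    values    -- one similarity score in [0, 1] per feature, or None when the
                 feature is missing on one of the two Web pages (assumption).
    qualities -- estimated quality (non-negative weight) of each similarity
                 function (assumption: supplied by the caller).
    Returns None when no usable evidence is available.
    """
    if len(values) != len(qualities):
        raise ValueError("values and qualities must have the same length")

    weighted_sum = 0.0
    total_quality = 0.0
    for value, quality in zip(values, qualities):
        if value is None:
            # Missing evidence is skipped rather than treated as dissimilarity.
            continue
        weighted_sum += quality * value
        total_quality += quality

    if total_quality == 0.0:
        return None
    return weighted_sum / total_quality


if __name__ == "__main__":
    # Three hypothetical features; the second is missing for this pair.
    print(combine_similarities([0.9, None, 0.4], [0.8, 0.6, 0.2]))
```

In this sketch a missing value does not lower the combined score; it only removes that feature's weight from the normalization. With the hypothetical inputs above, the combined score comes out at roughly 0.8, dominated by the higher-quality first feature.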

Recommended

Article Computer Science, Information Systems

Measuring rule-based LTLf process specifications: A probabilistic data-driven approach

Alessio Cecconi, Luca Barbaro, Claudio Di Ciccio, Arik Senderovich

Summary: This paper introduces a framework for designing probabilistic measures for declarative process specifications, which can assess the degree of compliance between process data and specifications. Through experiments, the applicability of the approach for various process mining tasks is demonstrated.

INFORMATION SYSTEMS (2024)

Article Computer Science, Information Systems

A Value Co-Creation Perspective on Data Labeling in Hybrid Intelligence Systems: A Design Study

Mahei Manhai Li, Philipp Reinhard, Christoph Peters, Sarah Oeste-Reiss, Jan Marco Leimeister

Summary: This article introduces a novel human-in-the-loop (HIL) design for ITSM support ticket recommendations by incorporating a value co-creation perspective. The design incentivizes ITSM agents to provide labels during their everyday ticket-handling procedures, and the evaluation shows that recommendations after label improvement have increased user ratings.

INFORMATION SYSTEMS (2024)

Article Computer Science, Information Systems

A survey of approaches for event sequence analysis and visualization

Anton Yeshchenko, Jan Mendling

Summary: This paper presents the development of event sequence data analysis techniques in different fields and proposes an integrated framework to facilitate collaboration and research synergy across various domains.

INFORMATION SYSTEMS (2024)

Article Computer Science, Information Systems

Adoption of IT solutions: A data-driven analysis approach

Iris Reinhartz-Berger, Alan Hartman, Doron Kliger

Summary: Many IT departments provide solutions that partially meet the needs of business units. This research aims to develop a data-driven analysis method to support the selection of solutions with higher prospects of adoption and identify design gaps and barriers.

INFORMATION SYSTEMS (2024)

Article Computer Science, Information Systems

Discovery, simulation, and optimization of business processes with differentiated resources

Orlenys Lopez-Pintado, Marlon Dumas, Jonas Berx

Summary: Business process simulation is a versatile technique that predicts the impact of changes on process performance. However, previous approaches have limitations due to their treatment of resources as undifferentiated entities. This article addresses this issue by proposing a new simulation approach that treats each resource as an individual entity with its own performance and availability. The article also presents methods for discovering simulation models with differentiated resources and optimizing resource availability calendars. Empirical evaluation demonstrates that differentiated resource models better replicate cycle time distributions and work rhythm, and iterative optimization of resource allocations and calendars leads to improved cost-time tradeoffs.

INFORMATION SYSTEMS (2024)