Article

Quality-aware similarity assessment for entity matching in Web data

INFORMATION SYSTEMS (2012)

Journal

INFORMATION SYSTEMS

Volume 37, Issue 4, Pages 336-351

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.is.2011.09.007

Keywords

Entity matching; Web; Similarity functions; Person name disambiguation; Twitter message classification

Category

Computer Science, Information Systems

Funding

European Commission [FP7-ICT-256955]

Abstract

One of the key challenges in realizing automated processing of information on the Web, which is the central goal of the Semantic Web, is the entity matching problem. A number of tools reliably recognize named entities, such as persons, companies, and geographic locations, in Web documents. The names of these extracted entities are, however, non-unique; the same name on different Web pages might or might not refer to the same entity. The entity matching problem concerns identifying the extracted entities that refer to the same real-world entity. This problem is very similar to the entity resolution problem studied in relational databases; however, there are also several differences. Most importantly, Web pages often contain only partial or incomplete information about the entities. Similarity functions try to capture the degree of belief about the equivalence of two entities, and thus play a crucial role in entity matching. The accuracy of the similarity functions depends heavily on the applied assessment techniques, but also on some specific features of the entities. We propose systematic design strategies for combined similarity functions in this context. Our method relies on combining multiple pieces of evidence, using the estimated quality of the individual similarity values and paying particular attention to the missing information that is common in the Web context. We study the effectiveness of our method in two specific instances of the general entity matching problem, namely person name disambiguation and Twitter message classification. In both cases, using our techniques in a very simple algorithmic framework, we obtained better results than the state-of-the-art methods. © 2011 Elsevier Ltd. All rights reserved.
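The abstract does not give the combination formula, so the following is only a minimal sketch of the general idea it describes: several similarity values are combined, each weighted by an estimated quality, and missing evidence is skipped rather than treated as a zero. The function name `combine_similarities`, the quality weights, the weighted-mean rule, and the `default` fallback are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Optional, Sequence


def combine_similarities(
    values: Sequence[Optional[float]],
    qualities: Sequence[float],
    default: float = 0.5,
) -> float:
    """Quality-weighted mean of the available similarity values.

    values[i] is a similarity in [0, 1], or None when that piece of
    evidence is missing for the pair of entities being compared.
    qualities[i] is a hypothetical, non-negative quality estimate for
    the i-th similarity function (an assumption of this sketch).
    """
    if len(values) != len(qualities):
        raise ValueError("values and qualities must have the same length")

    weighted_sum = 0.0
    total_weight = 0.0
    for value, quality in zip(values, qualities):
        if value is None:
            # Missing information: leave this evidence out entirely
            # instead of counting it as dissimilarity.
            continue
        weighted_sum += quality * value
        total_weight += quality

    if total_weight == 0.0:
        # No usable evidence at all: fall back to a neutral belief.
        return default
    return weighted_sum / total_weight


# Example: the second similarity function has no evidence for this pair.
score = combine_similarities([0.9, None, 0.5], [0.8, 0.6, 0.2])
```

Skipping missing values keeps a page that simply lacks an attribute from being penalized as if the attribute disagreed, which is one plausible way to handle the partial information the abstract mentions.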

Recommended

Review Multidisciplinary Sciences

(Almost) all of entity resolution

Olivier Binette, Rebecca C. Steorts

Summary: This article discusses the importance of integrating information from multiple sources, as well as the application of modern probabilistic and Bayesian methods in statistics, computer science, machine learning, database management, economics, and political science.

SCIENCE ADVANCES (2022)

Article Computer Science, Artificial Intelligence

Loss functions for pose guided person image generation

Haoyue Shi, Le Wang, Nanning Zheng, Gang Hua, Wei Tang

Summary: This paper comprehensively studies the impact of different loss functions on pose guided person image generation, finding that a combination of adversarial loss, perceptual loss, and PSSIM loss yields optimal results.

PATTERN RECOGNITION (2022)

Article Computer Science, Artificial Intelligence

Named Entity Location Prediction Combining Twitter and Web

Yinan Liu, Wei Shen, Zonghai Yao, Jianyong Wang, Zhenglu Yang, Xiaojie Yuan

Summary: Knowledge bases play a critical role in various applications, but they are often incomplete. Enriching knowledge bases with new entities and location attributes is becoming increasingly important. This study introduces NELPTW, an unsupervised framework for predicting named entity location by leveraging knowledge from Twitter and Web, which significantly outperforms baselines in accuracy.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2021)

Article Automation & Control Systems

A modified GNN architecture with enhanced aggregator and Message Passing Functions

Debjit Sarkar, Sourodeep Roy, Samir Malakar, Ram Sarkar

Summary: Graph neural networks (GNN) maintain the essence of irregularly structured information in a graph through message passing and feature aggregation. A weighting scheme called VecGNN is proposed to incorporate inter-node feature-level correlational information, considering the relative position of nodes in the feature space. VecGNN outperforms baseline models GCN, GAT, and JKNets by 2%-4% on citation datasets.

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE (2023)

Article Computer Science, Information Systems

Generating Name-Like Vectors for Testing Large-Scale Entity Resolution

Samudra Herath, Matthew Roughan, Gary Glonek

Summary: Entity resolution is a key task in data integration, with accurate and efficient resolution having a significant impact across various fields. The lack of real training data and privacy concerns make simulation tools important for testing algorithms in entity resolution research.

IEEE ACCESS (2021)

Proceedings Paper Computer Science, Interdisciplinary Applications

AuthCrowd: Author Name Disambiguation and Entity Matching using Crowdsourcing

Antonio Correia, Diogo Guimaraes, Dennis Paulino, Shoaib Jameel, Daniel Schneider, Benjamim Fonseca, Hugo Paredes

Summary: This paper presents an approach to handle name ambiguity problems through crowdsourcing as a complementary means to traditional unsupervised approaches, demonstrating its potential for improving author name disambiguation and highlighting the importance of adopting hybrid crowd-algorithm collaboration strategies.

PROCEEDINGS OF THE 2021 IEEE 24TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN (CSCWD) (2021)

Article Computer Science, Information Systems

Dark Web: E-Commerce Information Extraction Based on Name Entity Recognition Using Bidirectional-LSTM

Syed Afeef Ahmed Shah, Muhammad Ali Masood, Amanullah Yasin

Summary: Extracting information from e-commerce platforms is a challenging task due to the increasing number of marketplaces. Existing data mining techniques may not provide sufficient accuracy. In this study, we propose a bidirectional LSTM with CNN model for detecting e-commerce entities, achieving high accuracy on dark web and CoNLL-2003 datasets.

IEEE ACCESS (2022)

Article Computer Science, Information Systems

Toward the Multilingual Semantic Web: Multilingual Ontology Matching and Assessment

Shimaa Ibrahim, Said Fathalla, Jens Lehmann, Hajira Jabeen

Summary: This paper proposes a Multilingual Ontology Matching (MoMatch) approach for matching ontologies in different natural languages. It uses machine translation and various string similarity techniques to identify correspondences across different ontologies. The paper also presents a Quality Assessment Suite for Ontologies (QASO) that evaluates the quality of the matching process and the ontology. The results show that MoMatch outperforms five state-of-the-art matching approaches in terms of precision, recall, and F-measure.

IEEE ACCESS (2023)

Article Computer Science, Artificial Intelligence

Person identification from fingernails and knuckles images using deep learning features and the Bray-Curtis similarity measure

Mona Alghamdi, Plamen Angelov, Lopez Pellicer Alvaro

Summary: This paper presents an approach for person identification based on knuckle creases and fingernails. It introduces a framework that includes localization, recognition, segmentation, and similarity matching of hand components. The results show that knuckle patterns and fingernails play a significant role in person identification.

NEUROCOMPUTING (2022)

Article Computer Science, Artificial Intelligence

An Improved Structural-Based Ontology Matching Approach Using Similarity Spreading

Sengodan Mani, Samukutty Annadurai

Summary: A new modified model of similarity spreading for ontology mapping is proposed in this paper, which aims to address the heterogeneity issue between ontologies for interoperability. By utilizing node clustering based on edge affinity and coefficient similarity propagation, the model achieves graph matching. The evaluation shows that the proposed model outperforms similar systems.

INTERNATIONAL JOURNAL ON SEMANTIC WEB AND INFORMATION SYSTEMS (2022)

Article Computer Science, Information Systems

Entity Matching Based on Attribute-Aware and Multi-Perspective Similarity Measurement

Xin Xing, Ning Wang

Summary: Entity matching (EM) is the process of identifying tuples from different data sources that refer to the same real-world entity. Existing research focuses on attribute heterogeneity and selecting similarity measures for different types of attributes. However, they overlook matching information from various aspects and the impact of dirty data. In this paper, we propose an entity matching method that incorporates attribute-aware and multi-perspective similarity measurement. Experimental results demonstrate its superiority over state-of-the-art methods on multiple real-world datasets.

JOURNAL OF INFORMATION SCIENCE AND ENGINEERING (2023)

Proceedings Paper Computer Science, Information Systems

Probing the Robustness of Pre-trained Language Models for Entity Matching

Mehdi Akbarian Rastaghi, Ehsan Kamalloo, Davood Rafiei

Summary: The study found that data imbalance in the training data is a key issue affecting model robustness, and data augmentation alone is not sufficient to ensure model robustness. Simple modifications can improve the robustness of PLM-based EM models.

PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022 (2022)

Review Engineering, Electrical & Electronic

Recent Advancements in Semantic Web Service Selection

Riddhi Pahariya, Lalit Purohit

Summary: Conventional web services have a minor role in semantics, while the maximum matching process in semantic web service selection plays a crucial role in achieving accurate results. Most existing web service selection methods rely on keyword-based searching, disregarding semantic understanding and resulting in irrelevant outcomes. This paper reviews the latest research on semantic web service selection, discussing techniques applicable to both web service composition and selection, and presents the application of two network flow-based approaches to achieve improved web service selection.

IETE JOURNAL OF RESEARCH (2022)

Article Computer Science, Hardware & Architecture

Similarity Regression Of Functions In Different Compiled Forms With Neural Attentions On Dual Control-Flow Graphs

Yun Zhang, Yuling Liu, Ge Cheng, Jie Wang

Summary: This paper presents a method that uses a neural network model to learn the semantic and structural features of functions on control-flow graphs, to detect the similarity between functions in different compiled forms. Experiments show that this method outperforms other models in detecting binary functions with large control-flow graphs.

COMPUTER JOURNAL (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Research on Multi-feature Expert Disambiguation of Same Name fused Personal Experience

Shangmei Li, Yangsen Zhang, Xiang Chen, Han Chen, Gaijuan Huang

Summary: Peer review is a common method used in the evaluation of scientific research projects or academic papers. The ambiguity caused by experts with the same name is a common problem in the selection of peer experts. This paper proposes a multi-feature expert disambiguation method that incorporates personal experience, by constructing an expert disambiguation feature representation model, providing similarity measurement methods and a similarity threshold. This method can effectively solve the problem.

2022 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2022) (2022)

Article Computer Science, Information Systems

Measuring rule-based LTLf process specifications: A probabilistic data-driven approach

Alessio Cecconi, Luca Barbaro, Claudio Di Ciccio, Arik Senderovich

Summary: This paper introduces a framework for designing probabilistic measures for declarative process specifications, which can assess the degree of compliance between process data and specifications. Through experiments, the applicability of the approach for various process mining tasks is demonstrated.

INFORMATION SYSTEMS (2024)

Article Computer Science, Information Systems

A Value Co-Creation Perspective on Data Labeling in Hybrid Intelligence Systems: A Design Study

Mahei Manhai Li, Philipp Reinhard, Christoph Peters, Sarah Oeste-Reiss, Jan Marco Leimeister

Summary: This article introduces a novel human-in-the-loop (HIL) design for ITSM support ticket recommendations by incorporating a value co-creation perspective. The design incentivizes ITSM agents to provide labels during their everyday ticket-handling procedures, and the evaluation shows that recommendations after label improvement have increased user ratings.

INFORMATION SYSTEMS (2024)

Article Computer Science, Information Systems

A survey of approaches for event sequence analysis and visualization

Anton Yeshchenko, Jan Mendling

Summary: This paper presents the development of event sequence data analysis techniques in different fields and proposes an integrated framework to facilitate collaboration and research synergy across various domains.

INFORMATION SYSTEMS (2024)

Article Computer Science, Information Systems

Adoption of IT solutions: A data-driven analysis approach

Iris Reinhartz-Berger, Alan Hartman, Doron Kliger

Summary: Many IT departments provide solutions that partially meet the needs of business units. This research aims to develop a data-driven analysis method to support the selection of solutions with higher prospects of adoption and identify design gaps and barriers.

INFORMATION SYSTEMS (2024)

Article Computer Science, Information Systems

Discovery, simulation, and optimization of business processes with differentiated resources

Orlenys Lopez-Pintado, Marlon Dumas, Jonas Berx

Summary: Business process simulation is a versatile technique that predicts the impact of changes on process performance. However, previous approaches have limitations due to their treatment of resources as undifferentiated entities. This article addresses this issue by proposing a new simulation approach that treats each resource as an individual entity with its own performance and availability. The article also presents methods for discovering simulation models with differentiated resources and optimizing resource availability calendars. Empirical evaluation demonstrates that differentiated resource models better replicate cycle time distributions and work rhythm, and iterative optimization of resource allocations and calendars leads to improved cost-time tradeoffs.

INFORMATION SYSTEMS (2024)
