☆ 4.7 Article

k-Anonymization with Minimal Loss of Information

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2009)

Journal

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING

Volume 21, Issue 2, Pages 206-219

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/TKDE.2008.129

Keywords

Privacy-preserving data mining; k-anonymization; approximation algorithms for NP-hard problems

Categories

Computer Science, Artificial Intelligence; Computer Science, Information Systems; Engineering, Electrical & Electronic

Abstract

The technique of k-anonymization allows the releasing of databases that contain personal information while ensuring some degree of individual privacy. Anonymization is usually performed by generalizing database entries. We formally study the concept of generalization, and propose three information-theoretic measures for capturing the amount of information that is lost during the anonymization process. The proposed measures are more general and more accurate than those that were proposed by Meyerson and Williams [23] and Aggarwal et al. [1]. We study the problem of achieving k-anonymity with minimal loss of information. We prove that it is NP-hard and study polynomial approximations for the optimal solution. Our first algorithm gives an approximation guarantee of O(ln k) for two of our measures as well as for the previously studied measures. This improves the best-known O(k)-approximation in [1]. While the previous approximation algorithms relied on the graph representation framework, our algorithm relies on a novel hypergraph representation that enables the improvement in the approximation ratio from O(k) to O(ln k). As the running time of the algorithm is O(n^{2k}), we also show how to adapt the algorithm in [1] in order to obtain an O(k)-approximation algorithm that is polynomial in both n and k.
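
To make the notion of generalization concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a toy table of (age, ZIP code) records with made-up values, and the helper names `is_k_anonymous` and `generalize_record` are hypothetical. It checks only the basic k-anonymity condition, namely that every released record is identical to at least k - 1 others, and does not compute any of the information-loss measures proposed in the paper.

```python
from collections import Counter


def is_k_anonymous(rows, k):
    """Return True if every distinct record occurs at least k times."""
    counts = Counter(rows)
    return all(count >= k for count in counts.values())


def generalize_record(row, width=10):
    """Coarsen one record (hypothetical scheme, chosen for illustration).

    The exact age becomes a range of `width` years, and only the first
    three characters of the ZIP code are kept.
    """
    age, zip_code = row
    low = (age // width) * width
    return (f"{low}-{low + width - 1}", zip_code[:3] + "**")


# Toy table of (age, ZIP code) records; all values are made up.
table = [(23, "12345"), (27, "12399"), (31, "54321"), (38, "54377")]
generalized = [generalize_record(row) for row in table]

print(is_k_anonymous(table, 2))        # raw records are all distinct
print(is_k_anonymous(generalized, 2))  # generalized records pair up
```

Coarser generalization makes the condition easier to satisfy but discards more detail, which is the trade-off the abstract's loss measures are meant to quantify.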

Recommended

Article Computer Science, Information Systems

Privacy-preserving process mining: A microaggregation-based approach

Edgar Batista, Antoni Martinez-Balleste, Agusti Solanas

Summary: The proper utilization of process mining techniques with large amounts of event data can lead to the discovery, monitoring, and improvement of business processes, enabling the development of more efficient business intelligence systems. However, privacy concerns arising from personal and confidential information within event data have not been adequately addressed in the field of process mining. This article presents a novel privacy-preserving process mining method called k-PPPM, which utilizes microaggregation techniques to achieve k-anonymity and protects targeted individuals from re-identification through attacks based on process model analysis and location-oriented attacks.

JOURNAL OF INFORMATION SECURITY AND APPLICATIONS (2022)

Article Computer Science, Information Systems

Decentralized k-anonymization of trajectories via privacy-preserving tit-for-tat

Josep Domingo-Ferrer, Sergio Martinez, David Sanchez

Summary: This paper discusses the importance of mobility data and proposes a decentralized approach to anonymize trajectories while protecting privacy. By aggregating with similar trajectories, a k-anonymized mobility dataset is constructed.

COMPUTER COMMUNICATIONS (2022)

Article Public, Environmental & Occupational Health

An anonymization-based privacy-preserving data collection protocol for digital health data

J. Andrew, R. Jennifer Eunice, J. Karthikeyan

Summary: Digital health data collection is important but challenging due to privacy concerns. Existing research studies have limitations such as involving third-party anonymizers or private channels. This article proposes a novel approach that anonymizes healthcare data without third-party involvement and restricts communication to elected representatives. The proposed protocol overcomes privacy attacks and outperforms state-of-the-art techniques in privacy protection and computational complexity.

FRONTIERS IN PUBLIC HEALTH (2023)

Article Computer Science, Information Systems

PPTPF: Privacy-Preserving Trajectory Publication Framework for CDR Mobile Trajectories

Jianxi Yang, Manoranjan Dash, Sin G. Teo

Summary: With the rapid advancement of mobile phone technology, location-based services rely heavily on user mobility data which poses privacy concerns. Effective privacy preservation algorithms for trajectory data are essential to balance utility and privacy for mobile users. The proposed Privacy-Preserving Trajectory Publication Framework for CDR offers a novel approach for anonymizing trajectory data, catering to user privacy and service efficiency.

ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION (2021)

Article Computer Science, Information Systems

Bridging unlinkability and data utility: Privacy preserving data publication schemes for healthcare informatics

Kah Meng Chong, Amizah Malip

Summary: Publishing patient data without revealing sensitive information is a challenging research issue in the healthcare sector. This paper introduces two new privacy notions, namely identity unlinkability and attribute unlinkability, and designs schemes to address identity and attribute disclosure problems while preserving data utility. Experimental results demonstrate the effectiveness of our schemes in achieving both data utility preservation and privacy protection simultaneously.

COMPUTER COMMUNICATIONS (2022)

Article Computer Science, Information Systems

KAB: A new k-anonymity approach based on black hole algorithm

Lynda Kacha, Abdelhafid Zitouni, Mahieddine Djoudi

Summary: K-anonymity is a widely used approach for privacy preservation in microdata, but it suffers from information loss. To address this issue, this paper proposes a novel algorithm based on the Black Hole Algorithm, which improves data utility.

JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES (2022)

Article Computer Science, Information Systems

Multi-Scale, Class-Generic, Privacy-Preserving Video

Zhixiang Zhang, Thomas Cilloni, Charles Walter, Charles Fleming

Summary: The paper introduces a novel privacy-preserving video algorithm that utilizes semantic segmentation and adaptive blurring to identify and anonymize objects of different scales, while maintaining the meaning in the visual data.

ELECTRONICS (2021)

Article Multidisciplinary Sciences

Privacy preserving dynamic data release against synonymous linkage based on microaggregation

Yan Yan, Anselme Herman Eyeleko, Adnan Mahmood, Jing Li, Zhuoyue Dong, Fei Xu

Summary: The rapid development of the mobile Internet and widespread use of intelligent terminals have accelerated the digitization of personal information and the evolution of the big data era. Sharing and publishing big data bring convenience but also increase the risk of personal privacy leakage. To reduce privacy leakage caused by data release, various privacy preserving data publishing methods have been proposed. However, non-numerical sensitive information may still have semantic relevance, leading to serious privacy disclosures. This paper introduces a privacy preserving dynamic data publishing method based on microaggregation to address this issue, which shows better privacy protection and availability of published data compared to existing methods.

SCIENTIFIC REPORTS (2022)

Article Engineering, Multidisciplinary

Privacy preserving and data publication for vehicular trajectories with differential privacy

Muhammad Arif, Jianer Chen, Guojun Wang, Oana Geman, Valentina Emilia Balas

Summary: In Vehicular Ad-hoc Networks, Location-based Services provide personalized services to clients based on their movement characteristics, but privacy protection is a challenge. The proposed anonymization approach, based on differential privacy and generalization, aims to protect sensitive vehicular trajectories. Experiments show good data feasibility and efficiency of the method, as well as the impact of privacy budget values on error rates.

MEASUREMENT (2021)

Article Computer Science, Information Systems

Efficient and Privacy Preserving Approximation of Distributed Statistical Queries

Philip Derbeko, Shlomi Dolev, Ehud Gudes, Jeffrey D. Ullman

Summary: In recent years, the increasing amount of data collected from different and often non-cooperative databases has posed challenges for privacy-preserving distributed calculations. This paper proposes a sampling method to improve computational performance and discusses an analysis of error bounds. Experimental results confirm the validity of the approach.

IEEE TRANSACTIONS ON BIG DATA (2022)

Article Computer Science, Information Systems

Privacy Preserving Attribute-Focused Anonymization Scheme for Healthcare Data Publishing

J. Andrew Onesimu, J. Karthikeyan, Jennifer Eunice, Marc Pomplun, Hien Dang

Summary: Advancements in Industry 4.0 have brought significant improvements to the healthcare sector. However, sharing healthcare data while preserving privacy is challenging due to security concerns. This paper presents an attribute-focused privacy preserving data publishing scheme that combines fixed-interval and improved l-diverse slicing approaches. Experimental results show improved accuracy and reduced information loss compared to existing methods. The proposed scheme provides data utility while protecting against various privacy breaches.

IEEE ACCESS (2022)

Article Computer Science, Information Systems

Semantics-aware mechanisms for control-flow anonymization in process mining

Stephan A. Fahrenkrog-Petersen, Martin Kabierski, Han van der Aa, Matthias Weidlich

Summary: Information systems support business process execution and data about process execution is recorded in event logs for analysis. To protect personal information, anonymization techniques should be used. This paper presents two approaches, SaCoFa and SaPa, for anonymizing the control-flow of a process.

INFORMATION SYSTEMS (2023)

Article Computer Science, Information Systems

Improved l-diversity: Scalable anonymization approach for Privacy Preserving Big Data Publishing

Brijesh B. Mehta, Udai Pratap Rao

Summary: This paper discusses the challenges of privacy preservation in big data analytics and proposes an improved method, ImSLD, based on scalable k-anonymization. Tests on the poker dataset within the MapReduce framework demonstrated significant improvements in running time and lower information loss compared to existing methods.

JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES (2022)

Article Computer Science, Information Systems

Anonymization Techniques for Privacy Preserving Data Publishing: A Comprehensive Survey

Abdul Majeed, Sungchang Lee

Summary: Anonymization is a practical solution for protecting user privacy, with many data owners anonymizing data to safeguard user privacy. This paper systematically investigates relational and structural anonymization techniques, categorizes and evaluates existing anonymization methods, and discusses the challenges and research directions in privacy preserving data publishing involving social network and relational data.

IEEE ACCESS (2021)

Article Computer Science, Information Systems

An efficient clustering-based anonymization scheme for privacy-preserving data collection in IoT based healthcare services

J. Andrew Onesimu, J. Karthikeyan, Yuichi Sei

Summary: The healthcare services industry has undergone significant changes with the rise of IoT, leading to concerns about privacy of patient data. By utilizing a clustering-based anonymity model, an efficient privacy-preserving scheme has been proposed to address privacy concerns and prevent various attacks in healthcare IoT systems.

PEER-TO-PEER NETWORKING AND APPLICATIONS (2021)

Article Computer Science, Artificial Intelligence

The network-untangling problem: from interactions to activity timelines

Polina Rozenshtein, Nikolaj Tatti, Aristides Gionis

Summary: This paper investigates the problem of determining entity activity based on interactions, proposing two formulations and efficient algorithms for untangling networks. While the sum problem is shown to be NP-hard, the max problem can be solved optimally in linear time. In cases of multiple activity intervals per entity, both formulations are proved to be inapproximable but efficient algorithms based on alternative optimization are proposed. Evaluation on synthetic and real-world datasets supports the validity of concepts and performance of algorithms.

DATA MINING AND KNOWLEDGE DISCOVERY (2021)

Article Computer Science, Artificial Intelligence

Strengthening ties towards a highly-connected world

Antonis Matakos, Aristides Gionis

Summary: Online social networks offer numerous benefits such as establishing new connections, gaining knowledge about the world, exposure to diverse viewpoints, and access to previously inaccessible information. This research focuses on leveraging the triadic closure principle to develop methods that foster new connections and improve the flow of information in the network.

DATA MINING AND KNOWLEDGE DISCOVERY (2022)

Article Computer Science, Artificial Intelligence

Provable randomized rounding for minimum-similarity diversification

Bruno Ordozgoiti, Ananth Mahadevan, Antonis Matakos, Aristides Gionis

Summary: When searching for information in a data collection, it is often important to not only find relevant items but also assemble a diverse set to explore different concepts in the data. This paper addresses the problem of finding a diverse set of items when item relatedness is measured by a similarity function. The authors propose a new minimization objective and employ a randomized rounding strategy to find good solutions efficiently. They also introduce a novel bound for the ratio of Poisson-Binomial densities, which has applications beyond this problem. The proposed algorithm outperforms greedy approaches commonly used in the literature according to experiments on benchmark datasets.

DATA MINING AND KNOWLEDGE DISCOVERY (2022)

Article Computer Science, Artificial Intelligence

Ranking with submodular functions on a budget

Guangyi Zhang, Nikolaj Tatti, Aristides Gionis

Summary: Submodular maximization is fundamental in many important machine learning problems and has various applications. However, the study of maximizing submodular functions has often been limited to selecting a set of items, while many real-world applications require a ranking solution. This paper introduces a novel formulation for ranking items with submodular valuations and budget constraints, and proposes practical algorithms with approximation guarantees for different types of budget constraints. The empirical evaluation shows that the proposed algorithms outperform strong baselines.

DATA MINING AND KNOWLEDGE DISCOVERY (2022)

Article Computer Science, Artificial Intelligence

Maximizing the Diversity of Exposure in a Social Network

Antonis Matakos, Cigdem Aslay, Esther Galbrun, Aristides Gionis

Summary: Social-media platforms have provided new ways for citizens to participate in public debates and stay informed. This paper proposes a novel approach to maximize the diversity of exposure in a social network, ensuring citizens are exposed to diverse viewpoints for a healthy information sharing environment.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2022)

Proceedings Paper Computer Science, Information Systems

SIEVE: A Space-Efficient Algorithm for Viterbi Decoding

Martino Ciaperoni, Aristides Gionis, Athanasios Katsamanis, Panagiotis Karras

Summary: This paper presents an algorithm called SIEVE, which is an improvement on the Viterbi algorithm to address the issue of its space complexity growing with the number of observations. SIEVE improves space efficiency by discarding and recomputing parts of the DP solution, without incurring a time complexity overhead.

PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Diversity-Aware k-median: Clustering with Fair Center Representation

Suhas Thejaswi, Bruno Ordozgoiti, Aristides Gionis

Summary: The study introduces a novel problem of diversity-aware clustering, where potential cluster centers belong to groups defined by protected attributes. It shows that the diversity-aware k-median problem is NP-hard in general cases but approximation algorithms can be obtained when facility groups are disjoint. Experimentally, approximation methods are evaluated for tractable cases, and a relaxation-based heuristic is provided for theoretically intractable scenarios.

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021: RESEARCH TRACK, PT II (2021)

Proceedings Paper Computer Science, Information Systems

Workload-aware Materialization for Efficient Variable Elimination on Bayesian Networks

Cigdem Aslay, Martino Ciaperoni, Aristides Gionis, Michael Mathioudakis

Summary: Bayesian networks are probabilistic models capturing dependencies among variables, with Variable Elimination being a fundamental algorithm for probabilistic inference. This paper proposes a novel materialization method to enhance efficiency in processing inference queries. Experimental results show that moderate materialization can significantly improve query running time.

2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021) (2021)

Article Computer Science, Artificial Intelligence

(So) Big Data and the transformation of the city

Gennady Andrienko, Natalia Andrienko, Chiara Boldrini, Guido Caldarelli, Paolo Cintia, Stefano Cresci, Angelo Facchini, Fosca Giannotti, Aristides Gionis, Riccardo Guidotti, Michael Mathioudakis, Cristina Ioana Muntean, Luca Pappalardo, Dino Pedreschi, Evangelos Pournaras, Francesca Pratesi, Maurizio Tesconi, Roberto Trasarti

Summary: The exponential growth of large-scale mobility data has led to the vision of smart cities but also raised privacy concerns. Research communities and industrial stakeholders show strong interest in building knowledge discovery pipelines over these data sources.

INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS (2021)

Proceedings Paper Computer Science, Information Systems

Mining Signed Networks: Theory and Applications Tutorial proposal for the Web Conference 2020

Aristides Gionis, Antonis Matakos, Bruno Ordozgoiti, Han Xiao

WWW'20: COMPANION PROCEEDINGS OF THE WEB CONFERENCE 2020 (2020)

Proceedings Paper Computer Science, Artificial Intelligence

Improved mixing time for k-subgraph sampling

Ryuta Matsuno, Aristides Gionis

PROCEEDINGS OF THE 2020 SIAM INTERNATIONAL CONFERENCE ON DATA MINING (SDM) (2020)

Proceedings Paper Computer Science, Artificial Intelligence

Pattern detection in large temporal graphs using algebraic fingerprints

Suhas Thejaswi, Aristides Gionis

PROCEEDINGS OF THE 2020 SIAM INTERNATIONAL CONFERENCE ON DATA MINING (SDM) (2020)

Proceedings Paper Computer Science, Artificial Intelligence

Maximizing diversity over clustered data

Guangyi Zhang, Aristides Gionis

PROCEEDINGS OF THE 2020 SIAM INTERNATIONAL CONFERENCE ON DATA MINING (SDM) (2020)

Proceedings Paper Computer Science, Information Systems

Searching for polarization in signed graphs: a local spectral approach

Han Xiao, Bruno Ordozgoiti, Aristides Gionis

WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020) (2020)

Proceedings Paper Computer Science, Information Systems

Finding large balanced subgraphs in signed networks

Bruno Ordozgoiti, Antonis Matakos, Aristides Gionis

WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020) (2020)
