4.7 Article

k-Anonymization with Minimal Loss of Information

Journal

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2008.129

Keywords

Privacy-preserving data mining; k-anonymization; approximation algorithms for NP-hard problems

Ask authors/readers for more resources

The technique of k-anonymization allows the releasing of databases that contain personal information while ensuring some degree of individual privacy. Anonymization is usually performed by generalizing database entries. We formally study the concept of generalization, and propose three information-theoretic measures for capturing the amount of information that is lost during the anonymization process. The proposed measures are more general and more accurate than those that were proposed by Meyerson and Williams [23] and Aggarwal et al. [1]. We study the problem of achieving k-anonymity with minimal loss of information. We prove that it is NP-hard and study polynomial approximations for the optimal solution. Our first algorithm gives an approximation guarantee of O(ln k) for two of our measures as well as for the previously studied measures. This improves the best-known O(k)-approximation in [1]. While the previous approximation algorithms relied on the graph representation framework, our algorithm relies on a novel hypergraph representation that enables the improvement in the approximation ratio from O(k) to O(ln k). As the running time of the algorithm is O(n(2k)), we also show how to adapt the algorithm in [1] in order to obtain an O(k)-approximation algorithm that is polynomial in both n and k.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

Article Computer Science, Artificial Intelligence

The network-untangling problem: from interactions to activity timelines

Polina Rozenshtein, Nikolaj Tatti, Aristides Gionis

Summary: This paper investigates the problem of determining entity activity based on interactions, proposing two formulations and efficient algorithms for untangling networks. While the sum problem is shown to be NP-hard, the max problem can be solved optimally in linear time. In cases of multiple activity intervals per entity, both formulations are proved to be inapproximable but efficient algorithms based on alternative optimization are proposed. Evaluation on synthetic and real-world datasets supports the validity of concepts and performance of algorithms.

DATA MINING AND KNOWLEDGE DISCOVERY (2021)

Article Computer Science, Artificial Intelligence

Strengthening ties towards a highly-connected world

Antonis Matakos, Aristides Gionis

Summary: Online social networks offer numerous benefits such as establishing new connections, gaining knowledge about the world, exposure to diverse viewpoints, and access to previously inaccessible information. This research focuses on leveraging the triadic closure principle to develop methods that foster new connections and improve the flow of information in the network.

DATA MINING AND KNOWLEDGE DISCOVERY (2022)

Article Computer Science, Artificial Intelligence

Provable randomized rounding for minimum-similarity diversification

Bruno Ordozgoiti, Ananth Mahadevan, Antonis Matakos, Aristides Gionis

Summary: When searching for information in a data collection, it is often important to not only find relevant items but also assemble a diverse set to explore different concepts in the data. This paper addresses the problem of finding a diverse set of items when item relatedness is measured by a similarity function. The authors propose a new minimization objective and employ a randomized rounding strategy to find good solutions efficiently. They also introduce a novel bound for the ratio of Poisson-Binomial densities, which has applications beyond this problem. The proposed algorithm outperforms greedy approaches commonly used in the literature according to experiments on benchmark datasets.

DATA MINING AND KNOWLEDGE DISCOVERY (2022)

Article Computer Science, Artificial Intelligence

Ranking with submodular functions on a budget

Guangyi Zhang, Nikolaj Tatti, Aristides Gionis

Summary: Submodular maximization is fundamental in many important machine learning problems and has various applications. However, the study of maximizing submodular functions has often been limited to selecting a set of items, while many real-world applications require a ranking solution. This paper introduces a novel formulation for ranking items with submodular valuations and budget constraints, and proposes practical algorithms with approximation guarantees for different types of budget constraints. The empirical evaluation shows that the proposed algorithms outperform strong baselines.

DATA MINING AND KNOWLEDGE DISCOVERY (2022)

Article Computer Science, Artificial Intelligence

Maximizing the Diversity of Exposure in a Social Network

Antonis Matakos, Cigdem Aslay, Esther Galbrun, Aristides Gionis

Summary: Social-media platforms have provided new ways for citizens to participate in public debates and stay informed. This paper proposes a novel approach to maximize the diversity of exposure in a social network, ensuring citizens are exposed to diverse viewpoints for a healthy information sharing environment.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (2022)

Proceedings Paper Computer Science, Information Systems

SIEVE: A Space-Efficient Algorithm for Viterbi Decoding

Martino Ciaperoni, Aristides Gionis, Athanasios Katsamanis, Panagiotis Karras

Summary: This paper presents an algorithm called SIEVE, which is an improvement on the Viterbi algorithm to address the issue of its space complexity growing with the number of observations. SIEVE improves space efficiency by discarding and recomputing parts of the DP solution, without incurring a time complexity overhead.

PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Diversity-Aware k-median: Clustering with Fair Center Representation

Suhas Thejaswi, Bruno Ordozgoiti, Aristides Gionis

Summary: The study introduces a novel problem of diversity-aware clustering, where potential cluster centers belong to groups defined by protected attributes. It shows that the diversity-aware k-median problem is NP-hard in general cases but approximation algorithms can be obtained when facility groups are disjoint. Experimentally, approximation methods are evaluated for tractable cases, and a relaxation-based heuristic is provided for theoretically intractable scenarios.

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021: RESEARCH TRACK, PT II (2021)

Proceedings Paper Computer Science, Information Systems

Workload-aware Materialization for Efficient Variable Elimination on Bayesian Networks

Cigdem Aslay, Martino Ciaperoni, Aristides Gionis, Michael Mathioudakis

Summary: Bayesian networks are probabilistic models capturing dependencies among variables, with Variable Elimination being a fundamental algorithm for probabilistic inference. This paper proposes a novel materialization method to enhance efficiency in processing inference queries. Experimental results show that moderate materialization can significantly improve query running time.

2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021) (2021)

Article Computer Science, Artificial Intelligence

(So) Big Data and the transformation of the city

Gennady Andrienko, Natalia Andrienko, Chiara Boldrini, Guido Caldarelli, Paolo Cintia, Stefano Cresci, Angelo Facchini, Fosca Giannotti, Aristides Gionis, Riccardo Guidotti, Michael Mathioudakis, Cristina Ioana Muntean, Luca Pappalardo, Dino Pedreschi, Evangelos Pournaras, Francesca Pratesi, Maurizio Tesconi, Roberto Trasarti

Summary: The exponential growth of large-scale mobility data has led to the vision of smart cities but also raised privacy concerns. Research communities and industrial stakeholders show strong interest in building knowledge discovery pipelines over these data sources.

INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS (2021)

Proceedings Paper Computer Science, Information Systems

Mining Signed Networks: Theory and Applications Tutorial proposal for the Web Conference 2020

Aristides Gionis, Antonis Matakos, Bruno Ordozgoiti, Han Xiao

WWW'20: COMPANION PROCEEDINGS OF THE WEB CONFERENCE 2020 (2020)

Proceedings Paper Computer Science, Artificial Intelligence

Improved mixing time for k-subgraph sampling

Ryuta Matsuno, Aristides Gionis

PROCEEDINGS OF THE 2020 SIAM INTERNATIONAL CONFERENCE ON DATA MINING (SDM) (2020)

Proceedings Paper Computer Science, Artificial Intelligence

Pattern detection in large temporal graphs using algebraic fingerprints

Suhas Thejaswi, Aristides Gionis

PROCEEDINGS OF THE 2020 SIAM INTERNATIONAL CONFERENCE ON DATA MINING (SDM) (2020)

Proceedings Paper Computer Science, Artificial Intelligence

Maximizing diversity over clustered data

Guangyi Zhang, Aristides Gionis

PROCEEDINGS OF THE 2020 SIAM INTERNATIONAL CONFERENCE ON DATA MINING (SDM) (2020)

Proceedings Paper Computer Science, Information Systems

Searching for polarization in signed graphs: a local spectral approach

Han Xiao, Bruno Ordozgoiti, Aristides Gionis

WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020) (2020)

Proceedings Paper Computer Science, Information Systems

Finding large balanced subgraphs in signed networks

Bruno Ordozgoiti, Antonis Matakos, Aristides Gionis

WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020) (2020)

No Data Available