Article

Differentially Private Publication of Vertically Partitioned Data

Journal

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TDSC.2019.2905237

Keywords

Publishing; Differential privacy; Distributed databases; Protocols; Privacy; Correlation; data publishing; vertical partitioning; latent tree model

Funding

  1. National Natural Science Foundation of China [61872045]
  2. BUPT Excellent Ph.D. Students Foundation [CX2016301]

This paper focuses on the issue of publishing vertically partitioned data under differential privacy, proposing a differentially private latent tree (DPLT) approach that can generate synthetic datasets while protecting data privacy. Through extensive experiments, it is demonstrated that this method offers desirable data utility with low computation costs.
In this paper, we study the problem of publishing vertically partitioned data under differential privacy, where different attributes of the same set of individuals are held by multiple parties. In this setting, with the assistance of a semi-trusted curator, the involved parties aim to collectively generate an integrated dataset while satisfying differential privacy for each local dataset. Based on the latent tree model (LTM), we present a differentially private latent tree (DPLT) approach, which is, to the best of our knowledge, the first approach to solving this challenging problem. In DPLT, the parties and the curator collaboratively identify the latent tree that best approximates the joint distribution of the integrated dataset, from which a synthetic dataset can be generated. The fundamental advantage of adopting LTM is that we can use the connections between a small number of latent attributes derived from each local dataset to capture the cross-dataset dependencies of the observed attributes in all local datasets such that the joint distribution of the integrated dataset can be learned with little injected noise and low computation and communication costs. DPLT is backed up by a series of novel techniques, including two-phase latent attribute generation (TLAG), tree index based correlation quantification (TICQ) and distributed Laplace perturbation protocol (DLPP). Extensive experiments on real datasets demonstrate that DPLT offers desirable data utility with low computation and communication costs.
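As a rough illustration of the noise-injection idea behind learning a joint distribution under differential privacy, the sketch below applies the standard single-party Laplace mechanism to a contingency table of two attributes. It is not the paper's distributed Laplace perturbation protocol (DLPP), whose details are not given here. The function names, the toy attributes, and the add-or-remove-one-record neighbouring assumption (L1 sensitivity 1) are all assumptions made for this example.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two i.i.d. exponential variables with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_joint_distribution(records, domain_a, domain_b, epsilon, rng):
    """Return an epsilon-DP estimate of the joint distribution of two attributes.

    Assumption: neighbouring datasets differ by adding or removing one record,
    so the vector of cell counts has L1 sensitivity 1.
    """
    counts = Counter(records)
    scale = 1.0 / epsilon  # sensitivity / epsilon

    # Perturb every cell of the full domain, including empty cells.
    noisy = {
        (a, b): counts.get((a, b), 0) + laplace_noise(scale, rng)
        for a in domain_a
        for b in domain_b
    }

    # Post-processing (costs no extra privacy budget): clamp and normalise.
    clamped = {cell: max(value, 0.0) for cell, value in noisy.items()}
    total = sum(clamped.values())
    if total == 0.0:
        uniform = 1.0 / len(clamped)
        return {cell: uniform for cell in clamped}
    return {cell: value / total for cell, value in clamped.items()}


# Toy data: (smoker, has_condition) pairs for 100 hypothetical individuals.
rng = random.Random(0)
records = (
    [("yes", "yes")] * 60
    + [("yes", "no")] * 10
    + [("no", "yes")] * 5
    + [("no", "no")] * 25
)
dist = noisy_joint_distribution(records, ["yes", "no"], ["yes", "no"], 1.0, rng)
```

The number of cells to perturb grows with the product of the attribute domains, which is one reason to summarize dependencies through a small number of latent attributes rather than perturbing the full joint table directly.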
