Differential privacy (DP) aims to design methods and algorithms that satisfy rigorous notions of privacy while still providing utility, including valid statistical inference. More recently, emphasis has been placed on combining notions of statistical utility with algorithmic approaches to address privacy risk in the presence of big data, with differential privacy emerging as a rigorous notion of risk. While DP provides strong privacy guarantees, it often involves tradeoffs with data utility and computational scalability. In this talk, we introduce a categorical data synthesizer that releases high-dimensional sparse histograms. Specifically, we combine a differentially private algorithm, the stability-based algorithm, with feature hashing, which allows for dimension reduction of the histograms, and with Gibbs sampling. As a result, our proposed algorithm is differentially private, offers similar or better statistical utility, and is scalable to large databases. In addition, we give an analytical result for the error caused by the stability-based algorithm, which allows us to control the loss of utility. Finally, we study the behavior of our algorithm on both simulated and real data.
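To make two of the ingredients above concrete, here is a minimal Python sketch: feature hashing of categorical records into a fixed number of cells, followed by a stability-based release of the resulting sparse histogram (noise is added only to nonzero cells, and a cell is published only if its noisy count clears a threshold). This is not the speakers' implementation. The function names, the SHA-256 hash, the Laplace scale 2/ε and the threshold 1 + 2 ln(1/δ)/ε are illustrative assumptions and should be checked against a privacy analysis for the neighboring relation actually in use. The Gibbs-sampling step is omitted.

```python
import hashlib
import math
import random
from collections import Counter


def hash_record(record, num_buckets):
    """Feature hashing: map a tuple of categorical values to one of num_buckets cells."""
    key = "|".join(map(str, record)).encode("utf-8")
    digest = hashlib.sha256(key).digest()  # stable across runs, unlike built-in hash()
    return int.from_bytes(digest[:8], "big") % num_buckets


def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as a scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def stability_histogram(records, num_buckets, epsilon, delta, rng):
    """Release noisy counts for nonzero hashed cells only (assumed constants)."""
    counts = Counter(hash_record(r, num_buckets) for r in records)
    scale = 2.0 / epsilon                                  # assumed noise scale
    threshold = 1.0 + 2.0 * math.log(1.0 / delta) / epsilon  # assumed cutoff
    released = {}
    for cell, count in counts.items():  # empty cells are never touched or released
        noisy = count + laplace(scale, rng)
        if noisy > threshold:
            released[cell] = noisy
    return released


# Hypothetical toy data: two frequent categorical profiles and one rare one.
rng = random.Random(0)
data = [("F", "30-39", "NC")] * 200 + [("M", "40-49", "VA")] * 150 + [("M", "20-29", "SC")]
hist = stability_histogram(data, num_buckets=2**16, epsilon=1.0, delta=1e-6, rng=rng)
```

Because only cells that actually occur in the data receive noise, the cost of the release scales with the number of nonzero cells rather than with the full, possibly astronomically large, cross-classification of the categorical variables; hashing caps that space further at `num_buckets` at the price of occasional collisions.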
WEB/CONFERENCE CALL INFO:
We are going to use the NC State WebEx for the web conference. Please note that this
WebEx belongs to NC State and cannot be downloaded directly from Cisco.
Also, it should work on iPhones and iPads via the WebEx App.
A good internet connection is recommended.
For better audio, please join via computer and then have the meeting call your number
or call in directly using one of the numbers below. When not speaking, please mute
your phone to avoid background noise.
Meeting Number: 991 680 272
Audio Connection: 919-513-9329 (WolfMeeting)
Access Code: 991 680 272
Sponsored by the Laboratory for Analytic Sciences