SoBigData Articles

Differential privacy

Differential privacy (DP) was originally proposed for interactive statistical queries to a database. With DP, the presence or absence of any single record must not be noticeable from the query answers: the probability of any answer may change by at most a multiplicative factor e^ε when a single record is added or removed.
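
As a concrete illustration of this interactive setting, the sketch below answers a counting query with the Laplace mechanism, the canonical way to achieve ε-DP. The function name and sample data are illustrative, not taken from the article:

```python
import numpy as np

def laplace_count(data, predicate, epsilon):
    """Answer a counting query with epsilon-DP via the Laplace mechanism.

    Adding or removing one record changes a count by at most 1, so the
    query sensitivity is 1 and Laplace noise of scale 1/epsilon suffices.
    """
    true_count = sum(1 for record in data if predicate(record))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Example: how many respondents are older than 40?
ages = [23, 45, 31, 67, 52, 29, 41]
print(laplace_count(ages, lambda age: age > 40, epsilon=1.0))
```

Because the noise scale is small relative to a count over many records, interactive queries of this kind retain good analytical utility.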

DP offers a very neat privacy guarantee and, unlike privacy models in the k-anonymity family, makes no assumptions about the intruder's background knowledge (although it does assume that all records in the database are independent). For this reason, DP was rapidly adopted by the research community, to the point that previous approaches tend to be regarded as obsolete.

Researchers and practitioners have extended the use of DP beyond the interactive setting it was designed for. Extended uses include data release, where the goal is to protect respondents from data analysts, and collection of personal information, where protection of respondents from the data collector is claimed. Google, Apple and Facebook have seized the chance to collect or release microdata (individual respondent data) from their users under the privacy pledge “don't worry, whatever you tell us will be DP-protected”.
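
The collection setting typically relies on local DP mechanisms such as randomized response, the building block of deployments like Google's RAPPOR. The following minimal sketch, with hypothetical function names, shows how a yes/no answer is randomized before it ever reaches the collector:

```python
import math
import random

def randomized_response(true_bit, epsilon):
    """Report a sensitive yes/no answer under epsilon-local DP.

    The true answer is kept with probability e^eps / (1 + e^eps)
    and flipped otherwise, so no single report reveals the truth.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return true_bit if random.random() < p_truth else 1 - true_bit

def debias_proportion(reports, epsilon):
    """Recover an unbiased estimate of the true proportion of 'yes'."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)
```

Each individual report is heavily randomized; only aggregate statistics over many respondents can be recovered, and even these carry substantial estimation error.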

However, applying DP to record-level data release or collection (which is equivalent to answering identity queries) requires adding a large amount of noise: changing a single record can move the answer to an identity query across the entire attribute domain, so the noise must be calibrated to that domain width. As a result, the analytical utility of DP outputs is likely to be very poor. This problem arose as soon as DP was moved outside the interactive setting.
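
To see why, consider releasing a single individual's value under ε-DP. The sensitivity of such an identity query equals the full width of the attribute domain, so the Laplace noise must span that width too. A minimal sketch, with illustrative numbers:

```python
import numpy as np

def dp_identity_query(value, domain_width, epsilon):
    """Release a single individual's value under epsilon-DP.

    Changing that one record can move the answer across the whole
    attribute domain, so the sensitivity equals the domain width and
    the Laplace noise must be scaled to domain_width / epsilon.
    """
    return value + np.random.laplace(scale=domain_width / epsilon)

# Releasing one salary in [0, 200000] with epsilon = 1: the noise
# standard deviation (about 283,000) dwarfs any real salary.
print(dp_identity_query(55000, domain_width=200000, epsilon=1.0))
```

With ε = 1, the noise standard deviation is roughly 1.4 times the domain width, so the released value is essentially random.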

The paper by Domingo-Ferrer et al. [1] reviews the limitations of DP and its misuse for individual data collection, individual data release, and machine learning. The authors show that fundamental misunderstandings and blatantly flawed implementations pervade the application of DP to non-interactive settings. These misconceptions have serious consequences in terms of poor privacy or poor utility, and they are driven by the insistence on twisting DP in ways that contradict its core idea: making the data of any single individual unnoticeable.

In conclusion, DP is neither a silver bullet for all privacy problems nor a replacement for all previous privacy models. Extreme care should therefore be exercised when trying to extend its use beyond the setting it was designed for.

Reference:

[1] Josep Domingo-Ferrer, David Sánchez and Alberto Blanco-Justicia, “The limits of differential privacy (and its misuse in data release and machine learning)”, Communications of the ACM, 64(7):33-35, 2021. Preprint: http://arxiv.org/abs/2011.02352

Written by: David Sánchez

Revised by: Marco Braghieri, Francesca Pratesi