Privacy in data publishing: a unified view
-------------------------------------------

Avinash Vyas (UCSD)

In data publishing scenarios where the data contains personal information about individuals, data owners are confronted with the problem of keeping secret the association between an individual's identity and his or her sensitive value. For example, health insurance companies would like to publish patient data to aid statistical analyses, but want to prevent an adversary from learning the medical condition of particular individuals. In contrast to conventional data security and access control, which try to prevent information disclosure through illegitimate means (such as hacking, access control violations, query injection, or theft), privacy in this context means preventing information disclosure that arises from legitimate access to the data. For instance, in a well-known case, a health insurance company removed all the identifying attributes (such as SSN and name) before publishing its patient data. Still, a researcher was able to join the published data with a voter registration list to reveal the medical condition of the governor of the state.

Various proposals, such as k-anonymity, l-diversity, t-closeness, and m-confidentiality, have tried to define a notion of "privacy" for published data and to explain how it prevents disclosures such as the one described above. Each technique has seemingly different underlying assumptions and provides privacy against different types of attack. This presentation describes a single model of attack and what it means to have privacy against such an attack. Importantly, it provides a unified way of describing all the previous notions of privacy in the literature as instantiations of this model.
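
As a rough illustration of the join-based disclosure described above, the following Python sketch links a de-identified table to a public voter list on shared attributes and then checks the simplest of the listed notions, k-anonymity. Everything in it is an assumption made for illustration: the records, the names, the choice of quasi-identifiers (zip, birth_year, sex), and the helper functions linkage_attack and is_k_anonymous are hypothetical and are not taken from the presentation or from any real dataset.

```python
from collections import Counter

# Hypothetical published table: direct identifiers (name, SSN) removed,
# but quasi-identifiers (zip, birth year, sex) retained.
published = [
    {"zip": "92093", "birth_year": 1950, "sex": "M", "condition": "diabetes"},
    {"zip": "92093", "birth_year": 1972, "sex": "F", "condition": "asthma"},
    {"zip": "92037", "birth_year": 1972, "sex": "F", "condition": "flu"},
    {"zip": "92037", "birth_year": 1972, "sex": "F", "condition": "asthma"},
]

# Hypothetical public voter list carrying names and the same quasi-identifiers.
voters = [
    {"name": "Alice Example", "zip": "92093", "birth_year": 1972, "sex": "F"},
    {"name": "Bob Example", "zip": "92093", "birth_year": 1950, "sex": "M"},
    {"name": "Carol Example", "zip": "92037", "birth_year": 1972, "sex": "F"},
]

# Assumed set of attributes the adversary can observe in both tables.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def qi_key(row):
    """Project a row onto its quasi-identifier values."""
    return tuple(row[attr] for attr in QUASI_IDENTIFIERS)


def linkage_attack(published, voters):
    """Return {name: condition} for voters matching exactly one published row."""
    conditions_by_key = {}
    for row in published:
        conditions_by_key.setdefault(qi_key(row), []).append(row["condition"])
    return {
        voter["name"]: conditions_by_key[qi_key(voter)][0]
        for voter in voters
        if len(conditions_by_key.get(qi_key(voter), [])) == 1
    }


def is_k_anonymous(published, k):
    """True if every quasi-identifier combination occurs in at least k rows."""
    counts = Counter(qi_key(row) for row in published)
    return all(count >= k for count in counts.values())


disclosed = linkage_attack(published, voters)
print(disclosed)
print(is_k_anonymous(published, 2))
```

In this toy data the two voters whose quasi-identifier combination is unique in the published table are re-identified, while the voter who shares a combination with another record is not. The sketch only covers exact-match linking and the k-anonymity condition; it does not attempt to model l-diversity, t-closeness, m-confidentiality, or the unified attack model that the presentation describes.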