
What every data scientist should know about data anonymization

Description

PyData Berlin 2016

There are numerous examples of data anonymization gone horribly wrong. The most prominent might be the Netflix Prize, where researchers were able to uniquely identify users by combining Netflix user data with IMDb reviews. Let's learn from their mistakes and look at some of the measures you can take to better anonymize data before you share it with others.
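To make the idea of combining two datasets concrete, here is a minimal sketch of a linkage attack on toy data. It is not the method used by the researchers mentioned above; all names, ratings, and the `link` helper with its `min_matches` threshold are invented for illustration.

```python
from collections import defaultdict

# Toy "anonymized" release: real user ids replaced by pseudonyms (invented data).
anonymized = [
    ("u1", "Movie A", 5), ("u1", "Movie B", 2), ("u1", "Movie C", 4),
    ("u2", "Movie A", 3), ("u2", "Movie D", 1),
]

# Toy public reviews posted under real names (invented data).
public = [
    ("alice", "Movie A", 5), ("alice", "Movie C", 4),
    ("bob", "Movie D", 1),
]


def link(anonymized, public, min_matches=2):
    """Hypothetical helper: map a public name to a pseudonym when exactly one
    pseudonym shares at least `min_matches` (movie, rating) pairs with it."""
    by_pseudonym = defaultdict(set)
    for pseudonym, movie, rating in anonymized:
        by_pseudonym[pseudonym].add((movie, rating))

    by_name = defaultdict(set)
    for name, movie, rating in public:
        by_name[name].add((movie, rating))

    links = {}
    for name, reviews in by_name.items():
        candidates = [
            pseudonym
            for pseudonym, ratings in by_pseudonym.items()
            if len(reviews & ratings) >= min_matches
        ]
        # Only report a link when the match is unambiguous.
        if len(candidates) == 1:
            links[name] = candidates[0]
    return links


print(link(anonymized, public))
```

Even though the released records carry no names, a couple of overlapping ratings can be enough to single out one pseudonym, which is why removing direct identifiers alone may not protect users.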

Outline:

  • Look at some of the examples where data anonymization was broken and identify what went wrong
  • What is the state of the art for data anonymization, and can you be sure you are safe if you follow it?
  • How does anonymization affect the possibilities for data mining/machine learning on the data?

This talk is aimed at people who want to release open data or share sensitive data between departments.

Slides: https://github.com/krasch/presentations/blob/master/pydata_Berlin_2016.pdf
