Derek Eder

Big Data De-duping
Derek Eder, Forest Gregg
Recorded: June 14, 2012
Language: English

Derek Eder of Webitects and Forest Gregg, a Ph.D. student in sociology at the University of Chicago, will describe the Python library they are developing to deduplicate tabular data quickly, accurately, and at scale. The library uses a machine learning approach to match related records across different data sets. They expect to show a demo and will explain how they anticipate the library being used.