Derek Eder

Big Data De-duping
Derek Eder, Forest Gregg
Recorded: June 14, 2012
Language: English

Derek Eder of Webitects and Forest Gregg, a Ph.D. student in sociology at the University of Chicago, will describe the Python library they are developing to deduplicate tabular data quickly, accurately, and at large scale. The library uses a machine learning approach to match related records across different data sets. They expect to show a demo and will explain how they anticipate the library will be used.