Take some time to transcribe PyCon 2014 talks! Click on the "Share" button below the video and then "Subtitle" to get started.
- Number of videos:
Derek Eder of Webitects and Forest Gregg, a Ph.D. student of sociology at the University of Chicago, will describe the Python library they are developing to deduplicate tabular data, quickly, accurately, and at a large scale. The library facilitates the matching of related records in different data sets, using a machine learning approach. They expect to have a demo to show and will explain how they expect that the library will be used.