
A Tour of Large-Scale Data Analysis Tools in Python

Description

Speakers: Sarah Guido, Sean O'Connor

Large-scale data analysis is complicated. There’s a limit to how much data you can analyze on a single machine, but access to a large number of commodity servers is relatively inexpensive. In this tutorial, you’ll learn how to use distributed computing tools to do large-scale data analysis quickly and affordably with pure Python, Hadoop MapReduce, and Apache Spark.

Slides can be found at: https://speakerdeck.com/pycon2016 and https://github.com/PyCon/2016-slides
