
A Tour of Large-Scale Data Analysis Tools in Python


Speakers: Sarah Guido, Sean O'Connor

Large-scale data analysis is complicated. There’s a limit to how much data you can analyze on a single machine, but access to a large number of commodity servers is relatively inexpensive. In this tutorial, you’ll learn how to use distributed computing tools to do large-scale data analysis quickly and affordably with pure Python, Hadoop MapReduce, and Apache Spark.
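As a rough illustration of the programming model behind tools like Hadoop MapReduce, here is a minimal single-process sketch of a word count in pure Python. It is not taken from the tutorial; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real cluster would run the map and reduce steps on many machines rather than in one process.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped counts for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    sample = ["big data is big", "data tools"]
    print(word_count(sample))
```

Because each map call looks at one line and each reduce call looks at one key, the work can in principle be split across servers, which is the property distributed tools build on.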

Slides can be found at: and
