Description
Speakers: Sarah Guido, Sean O'Connor
Large-scale data analysis is complicated. There's a limit to how much data you can analyze on a single box, but access to a large number of commodity servers is relatively inexpensive. In this tutorial, you'll learn how to leverage the power of distributed computing tools to do large-scale data analysis quickly and affordably using pure Python, Hadoop MapReduce, and Apache Spark.
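The map/shuffle/reduce pattern that Hadoop MapReduce and Spark build on can be sketched in pure Python. This is a toy word count over an in-memory list of lines (an illustrative assumption, not the tutorial's actual code); distributed frameworks run these same three phases across many machines:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every line.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group all emitted values by key (word).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts collected for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["to be or not to be"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

In a real cluster job, the map and reduce functions stay this simple; the framework handles partitioning the input, moving intermediate pairs between machines during the shuffle, and collecting the reduced output.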
Slides can be found at: https://speakerdeck.com/pycon2016 and https://github.com/PyCon/2016-slides