
Dask for ad hoc distributed computing


PyData DC 2016

This talk discusses parallel and distributed computing in Python, particularly for ad hoc and custom algorithms. It focuses on Dask, a Python library for flexible distributed computing.

The Python data science stack contains efficient algorithms with intuitive interfaces for sophisticated yet approachable analysis. As the data science community tackles larger problems with larger hardware, we naturally ask how best to parallelize this software stack, both across many cores in a single computer and across computers in a cluster. This turns out to be harder than it looks, even with traditional Big Data tools like MapReduce, Storm, and Spark. Both the complexity of the algorithms and the high expectations for interactivity raise challenges for these systems. This talk lays out the benefits and challenges of parallelizing a numeric analytic stack, then describes Dask, a parallel framework gaining traction within the Python community for interactive, performant parallel computing, and finally goes through a few domains where this work is enabling novel science today.

