Description
Dask is a flexible library for parallel computing in Python. It provides high-level interfaces that extend the PyData ecosystem to larger-than-memory and distributed environments, as well as lower-level interfaces for customizing workflows. This tutorial is a data-oriented, hands-on workshop that shows new users how to scale NumPy and pandas with the Dask Array and Dask DataFrame collections, and how to use Dask's interactive diagnostic tools to understand the performance of their computations. It also covers the low-level Dask Delayed and Futures interfaces and introduces cluster deployments. No previous Dask experience is required, though knowledge of Python basics and familiarity with NumPy and pandas are preferred.
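
As a rough preview of the two styles of interface mentioned above, the following minimal sketch builds a chunked Dask Array and a small Dask Delayed task graph. It is not taken from the tutorial materials: the array shape, the chunk size, and the helper functions `square` and `add` are illustrative assumptions, and it assumes `dask` and `numpy` are installed.

```python
import dask
import dask.array as da

# Dask Array: a NumPy-like array split into chunks that can be processed in parallel.
# The shape and chunk size here are arbitrary, chosen only for illustration.
x = da.ones((1000, 1000), chunks=(250, 250))
lazy_total = x.sum()                      # lazy: builds a task graph, computes nothing yet
array_sum = float(lazy_total.compute())   # compute() triggers the actual work


# Dask Delayed: wrap ordinary Python functions to build a custom task graph.
# `square` and `add` are hypothetical helpers, not part of Dask.
@dask.delayed
def square(n):
    return n * n


@dask.delayed
def add(a, b):
    return a + b


lazy_result = add(square(3), square(4))   # still lazy at this point
delayed_sum = lazy_result.compute()       # runs square(3) and square(4), then add

print(array_sum, delayed_sum)
```

In both cases nothing runs until `compute()` is called. That laziness is what lets Dask plan work across chunks or tasks before executing it.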