Dask for task scheduling

YouTube

Description

Dask is a Python library for parallel and distributed computing, best known for parallelizing libraries like NumPy and pandas. This talk discusses using Dask for task-scheduling workloads of the kind usually handled by Celery or Airflow, in a scalable and accessible manner.

Most previous talks on Dask focus on "big data" collections like distributed pandas dataframes. In this talk we'll diverge a bit and look at more real-time, fine-grained settings. We'll discuss Dask's concurrent.futures interface, its integration with async/await syntax, dynamic workload handling, and more. The talk is aimed more at the web-backend crowd than at the data-science crowd.
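
As a rough illustration of the concurrent.futures-style interface mentioned above, here is a minimal sketch (not code from the talk). It assumes the `distributed` package is installed and starts an in-process cluster with `Client(processes=False)`; the `square` task and the `run` helper are hypothetical names chosen for the example. If Dask is not available, the same helper falls back to the standard library's `ThreadPoolExecutor`, which works only because both expose a similar `submit` / `as_completed` shape.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import as_completed as stdlib_as_completed


def square(x):
    # Hypothetical fine-grained task; any picklable function would do.
    return x * x


def run(submit, as_completed):
    # Submit each task individually and collect results as they finish,
    # rather than waiting on the whole batch in submission order.
    futures = [submit(square, i) for i in range(5)]
    return sorted(f.result() for f in as_completed(futures))


try:
    # Assumption: dask.distributed is installed.
    from dask.distributed import Client
    from dask.distributed import as_completed as dask_as_completed
except ImportError:
    Client = None

if Client is not None:
    # processes=False keeps the scheduler and workers in this process.
    with Client(processes=False) as client:
        results = run(client.submit, dask_as_completed)
else:
    # Fallback so the sketch still runs without Dask.
    with ThreadPoolExecutor() as pool:
        results = run(pool.submit, stdlib_as_completed)

print(results)
```

Because tasks are submitted one at a time, new work can be added while earlier futures are still running, which is the kind of dynamic workload the abstract refers to.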
