Building Data Pipelines in Python

Description

PyData London 2016

This talk discusses the process of building data pipelines: the extraction, cleaning, integration and pre-processing steps needed to prepare data for a data-driven product. The focus is on data plumbing and on the practice of going from prototype to production.

Starting from some common anti-patterns, we'll highlight the need for a workflow manager for any non-trivial project.

We'll make the case for Luigi as an option worth considering, and look at where it fits in the bigger picture of deploying a data product.
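To make the workflow-manager idea concrete, here is a minimal sketch in plain Python. It is loosely modeled on the requires/output/run structure of Luigi tasks, but it does not use Luigi itself: the `Task` base class, the `build` function, the `Extract` and `Clean` tasks, the file names and the sample data are all hypothetical and only illustrate the principle that a task is skipped when its output already exists.

```python
import os
import tempfile


class Task:
    """Hypothetical minimal task: declares dependencies and one output file."""

    def __init__(self, workdir):
        self.workdir = workdir

    def requires(self):
        # Tasks that must be complete before this one can run.
        return []

    def output(self):
        raise NotImplementedError

    def run(self):
        raise NotImplementedError

    def complete(self):
        # A task counts as done when its output file exists.
        return os.path.exists(self.output())


class Extract(Task):
    def output(self):
        return os.path.join(self.workdir, "raw.txt")

    def run(self):
        # Stand-in for a real extraction step (made-up sample data).
        with open(self.output(), "w") as f:
            f.write(" Alice \n\nBOB\n")


class Clean(Task):
    def requires(self):
        return [Extract(self.workdir)]

    def output(self):
        return os.path.join(self.workdir, "clean.txt")

    def run(self):
        # Strip whitespace, drop empty lines, normalize case.
        with open(self.requires()[0].output()) as src:
            rows = [line.strip().lower() for line in src if line.strip()]
        with open(self.output(), "w") as dst:
            dst.write("\n".join(rows))


def build(task, log):
    """Run dependencies first; skip any task whose output already exists."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep, log)
    task.run()
    log.append(type(task).__name__)


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        first, second = [], []
        build(Clean(workdir), first)   # runs Extract, then Clean
        build(Clean(workdir), second)  # nothing to do: output exists
        with open(Clean(workdir).output()) as f:
            return first, second, f.read()
```

Calling `demo()` builds the pipeline twice in a temporary directory. The second call runs nothing, which is the behavior that makes re-running a partially failed pipeline cheap compared with a single monolithic script.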

Slides available here: https://speakerdeck.com/marcobonzanini/building-data-pipelines-in-python-pydata-london-2016
