
Building Your First Data Pipelines

Description

PyData DC 2016

You need a data pipeline. This talk will discuss the lifecycle of projects that use Jupyter notebooks and Luigi as a data pipeline management tool, from greenfield work to retrofitting complex systems. It will include a hands-on demo.

Data pipelines are hard. Too often we resort to retrofitting janky scripts, relying on keeping a README up to date, and so on.

First, this talk lays out the variety of tools available for building data pipelines. It will discuss why you should be using Luigi and how to apply it in a variety of common use cases.

Next, we will build a basic exploratory analysis using DC open data and Luigi to demonstrate the power of this concept and how it works with Jupyter.

Finally, we'll retrofit a larger, more complex project to use Luigi to show how you can use it in bigger organizations.
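
For readers new to the idea, here is a minimal sketch of the pattern a tool like Luigi is built around: each task declares what it requires, what output it produces, and how to run, and a task whose output already exists is skipped. This is plain standard-library Python, not Luigi's API and not the code from the talk. The `Task` base class, the `build` helper, and the two example tasks are hypothetical names made up for illustration. In Luigi itself the corresponding pieces are `luigi.Task` subclasses with `requires()`, `output()`, and `run()` methods.

```python
import os
import tempfile


class Task:
    """Toy stand-in for a pipeline task (hypothetical, not Luigi's class)."""

    def __init__(self, workdir):
        self.workdir = workdir

    def requires(self):
        # Tasks that must finish before this one can run.
        return []

    def output(self):
        # Path of the file this task produces.
        raise NotImplementedError

    def run(self):
        raise NotImplementedError

    def complete(self):
        # A task counts as done once its output file exists.
        return os.path.exists(self.output())


class ExtractNumbers(Task):
    def output(self):
        return os.path.join(self.workdir, "numbers.txt")

    def run(self):
        # Made-up input: the integers 1 through 5, one per line.
        with open(self.output(), "w") as f:
            f.write("\n".join(str(n) for n in range(1, 6)))


class SumNumbers(Task):
    def requires(self):
        return [ExtractNumbers(self.workdir)]

    def output(self):
        return os.path.join(self.workdir, "total.txt")

    def run(self):
        with open(self.requires()[0].output()) as f:
            total = sum(int(line) for line in f if line.strip())
        with open(self.output(), "w") as f:
            f.write(str(total))


def build(task, ran=None):
    """Run dependencies first; skip any task whose output already exists."""
    ran = [] if ran is None else ran
    if task.complete():
        return ran
    for dep in task.requires():
        build(dep, ran)
    task.run()
    ran.append(type(task).__name__)
    return ran


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        first = build(SumNumbers(workdir))   # runs both tasks, in order
        second = build(SumNumbers(workdir))  # nothing to do: output exists
        with open(SumNumbers(workdir).output()) as f:
            total = f.read()
    return first, second, total


if __name__ == "__main__":
    print(demo())
```

Because completion is tied to the output file, rerunning the pipeline after a partial failure only redoes the missing steps, which is much of what makes this approach sturdier than a chain of ad hoc scripts.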
