Analyzing Data with Python & Docker

Description

Andreas Dewes - Analyzing Data with Python & Docker [EuroPython 2016] [21 July 2016] [Bilbao, Euskadi, Spain] (https://ep2016.europython.eu//conference/talks/analyzing-data-with-python-docker)

Docker is a powerful tool for packaging software and services in containers and running them on virtual infrastructure. Python is a powerful language for data analysis. What happens if we combine the two? We get a versatile and robust system for analyzing data at small and large scale!

I will show how we can make use of Python and Docker to build repeatable, robust data analysis workflows which can be used in many different contexts (possibly with a live demo).


I will explain the core ideas behind Docker and show how they can be useful in data analysis. I will then discuss Rouster, an open-source Python library that uses the Python Docker API to analyze data in containers, and show several interesting use cases (possibly even a live demo).
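To make the idea of analyzing data in containers more concrete, here is a minimal sketch of running a Python script inside a throwaway container. It is not Rouster's API, which this description does not show. It assumes the third-party Docker SDK for Python (the `docker` package) and a running Docker daemon. The image tag, the script name `analyze.py`, the `/work` mount point, and both function names are hypothetical.

```python
from pathlib import Path


def analysis_run_args(data_dir, script="analyze.py", image="python:3.11-slim"):
    """Build keyword arguments for the Docker SDK's containers.run().

    The image tag, script name, and mount point are illustrative
    assumptions, not taken from the talk or from Rouster.
    """
    host_dir = str(Path(data_dir).resolve())
    return {
        "image": image,
        "command": ["python", f"/work/{script}"],
        # Mount the host data directory (which must contain the script)
        # into the container at /work.
        "volumes": {host_dir: {"bind": "/work", "mode": "rw"}},
        "working_dir": "/work",
        # Delete the container once the script exits.
        "remove": True,
    }


def run_analysis(data_dir, **kwargs):
    """Run the script in a fresh container and return its output as text."""
    import docker  # third-party package: pip install docker

    client = docker.from_env()
    # Without detach=True, containers.run() blocks until the container
    # exits and returns the captured stdout as bytes.
    output = client.containers.run(**analysis_run_args(data_dir, **kwargs))
    return output.decode("utf-8")
```

Calling `run_analysis("./data")` would execute `./data/analyze.py` in a clean Python environment, so the result depends only on the image and the mounted data rather than on whatever is installed on the host.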

Outline:

  1. Why data analysis can be frustrating: Managing software, dependencies, data versions, workflows
  2. How Docker can help us to make data analysis easier & more reproducible
  3. Introducing Rouster: Building data analysis workflows with Python and Docker
  4. Examples of data analysis workflows: Business Intelligence, Scientific Data Analysis, Interactive Exploration of Data
  5. Future Directions & Outlook