Andreas Dewes - Analyzing Data with Python & Docker [EuroPython 2016] [21 July 2016] [Bilbao, Euskadi, Spain] (https://ep2016.europython.eu//conference/talks/analyzing-data-with-python-docker)
Docker is a powerful tool for packaging software and services in containers and running them on virtual infrastructure. Python is an excellent language for data analysis. What happens if we combine the two? We get a versatile and robust system for analyzing data at both small and large scale!
I will show how Python and Docker can be combined to build repeatable, robust data analysis workflows that work in many different contexts. I will explain the core ideas behind Docker and how they apply to data analysis. I will then present Rouster, an open-source Python library that uses the Python Docker API to analyze data in containers, and show several use cases (possibly with a live demo).
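To illustrate the general idea of driving an analysis container from Python, here is a minimal sketch using the current Docker SDK for Python (the `docker` package; its `client.containers.run()` API is newer than the docker-py 1.x `Client` API of 2016). This is not Rouster's API. The image name, the script path inside the image and the `/data` and `/output` mount points are hypothetical. Input data is mounted read-only and networking is disabled so that a run depends only on the image and the mounted data. Pinning the image by digest (`name@sha256:...`) would make runs more repeatable still.

```python
from pathlib import Path
from typing import Any, Dict


def build_run_kwargs(script: str, data_dir: str, out_dir: str) -> Dict[str, Any]:
    """Keyword arguments for client.containers.run() for one analysis step.

    Hypothetical layout: input data is mounted read-only at /data, results
    are written to /output, and `script` is a path inside the image.
    Assumes data_dir and out_dir are different directories.
    """
    return {
        "command": ["python", script],
        "volumes": {
            # Host path -> container path; "ro" keeps the input data immutable.
            str(Path(data_dir).resolve()): {"bind": "/data", "mode": "ro"},
            str(Path(out_dir).resolve()): {"bind": "/output", "mode": "rw"},
        },
        # No downloads mid-run: the step only sees the image and the mounted data.
        "network_disabled": True,
        # Delete the container after it exits; results live in out_dir.
        "remove": True,
    }


def run_analysis(client: Any, image: str, script: str, data_dir: str, out_dir: str) -> str:
    """Run `script` in a fresh container and return its stdout as text.

    `client` is expected to be a docker.DockerClient, e.g. docker.from_env().
    Without detach=True, containers.run() blocks until the container exits
    and returns its logs as bytes (a non-zero exit raises ContainerError).
    """
    logs = client.containers.run(image, **build_run_kwargs(script, data_dir, out_dir))
    return logs.decode("utf-8")
```

With Docker installed, a call could look like `run_analysis(docker.from_env(), "my-analysis:1.0", "/app/analyze.py", "./data", "./results")` (all names hypothetical). Passing the client in, rather than creating it inside the function, keeps the step testable without a running Docker daemon.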
Outline:
- Why data analysis can be frustrating: Managing software, dependencies, data versions, workflows
- How Docker can make data analysis easier & more reproducible
- Introducing Rouster: Building data analysis workflows with Python and Docker
- Examples of data analysis workflows: Business Intelligence, Scientific Data Analysis, Interactive Exploration of Data
- Future Directions & Outlook