
Docker and Python: making them play nicely and securely for Data Science and ML



Docker has become a standard tool for developers around the world to deploy applications in a reproducible and robust manner. Docker and Docker Compose have reduced the time needed to set up new software and to implement complex technology stacks for our applications. Now, six years after the initial release of Docker, we can say with confidence that containers and container orchestration have become defaults in current technology stacks.

There are thousands of tutorials and getting-started documents for those wanting to adopt Docker for application deployment. However, if you are a Data Scientist, a researcher or someone working on scientific computing, the story is quite different: compared with app/web development, there are very few tutorials and documents focused on Docker best practices for DS and scientific computing. If you are working on DS, ML or scientific computing, this talk is for you. We'll cover best practices for building Docker containers for data-intensive applications, from optimising your image build to securing your containers and setting up efficient deployment workflows. We will talk about the most common problems faced when using Docker with data-intensive applications and how you can overcome most of them. Finally, I'll give some practical tips to improve your Docker workflows and practices.

Attendees will leave the talk feeling confident about adopting Docker across a range of DS, ML and research projects.

Who and Why (audience)

This talk is designed for people working in data-intensive environments (e.g. Machine Learning, Data Science, research and scientific computing) who either already use Docker or want to learn more about using it in these environments. Attendees will leave the talk feeling confident about adopting Docker in their workflows, having acquired several best practices and guidelines for doing so robustly.

Introduction (5 minutes)
- About me
- When is Docker the right choice?
- Docker for all Python users: introduction to Docker in Machine Learning (ML), Data Science (DS) and research contexts
- The usual culprits

Optimising for data-oriented applications (10 minutes)
- Creating a data-oriented Docker image: how is this different from an app/web image?
- Choosing the right base image: set yourself up for success
- Dependencies, volumes and code best practices

Security and performance (10 minutes)
- Finding vulnerabilities in your images
- Image consistency and reproducibility
- Optimising image building: cache and image size considerations

Do not reinvent the wheel: automate! (10 minutes)
- Consider tools to assist with Dockerfile generation, e.g. repo2docker, dockta
- Creating templates for projects
- Automating image build and publishing, e.g. GitHub Actions
- Automated deployment strategies: going from local to deploying your containerised application

Conclusions (5 minutes)
- Top 10 best practices when working with Docker and Python for DS/ML and research
- Additional resources
- Thanks and getting in touch
