
Dev Ops meets Data Science: Taking models from prototype to production with Docker


PyData DC 2016

We present the evolution of a model into a production API that can scale to large e-commerce needs. Along the way we discuss metrics of success and how to use the Kubernetes cluster manager and its associated tools for deployment. Beyond deployment, we highlight how to use the cluster management system for further testing and experimentation with your models.

The chasm between data science and dev ops is often wide and impenetrable, but the two fields have more in common than meets the eye. Every data scientist can lean in and help their career by investing in an understanding of the basic principles of dev ops. In this talk I present the notions of service level indicators, objectives, and agreements. I cover the rigorous monitoring and testing of services. Finally, I demonstrate how to build a basic data science workflow and push it to production-level APIs with Docker and Kubernetes.
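As a rough illustration of the first two notions, a service level indicator (SLI) is a measured quantity, such as the fraction of requests answered within a latency threshold, and a service level objective (SLO) is the target that indicator should meet. The sketch below is not taken from the talk: the class name `LatencySLO`, the function names, and the sample latencies are all hypothetical, and a real service would read these values from its monitoring system rather than from a list.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LatencySLO:
    """Hypothetical objective: a target fraction of requests under a threshold."""
    threshold_ms: float      # a request is "good" at or below this latency
    target_fraction: float   # required share of good requests, e.g. 0.75


def latency_sli(latencies_ms: Sequence[float], threshold_ms: float) -> float:
    """SLI: fraction of observed requests served at or under the threshold."""
    if not latencies_ms:
        raise ValueError("need at least one latency observation")
    good = sum(1 for ms in latencies_ms if ms <= threshold_ms)
    return good / len(latencies_ms)


def meets_slo(latencies_ms: Sequence[float], slo: LatencySLO) -> bool:
    """Compare the measured indicator against the objective."""
    return latency_sli(latencies_ms, slo.threshold_ms) >= slo.target_fraction


# Made-up prediction latencies for a model API, in milliseconds.
observed = [10.0, 20.0, 30.0, 500.0]
relaxed = LatencySLO(threshold_ms=100.0, target_fraction=0.75)
strict = LatencySLO(threshold_ms=100.0, target_fraction=0.9)
```

Here three of the four requests finish within 100 ms, so the relaxed objective is met and the strict one is not. A service level agreement would then attach consequences to missing an objective, which is a contractual matter rather than something to compute.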

Kubernetes is an opinionated container cluster manager with an easy-to-use, robust interface. It can be used on very small and very large clusters. Docker is a container system that allows one to build code in an isolated environment. Paired with a container manager such as Kubernetes, we are able to manage millions of instances as needed for a production deployment. These tools are two of many different options but are considered among the best open-source solutions available.

