Ploomber: Developing Maintainable & Reproducible Data

Description

Interactive environments such as Jupyter have a reputation for producing low-quality, unmaintainable code. As a result, practitioners commonly go through a refactoring process in which the notebook's code is rewritten in a more modular, maintainable form, usually as scripts and functions. However, this creates friction: the refactoring has to be repeated every time the analysis changes, so practitioners move back and forth between their notebooks and the refactored code. This constant switching slows down iteration and puts reproducibility at risk.

Ploomber is an open-source framework that addresses this problem. It lets practitioners stay in the interactive interface where they are most productive while providing tools to help them build maintainable and reproducible data workflows from day one. In this tutorial, Eduardo and Ido will introduce Ploomber, going from zero to testing pipelines on GitHub Actions, collaborating through pull requests, running experiments in parallel, and executing in distributed environments such as Kubernetes and SLURM.

https://github.com/edublancas/scipy-2022
