Description
Papermill has become a widely used tool for executing Jupyter notebooks. Teams use it for many production use cases, such as scheduled report generation and model retraining. However, because papermill spins up a second process running the IPython kernel to execute code, it has several drawbacks when used in production.
This talk will introduce an alternative notebook executor that powers Ploomber, a popular open-source orchestration framework. This new executor runs notebooks in a single process, which allows us to provide capabilities for production workloads, such as interactive debugging with pdb and notebook profiling (CPU and memory usage). The executor is integrated into the Ploomber project and can also be used from the command line, like papermill.
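To make the single-process idea concrete, here is a minimal sketch that runs the code cells of a notebook-like dictionary in the current Python process, sharing one namespace across cells. It is not Ploomber's actual executor: the function name `run_cells_in_process`, the `debug` flag, and the tiny in-memory notebook are all hypothetical, and the sketch ignores outputs, magics, and kernels entirely. It only illustrates why running in one process keeps the failing frame available to pdb.

```python
import pdb
import sys


def run_cells_in_process(notebook, namespace=None, debug=False):
    """Run every code cell in the current process (hypothetical helper).

    `notebook` is assumed to follow the basic .ipynb layout: a dict with a
    "cells" list whose items have "cell_type" and "source" keys.
    """
    namespace = {} if namespace is None else namespace
    for index, cell in enumerate(notebook["cells"]):
        if cell["cell_type"] != "code":
            continue  # markdown and raw cells are skipped
        source = cell["source"]
        if isinstance(source, list):
            # .ipynb files may store the source as a list of lines
            source = "".join(source)
        # A per-cell filename makes tracebacks point at the failing cell.
        code = compile(source, f"<cell {index}>", "exec")
        try:
            exec(code, namespace)
        except Exception:
            if debug:
                # The failing frame lives in this process, so a post-mortem
                # debugger can inspect its variables directly.
                pdb.post_mortem(sys.exc_info()[2])
            raise
    return namespace


# Hypothetical in-memory notebook used only for illustration.
notebook = {
    "cells": [
        {"cell_type": "markdown", "source": "# Example"},
        {"cell_type": "code", "source": "x = 21"},
        {"cell_type": "code", "source": ["y = x * 2\n", "z = y + 1"]},
    ]
}

result = run_cells_in_process(notebook)
print(result["y"])  # prints 42
```

Passing `debug=True` would open a pdb session on the first failing cell. A kernel-based executor would need extra machinery for this, since the failure happens in a separate process.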
Some experience with Jupyter (notebook or lab) and the terminal is required. Experience with papermill is optional.
Outline:
- [0 - 2 minutes] Introduction to papermill
- [2 - 6] Papermill's drawbacks
- [6 - 10] Running notebooks in a single process
- [10 - 16] Debugging notebook execution with pdb
- [16 - 22] Profiling notebooks
- [22 - 26] Orchestrating notebook pipelines in production with Ploomber
- [26 - 28] Summary and conclusions
- [28 - 30] Q&A