As developers, we know (all too well) that there’s no such thing as bug-free software. Whether you work on a small or large production system, incidents are inevitable and being prepared is key. How would you handle, for example, a database outage if it happened right now? Or a critical bug affecting half of your users?
Navigating in crisis mode is never easy, but having a great company culture and a recovery plan gives you guidance and mitigates damage. In this talk, I will share some success cases, such as GitLab’s database outage recovery, and my personal experience as a project manager overcoming a critical incident in a subscription system built with Django.
What can you do to prepare your team? When should you enter crisis mode? How do you assemble a recovery plan? To answer these and other questions, I will provide a step-by-step guide from a Modern Agile perspective, starting with bug discovery, moving through managing the client’s expectations and data recovery, and ending with the incident postmortem.