Description
A beginner-friendly talk that follows a bug from its manifestation in production all the way back to its inception, and asks what, other than code, needs to change to ship faster and NOT break things.
We open with the support-rotation pager going off as customers of a low-latency, real-time system report an outage. With every second of downtime costing revenue, it is all hands on deck: the site reliability team, the dev team, the data team, and the product owners peel back the stack layer by layer. We will follow the failure through a polyglot system to the commits that caused it, watch the bug get squashed and the hotfix ship, and see the crisis averted.
Cut to the blameless postmortem: the real-life challenges of issue resolution that go beyond changing lines of code, including debugging workflows, monitoring gaps, testing limitations, navigating change management processes, and team dynamics. For each of these pillars, we will see what allowed the teams to shift left and shorten a bug's lifecycle.