16:38 - Jyothish, Alex R - monitored the situation; no failures seen since this rollout

Takeaways

Good reporting and communication from everyone involved. The emergency meeting proved useful in informing the Tier1 production team and management about the nature of the issue and the attempted fix. Previous experience was useful in devising the patch, and the RPM creation and deployment pipeline was quick enough to deploy the change in time for sufficient observation before the end of the day.

Switch Testing proved vital for rapid testing of attempted fixes throughout the week, and taking a host temporarily out of production for those attempts made checking each fix and its deployment much quicker.