...
4:10 - confirmed significant improvement; things look OK for the weekend
Key Takeaways
...
group meeting was very helpful in resolution and prioritization during the incident.
testing different ideas had a quick turnaround because the implementer had access to an external host (lxplus)
load issues can compound each other to make matters worse
system resources should be reviewed periodically to ensure they’re fit for purpose.
resource access timelines should be clarified for proposed solutions, e.g. CPU could have been added to the VM earlier had it been confirmed that someone present at the meeting could do it; this point was overlooked and not clearly worded.
This oversight did result in a more thorough investigation that improved the efficiency of the system, but the issue could have been cleared up earlier had the oversight not occurred.
Good rapport and expertise awareness enabled a focused group to be called to assist.