...
As discussed in many meetings this week, a short power outage last Friday night knocked out a small chunk of our cluster. We came out of that looking okay, with just a few degraded PGs, but we keep tripping up as Ceph goes read-only because it falsely thinks an OSD is full (when it’s got 25% free space) until Gerard kicks it. Gerard tracked it to some existing (since ~Pacific) Ceph bugs.
(We’re not actually 100% sure whether recovering from the power outage is the root cause of this issue or just an event that created a need to shuffle data around the OSDs, but it certainly didn’t help.)
Glasgow
✅ Action items
How to replace the original functionality of fstream monitoring, now that OpenSearch has replaced the existing solutions.
...