...

As discussed in many meetings this week, a short power outage last Friday night knocked out a small chunk of our cluster. We came out of that looking okay, with just a few degraded PGs, but we keep tripping up as Ceph goes read-only because it falsely thinks an OSD is full (when it has 25% free space) until Gerard kicks it. Gerard tracked it to some existing (since ~Pacific) Ceph bugs.

(We’re not actually 100% sure whether recovering from the power outage is the root cause of this issue or just an event that created a need to shuffle data around the OSDs, but it certainly didn’t help.)

...