...

new network traffic

storage monitoring

Kibana dashboard for WN/tranche IOPS monitoring

Echo storage node IOPS (per generation)

XrootD production changes

External gateways

9/05/23: pgwrite bugfix rollout on external gateways (deemed irrelevant to the incident)

Batch farm:

  • 9th May (9:30 am): Draining of first half of worker-nodes

  • 11th May (9:30 am): Update drained worker-nodes

  • Bring updated tranches back online

  • 12th May (16:00): Drain remaining half of worker-nodes

  • 15th May (14:00): Update drained worker-nodes

  • Bring updated tranches back online

  • Health check entire farm

Plots and associated info

...

This was found to be due to a missed line change in the Dockerfile.

The hard limit for read IOPS before the crash, as seen in Ceph monitoring, seems to be 150k, with a desirable rate of <100k. The current rate (without readV) is 30k.
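To make the thresholds above easier to reason about, here is a minimal sketch of how a read IOPS rate could be classified against them. Only the three figures (150k, 100k, 30k) come from the notes above. The function names, the "ok"/"warning"/"critical" labels, and the choice to treat a rate of exactly 100k as a warning are illustrative assumptions and are not part of any existing monitoring setup.

```python
# Thresholds taken from the Ceph monitoring observations above (read IOPS).
HARD_LIMIT_IOPS = 150_000  # approximate rate seen before the crash
DESIRABLE_IOPS = 100_000   # preferred upper bound


def classify_read_iops(rate: float) -> str:
    """Return 'ok', 'warning' or 'critical' for a read IOPS rate.

    The labels and boundary handling are hypothetical, for illustration only.
    """
    if rate >= HARD_LIMIT_IOPS:
        return "critical"
    if rate >= DESIRABLE_IOPS:
        return "warning"
    return "ok"


def headroom_factor(rate: float, limit: float = DESIRABLE_IOPS) -> float:
    """Return how many times the given rate fits under the limit."""
    return limit / rate


current_rate = 30_000  # current rate without readV, from the notes above
print(classify_read_iops(current_rate))
print(headroom_factor(current_rate, HARD_LIMIT_IOPS))
```

At the current rate this gives roughly a factor of 3.3 of headroom against the desirable rate and a factor of 5 against the observed hard limit.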