...
Alert is received from opsgenie
...
check icinga test status and output https://icinga.scd.stfc.ac.uk/icingaweb2/search?q=webdav_service#!/icingaweb2/icingadb/service?name=ha-check_ceph_xrootd_webdav_service&host.name=echo-manager01.gridpp.rl.ac.uk
...
check load distribution on gateways https://vande.gridpp.rl.ac.uk/next/d/0AnwKrEVk/xrootd-manager-monitoring?orgId=1&refresh=1m&from=now-6h&to=now&var-hosts=echo-manager01.gridpp.rl.ac.uk&var-hosts=echo-manager02.gridpp.rl.ac.uk&var-Bin=1m&var-rp=1_day&var-prefix=mean_&var-time=1_week
...
if a single gateway shows high load, check its general health by searching the hostname in icinga
...
check crash dumps on hosts if they show disk near full, clear old dumps
...
See the Ceph documentation here: