...
An alert is received from Opsgenie.
...
Check the Icinga test status and output: https://icinga.scd.stfc.ac.uk/icingaweb2/search?q=webdav_service#!/icingaweb2/icingadb/service?name=ha-check_ceph_xrootd_webdav_service&host.name=echo-manager01.gridpp.rl.ac.uk
...
Check the load distribution across the gateways: https://vande.gridpp.rl.ac.uk/next/d/0AnwKrEVk/xrootd-manager-monitoring?orgId=1&refresh=1m&from=now-6h&to=now&var-hosts=echo-manager01.gridpp.rl.ac.uk&var-hosts=echo-manager02.gridpp.rl.ac.uk&var-Bin=1m&var-rp=1_day&var-prefix=mean_&var-time=1_week
...
If a single gateway shows high load, check its general health by searching for its hostname in Icinga: https://icinga.scd.stfc.ac.uk/icingaweb2/dashboard
...
Check the crash dumps on the hosts. If a host shows its disk as nearly full, clear old dumps.
location: /var/spool/xrootd/unified
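
The crash-dump check above could be sketched in Python roughly as follows. This is only an illustration: the 7-day retention window, the 90% "near full" threshold, and the helper names are assumptions, not values taken from this procedure. The only detail from the runbook is the dump location. The script defaults to a dry run that lists what it would remove.

```python
import shutil
import time
from pathlib import Path

DUMP_DIR = Path("/var/spool/xrootd/unified")  # location given in the runbook
MAX_AGE_DAYS = 7       # assumption: retention window, not from the runbook
FULL_THRESHOLD = 0.90  # assumption: what counts as "near full"


def disk_used_fraction(path):
    """Return the used fraction (0.0-1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def find_old_dumps(directory, max_age_days, now=None):
    """Return regular files under directory older than max_age_days, oldest first."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    old = [p for p in Path(directory).rglob("*")
           if p.is_file() and p.stat().st_mtime < cutoff]
    return sorted(old, key=lambda p: p.stat().st_mtime)


def main(delete=False):
    if not DUMP_DIR.is_dir():
        print(f"{DUMP_DIR} not found on this host")
        return
    used = disk_used_fraction(DUMP_DIR)
    print(f"{DUMP_DIR}: filesystem {used:.0%} used")
    if used < FULL_THRESHOLD:
        return  # disk is not near full, nothing to clear
    for dump in find_old_dumps(DUMP_DIR, MAX_AGE_DAYS):
        print(("removing " if delete else "would remove ") + str(dump))
        if delete:
            dump.unlink()


if __name__ == "__main__":
    main(delete=False)  # dry run; pass delete=True only after reviewing the list
```

Run it on the affected gateway and review the "would remove" list before switching to `delete=True`. Recent dumps are left in place so they remain available for debugging.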
...
See the Ceph documentation here: