An alert is received from Opsgenie.

Check the status and output of the Icinga check: https://icinga.scd.stfc.ac.uk/icingaweb2/search?q=webdav_service#!/icingaweb2/icingadb/service?name=ha-check_ceph_xrootd_webdav_service&host.name=echo-manager01.gridpp.rl.ac.uk

Check how load is distributed across the gateways: https://vande.gridpp.rl.ac.uk/next/d/0AnwKrEVk/xrootd-manager-monitoring?orgId=1&refresh=1m&from=now-6h&to=now&var-hosts=echo-manager01.gridpp.rl.ac.uk&var-hosts=echo-manager02.gridpp.rl.ac.uk&var-Bin=1m&var-rp=1_day&var-prefix=mean_&var-time=1_week

If a single gateway shows high load, check its general health by searching for its hostname in Icinga: https://icinga.scd.stfc.ac.uk/icingaweb2/dashboard
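
To make "a single gateway shows high load" a little more concrete, here is a minimal sketch of one possible rule of thumb. It assumes you have read a current load figure per gateway off the monitoring dashboard by hand; the function name overloaded_gateways and the 1.5x threshold are hypothetical choices for illustration, not an agreed alerting rule.

```python
from statistics import mean

# Hypothetical threshold: flag a gateway whose load exceeds 1.5x the mean of the others.
IMBALANCE_FACTOR = 1.5


def overloaded_gateways(loads: dict[str, float], factor: float = IMBALANCE_FACTOR) -> list[str]:
    """Return the gateways whose load is well above the average of the remaining gateways.

    `loads` maps hostname -> non-negative load value, e.g. read off the dashboard.
    """
    flagged = []
    for host, load in loads.items():
        others = [value for name, value in loads.items() if name != host]
        if not others:
            # With only one gateway there is nothing to compare against.
            continue
        baseline = mean(others)
        if load > factor * baseline:
            flagged.append(host)
    return sorted(flagged)


if __name__ == "__main__":
    # Example figures only, not real measurements.
    sample = {
        "echo-manager01.gridpp.rl.ac.uk": 40.0,
        "echo-manager02.gridpp.rl.ac.uk": 10.0,
    }
    print(overloaded_gateways(sample))
```

Comparing each host against the mean of the *other* hosts keeps one busy gateway from inflating its own baseline, which matters when there are only two gateways.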

Check for crash dumps on the hosts. If a host's disk is nearly full, clear out the old dumps.
Location: /var/spool/xrootd/unified
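
The clean-up step can be sketched as a short script. This is a minimal sketch that assumes each crash dump is a regular file directly inside /var/spool/xrootd/unified. The 7-day retention, the 90% "near full" threshold and the helper names (disk_usage_fraction, old_dumps, clear_old_dumps) are hypothetical. It defaults to a dry run, so review what it prints before deleting anything.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional

DUMP_DIR = Path("/var/spool/xrootd/unified")  # location given in this runbook
MAX_AGE_DAYS = 7          # hypothetical retention period; adjust to local policy
USAGE_THRESHOLD = 0.90    # hypothetical "disk near full" fraction


def disk_usage_fraction(path: Path) -> float:
    """Fraction of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def old_dumps(directory: Path, max_age_days: float, now: Optional[float] = None) -> List[Path]:
    """Regular files in `directory` last modified more than `max_age_days` ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(p for p in directory.iterdir() if p.is_file() and p.stat().st_mtime < cutoff)


def clear_old_dumps(directory: Path, max_age_days: float, dry_run: bool = True,
                    now: Optional[float] = None) -> List[Path]:
    """Delete old dumps (or only list them when dry_run is True); returns the affected paths."""
    affected = []
    for path in old_dumps(directory, max_age_days, now=now):
        if not dry_run:
            path.unlink()
        affected.append(path)
    return affected


if __name__ == "__main__":
    if DUMP_DIR.is_dir() and disk_usage_fraction(DUMP_DIR) >= USAGE_THRESHOLD:
        for dump in clear_old_dumps(DUMP_DIR, MAX_AGE_DAYS, dry_run=True):
            print("would remove", dump)
```

Set dry_run=False only after checking the list, and keep the most recent dumps in case they are needed to diagnose the crash.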

See the Ceph documentation here:

/wiki/spaces/CD/pages/266600449