All external gateways must be rebooted every 90 days to install firmware updates and as general maintenance. The procedure below applies to all gateways except those behind the s3.echo.stfc.ac.uk and echo.stfc.ac.uk aliases, because those hosts are in a DNS round robin for RADOS Gateway (radosgw) access. Check the spreadsheet for an up-to-date list of those hosts.
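Because the 90-day interval runs from the last reboot, a host's uptime is a quick way to see whether it is due. The sketch below assumes uptime is an acceptable proxy for "days since the last reboot" (any unplanned reboot resets it). It reads the standard Linux /proc/uptime; the function names are hypothetical.

```shell
# Sketch: is a gateway due for its 90-day reboot? Uptime is used as a proxy for
# "days since last reboot" (an assumption; any earlier reboot resets the clock).
REBOOT_INTERVAL_DAYS=90

# Print whole days of uptime. Takes the contents of /proc/uptime (e.g. fetched
# over ssh) as $1, or reads the local /proc/uptime when no argument is given.
uptime_days() {
  local secs="${1:-$(cat /proc/uptime)}"
  secs=${secs%% *}                 # first field: seconds since boot
  echo $(( ${secs%.*} / 86400 ))   # drop the fractional part, convert to days
}

# Exit status 0 if the uptime has reached the reboot interval.
reboot_due() {
  [ "$(uptime_days "$@")" -ge "$REBOOT_INTERVAL_DAYS" ]
}
```

For a remote gateway, pass its /proc/uptime in, for example `reboot_due "$(ssh root@ceph-svc01.gridpp.rl.ac.uk cat /proc/uptime)" && echo "ceph-svc01 is due"`.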

Scripts

blacklist.sh

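# Adds <hostname_prefix>.gridpp.rl.ac.uk to /etc/xrootd/cms.blacklist on both manager nodes
: "${1:?usage: bash blacklist.sh <hostname_prefix>}"
ssh -i ~/.ssh/id_rsa root@echo-manager02.gridpp.rl.ac.uk "echo \"${1}.gridpp.rl.ac.uk\" >> /etc/xrootd/cms.blacklist"
ssh -i ~/.ssh/id_rsa root@echo-manager01.gridpp.rl.ac.uk "echo \"${1}.gridpp.rl.ac.uk\" >> /etc/xrootd/cms.blacklist"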

unblacklist.sh

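# Removes <hostname_prefix>.gridpp.rl.ac.uk on both manager nodes (anchored, so ceph-svc01 does not also remove ceph-svc010)
: "${1:?usage: bash unblacklist.sh <hostname_prefix>}"
ssh -i ~/.ssh/id_rsa root@echo-manager02.gridpp.rl.ac.uk "sed -i '/^${1}\./d' /etc/xrootd/cms.blacklist"
ssh -i ~/.ssh/id_rsa root@echo-manager01.gridpp.rl.ac.uk "sed -i '/^${1}\./d' /etc/xrootd/cms.blacklist"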
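A substring pattern such as `sed -i "/${1}/d"` deletes every line that merely contains the prefix, so removing ceph-svc01 would also remove ceph-svc010. The sketch below edits a blacklist file with exact whole-line matching instead, and does not add duplicate entries. It assumes entries look like `<prefix>.gridpp.rl.ac.uk`, one per line, as blacklist.sh writes them; the function names are hypothetical, and the file path is an argument so the functions can be tried on a local copy first.

```shell
# Sketch: exact-match editing of an xrootd cms.blacklist file (one FQDN per line).
# Assumptions: entries look like <prefix>.gridpp.rl.ac.uk, as written by
# blacklist.sh; the function names are hypothetical.
DOMAIN="gridpp.rl.ac.uk"

# Append <prefix>.$DOMAIN unless that exact line is already present.
blacklist_add() {
  local file="$1" fqdn="$2.$DOMAIN"
  grep -qxF "$fqdn" "$file" 2>/dev/null || echo "$fqdn" >> "$file"
}

# Delete only the exact line <prefix>.$DOMAIN (ceph-svc01 leaves ceph-svc010 alone).
blacklist_remove() {
  local file="$1" fqdn="$2.$DOMAIN" tmp rc
  tmp=$(mktemp) || return 1
  # grep -v exits 1 when no lines remain; only exit status 2 is a real error.
  grep -vxF "$fqdn" "$file" > "$tmp" && rc=0 || rc=$?
  # Overwrite in place (not mv) so the file keeps its owner and permissions.
  if [ "$rc" -le 1 ]; then cat "$tmp" > "$file"; fi
  rm -f "$tmp"
  [ "$rc" -le 1 ]
}
```

To run it on a manager node, one option (assuming root's remote shell is bash) is to send the function over ssh: `ssh root@echo-manager01.gridpp.rl.ac.uk "DOMAIN=gridpp.rl.ac.uk; $(declare -f blacklist_remove); blacklist_remove /etc/xrootd/cms.blacklist ceph-svc01"`.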

Procedure

Check the current transfer load on the gateways through the Grafana dashboard.
If the average throughput is more than 22 Gb/s (over 90% of the maximum network capacity), do not proceed.
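The go/no-go rule can be written as a one-line check. This is a sketch that assumes the average throughput in Gb/s is read off the Grafana dashboard by hand and passed as an argument; it does not query Grafana, and `ok_to_proceed` is a hypothetical name. Exactly 22 Gb/s is treated as acceptable, since the rule only forbids proceeding above that value.

```shell
# Sketch: go/no-go check before rebooting any gateway.
# Assumption: the average throughput in Gb/s is read off the Grafana dashboard
# by hand and passed in; nothing here queries Grafana.
MAX_AVG_GBPS=22   # above this the gateways are at >90% of network capacity

# Exit status 0 if rebooting may proceed (average <= MAX_AVG_GBPS).
# awk is used because the shell's [ ] cannot compare decimals such as 21.5.
ok_to_proceed() {
  awk -v avg="$1" -v max="$MAX_AVG_GBPS" 'BEGIN { exit !(avg + 0 <= max + 0) }'
}
```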
For each host or batch of hosts currently in production:
1. Run the following command, where <hostname_prefix> is the part of the hostname before .gridpp.rl.ac.uk (for example ceph-svc01):
  bash blacklist.sh <hostname_prefix>
2. Wait until the traffic on the host drops (usually about 15 minutes).
3. SSH into the host and run "reboot".
4. Wait for the host to come back up (10-20 minutes).
5. Check that the systemd services xrootd@unified, xrootd@tpc and cmsd@unified are active (running).
6. Run:
  bash unblacklist.sh <hostname_prefix>
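Steps 1-6 can be wrapped into one function per gateway. This is a sketch, not a replacement for watching the dashboard, and it does not cover the hosts behind the round-robin aliases. Assumptions: blacklist.sh and unblacklist.sh are in the current directory, the same key gives root ssh access to the gateway, step 2 is a fixed wait rather than an observed traffic drop, and the reboot is detected by a change of the kernel boot ID in /proc/sys/kernel/random/boot_id. All function and variable names are hypothetical.

```shell
# Sketch only: steps 1-6 for ONE gateway that is not behind the round-robin aliases.
DOMAIN="gridpp.rl.ac.uk"
SSH_KEY=~/.ssh/id_rsa
DRAIN_WAIT=${DRAIN_WAIT:-900}        # step 2: traffic usually drops within ~15 min
BOOT_TIMEOUT=${BOOT_TIMEOUT:-1800}   # step 4: hosts usually return in 10-20 min
SERVICES="xrootd@unified xrootd@tpc cmsd@unified"
DRY_RUN=${DRY_RUN:-0}

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Run a command on <fqdn> as root; fails fast if the host is unreachable.
on_gw() {
  local host="$1"
  shift
  ssh -i "$SSH_KEY" -o ConnectTimeout=10 -o BatchMode=yes "root@$host" "$@"
}

# Poll every 30 s until the boot_id differs from <old_id>; give up after BOOT_TIMEOUT.
wait_for_reboot() {
  local host="$1" old_id="$2" waited=0 id
  while [ "$waited" -lt "$BOOT_TIMEOUT" ]; do
    sleep 30
    waited=$((waited + 30))
    id=$(on_gw "$host" cat /proc/sys/kernel/random/boot_id 2>/dev/null) || continue
    if [ -n "$id" ] && [ "$id" != "$old_id" ]; then return 0; fi
  done
  echo "$host did not come back within ${BOOT_TIMEOUT}s" >&2
  return 1
}

# Steps 1-6. Runs in a subshell with set -eu, so any failure stops the
# sequence and leaves the host blacklisted.
reboot_gateway() (
  set -eu
  prefix=$1
  host="$prefix.$DOMAIN"
  run bash blacklist.sh "$prefix"                          # step 1
  run sleep "$DRAIN_WAIT"                                  # step 2
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ reboot $host and wait for a new boot_id"       # steps 3-4
  else
    old_id=$(on_gw "$host" cat /proc/sys/kernel/random/boot_id)
    on_gw "$host" reboot || true    # ssh often exits non-zero as the host goes down
    wait_for_reboot "$host" "$old_id"
  fi
  for unit in $SERVICES; do                                # step 5
    # one unit at a time: is-active with several units succeeds if any is active
    run on_gw "$host" systemctl is-active --quiet "$unit"
  done
  run bash unblacklist.sh "$prefix"                        # step 6
)
```

`DRY_RUN=1 reboot_gateway ceph-svc01` prints the steps without touching anything. Call it as a plain command rather than inside `if` or after `||`, because bash ignores `set -e` in those contexts. Comparing boot IDs avoids mistaking the host for "back" in the seconds before it actually goes down. For a batch, the function can be started in the background for each prefix and followed by `wait`.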
