Current Sandbox:

http://aquilon.gridpp.rl.ac.uk/sandboxes/diff.php?sandbox=jw-gateway-xrootd-cmsd

Fabric requirements

named:
echo-internal-manager01.gridpp.rl.ac.uk
echo-internal-manager02.gridpp.rl.ac.uk

with associated x509 certificates with the following SANs:
*.echo.stfc.ac.uk
xrootd.echo.stfc.ac.uk
webdav.echo.stfc.ac.uk
internal.echo.stfc.ac.uk

with external firewall holes for port 1094 (xrootd traffic)

they should be able to contact the echo gateways on ports 1094, 1095 and 1213

with the following specs
4 CPUs
8GB RAM
60GB disk

with IP addresses in the OPN subnet
Ideally they should be in the lower part of 130.246.176.0/24 (https://netbox.esc.rl.ac.uk/ipam/prefixes/323/ip-addresses/, per James A), for both v4 and v6

with AAAA DNS records added once set.

<next bit of request is sent to tier1-certificates>

With host certificates with additional SANs matching *.echo.stfc.ac.uk

Operational items

Known issues / limitations

N/A

Manager hosts

Frontend:

https://rdr.echo.stfc.ac.uk:1094
root://rdr.echo.stfc.ac.uk:1094

echo-manager01.gridpp.rl.ac.uk
echo-manager02.gridpp.rl.ac.uk

Restarting services

systemctl restart xrootd@{unified,tpc}
systemctl restart cmsd@unified

Blacklisting of server (gateway) hosts

On each of the manager hosts the following file should be used, and the relevant gateway host included:

/etc/xrootd/cms.blacklist

add the given host on a single line (wildcards are in principle also accepted).
This file is re-read roughly once per minute, and requires no restart of services.

Note that if a host named in the blacklist does not exist, the file will fail to parse and will be ignored after a service restart.

Ensure the file is owned by xrootd:xrootd.
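For reference, the file is simply one host pattern per line; an illustrative sketch (these gateway names are examples, not a real blacklist):

```
ceph-gw14.gridpp.rl.ac.uk
ceph-gw2*.gridpp.rl.ac.uk
```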

Adding a new Server (Gateway host) to the cluster

When a new Gateway needs to be added to a cluster, the following steps (in addition to the usual set of checks for ensuring a fully functional gateway) are required.

Development items

Services

A new service has been created to hold the list of manager hosts for each ceph instance (e.g. echo)

xrootd-clustered

For Echo, the specific instance of this service is called xrootd-clustered-echo

These are added with

aq add_required_service --service xrootd-clustered --archetype ral-tier1 --personality ceph-unified-gw-echo
aq add_required_service --service xrootd-clustered --archetype ral-tier1 --personality ceph-unified-gw-echo-test
aq add_required_service --service xrootd-clustered --archetype ral-tier1 --personality ceph-xrootd-manager-echo-test

A host may need to be reconfigured in order to get the new service included in it, and some hosts might fail to compile unless this is done; e.g.

aq reconfigure --hostname ceph-gw14.gridpp.rl.ac.uk  --personality ceph-unified-gw-echo  --archetype ral-tier1

Xrootd and CMSD configuration

The configuration for xrootd and cmsd is stored in the xrootd-unified.cfg configuration file (and the additional xrootd-tpc.cfg, for root TPC transfers).
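As a rough sketch of the clustering-related directives involved (not the actual site config; the roles, hostnames and port are illustrative, with 1213 being the usual cmsd port), the managers and gateways are tied together along these lines:

```
# on the manager hosts
all.role manager
# on the gateway (server) hosts
all.role server
# both sides point at the managers' cmsd port
all.manager echo-manager01.gridpp.rl.ac.uk:1213
all.manager echo-manager02.gridpp.rl.ac.uk:1213
```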

keepalived

The keepalived configuration for the manager CMSD hosts is here:

 features/keepalived/echo-managers
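The feature carries a fairly standard VRRP setup; a minimal sketch of the sort of thing it generates (the interface, vrid, priority and floating IP here are placeholders, not the real values):

```
vrrp_instance echo_managers {
    state BACKUP
    interface eth0
    virtual_router_id 60      # must not clash with other keepalived configs
    priority 100
    advert_int 1
    virtual_ipaddress {
        130.246.176.2/24      # floating IP shared by the managers
    }
}
```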

A summary of the main files:

Manager cluster setup

aq add_personality --personality ceph-xrootd-manager-echo-test --eon_id 14 --copy_from ceph-unified-gw-echo-test --archetype ral-tier1
aq add_cluster --cluster xrootd_manager_echo --archetype ral-tier1-clusters --personality keepalived --down_hosts_threshold 1 --campus harwell --sandbox orl67423/jw-gateway-xrootd-cmsd

aq cluster --cluster xrootd_manager_echo --hostname echo-manager01.gridpp.rl.ac.uk --personality ceph-xrootd-manager-echo-test
aq cluster --cluster xrootd_manager_echo --hostname echo-manager02.gridpp.rl.ac.uk --personality ceph-xrootd-manager-echo-test

aq compile --cluster xrootd_manager_echo
aq make --hostname echo-manager02.gridpp.rl.ac.uk && aq make --hostname echo-manager01.gridpp.rl.ac.uk

New cluster

Fabric


Could you please create 2 new Rocky 8 VMware hosts which should fulfil similar roles to echo-manager01.gridpp.rl.ac.uk,

named:
echo-alice-manager01.gridpp.rl.ac.uk
echo-alice-manager02.gridpp.rl.ac.uk

with associated x509 certificates with the following SANs:
echo.stfc.ac.uk
alice.echo.stfc.ac.uk
*.echo.stfc.ac.uk 
*.s3.echo.stfc.ac.uk

with external firewall holes for port 1094 (xrootd traffic)

they should be able to contact the echo gateways on ports 1094, 1095 and 1213

with the following specs
4 CPUs
8GB RAM
60GB disk

with IP addresses in the OPN subnet
Ideally they should be in the lower part of 130.246.176.0/24 (https://netbox.esc.rl.ac.uk/ipam/prefixes/323/ip-addresses/, per James A), for both v4 and v6

with AAAA DNS records added once set,

along with a pair of floating IPs (like 130.246.176.2 and 130.246.176.3 and the associated v6 2001:630:58:1820::82f6:b002 and 2001:630:58:1820::82f6:b003) to be assigned to keepalived for load balancing

Aquilon

aq add_service --service xrootd-clustered --instance xrootd-clustered-echo-internal
aq bind_server --service xrootd-clustered --instance xrootd-clustered-echo-internal --hostname echo-internal-manager01.gridpp.rl.ac.uk

aq map_service --service xrootd-clustered --instance xrootd-clustered-echo-internal --archetype ral-tier1 --personality ceph-gw-echo-internal --campus Harwell --justification tcm=000

copy /shared/service/xrootd-clustered/xrootd-clustered-echo into /shared/service/xrootd-clustered/xrootd-clustered-echo-internal and replace naming in configs appropriately

copy ral-tier1/features/keepalived/echo-managers to ral-tier1/features/keepalived/echo-managers-internal
in ral-tier1/features/keepalived/echo-managers-internal/config.pan, replace the IP addresses with the new floating IPs and change vrid[N] to a different number (one not used in any other keepalived config)
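Both copy-and-rename steps above follow the same pattern; a minimal sketch, run here against a throwaway /tmp tree rather than the real /shared/service and features/keepalived paths:

```shell
# Demonstrate the copy-then-rename pattern on a mock config directory.
src="$(mktemp -d)/xrootd-clustered-echo"
dst="${src}-internal"
mkdir -p "$src"
printf 'instance = "xrootd-clustered-echo";\n' > "$src/config.pan"

# copy the directory, then rewrite the instance name in every file
cp -r "$src" "$dst"
grep -rl 'xrootd-clustered-echo' "$dst" \
  | xargs sed -i 's/xrootd-clustered-echo/xrootd-clustered-echo-internal/g'

cat "$dst/config.pan"
# prints: instance = "xrootd-clustered-echo-internal";
```

The same grep-then-sed step works for the keepalived feature copy, with a further manual edit for the floating IPs and vrid.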