...
Item | Presenter | Notes
---|---|---
XRootD Releases | | 5.6.2-2 is out
Prefetch studies | Alex | (temporarily to be rolled back, with the ongoing work on batch farm WNs)
Deletion studies through RDR | Ian |
ATLAS concern over deletion rate for DC24 | JW | DC24 ATLAS expected _average_ deletion rate: can we cope with this rate (assuming additional gateways) without fundamental changes? The nominal rate assumed for ATLAS is therefore ~20 Hz. Production deletion times (recent logs), including only the time within Ceph and not the XRootD and client RTT: count 167980, mean 2.951339, std 5.467953, min 0.015, 25% 0.282, 50% 0.570, 75% 3.486, max 271.88. (See the concurrency sketch after this table.)
CMSD rollout | |
Future Architecture of Data access on WNs | | VOs asked to provide input on their requirements / use cases
Gateways: observations | | Worker-node write traffic temporarily redirected to gateways on the new network. Results look promising: initial testing of 3 generations against one gateway resulted in 40k uploads over 1.5 days, with only 1 failure (an expired proxy certificate). This change will be reverted once external IPv6 is available on the new network, but future separation of job and FTS traffic seems sensible.
CMSD outstanding items | | Icinga / nags callout test changes - live and available. Improved load balancing / server failover triggering - better 'rolling server restart script' (a possible shape is sketched after this table). Documentation: setup / configuration / operations / troubleshooting / testing. Review of Sandbox and deployment to prod.
Tokens testing | | NTR
AAA Gateways | | Sandbox ready for review: http://aquilon.gridpp.rl.ac.uk/sandboxes/diff.php?sandbox=jw-xrootd-aaa-5.5.4-3
SKA Gateway box | | /wiki/spaces/UK/pages/215941180 now working, using the ska pool on ceph dev. Initial iperf3 tests: see table and plots below.
extra gateways deployment | | … awaiting networking updates; 4 being repurposed for internal (mostly) writes. … correlation between 'spikes' on the new internal gateways and additional jobs run by particular VOs.
ALICE WN gateways | | (Birmingham using EOS, Oxford no storage.) Relationship to OSD issues?
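On the DC24 deletion-rate question above: a quick Little's-law estimate from the quoted timing distribution gives the number of deletions that would need to be in flight at once to sustain ~20 Hz. A minimal sketch, using the figures from the table row and assuming the timings are in seconds:

```python
# Rough capacity check for the DC24 deletion rate, using Little's law:
# in-flight deletions ≈ deletion_rate * time_per_deletion.
# The numbers are the production statistics quoted in the table row above;
# the timing units are assumed to be seconds.

deletion_rate_hz = 20.0        # nominal ATLAS deletion rate for DC24
mean_delete_s = 2.951339       # mean time spent inside Ceph per deletion
p75_delete_s = 3.486           # 75th percentile, a more pessimistic figure

def required_concurrency(rate_hz: float, service_time_s: float) -> float:
    """Deletions that must be in flight to sustain `rate_hz` if each takes `service_time_s`."""
    return rate_hz * service_time_s

print(f"mean case: ~{required_concurrency(deletion_rate_hz, mean_delete_s):.0f} in flight")
print(f"75%  case: ~{required_concurrency(deletion_rate_hz, p75_delete_s):.0f} in flight")
# mean case: ~59 in flight
# 75%  case: ~70 in flight
```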
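On the 'rolling server restart script' item: one possible shape is sketched below. The hostnames, service name, port, and liveness probe are illustrative placeholders, not the production tooling; the point is only that each gateway is restarted and confirmed back in service before the next one is touched.

```python
# Illustrative rolling-restart loop: restart one gateway at a time and only
# move on once it is reachable again. Hostnames, the service name, and the
# probe port are placeholders, not the production configuration.
import socket
import subprocess
import time

GATEWAYS = ["gw01.example.gridpp.ac.uk", "gw02.example.gridpp.ac.uk"]  # hypothetical hosts
SERVICE = "cmsd"                                                        # hypothetical unit name

def healthy(host: str, port: int = 1094, timeout: float = 5.0) -> bool:
    """Crude liveness probe: can we open the xrootd port on this host?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for host in GATEWAYS:
    # Restart one server at a time so the redirector always has peers to fail over to.
    subprocess.run(["ssh", host, "systemctl", "restart", SERVICE], check=True)
    for _ in range(30):          # allow up to ~5 minutes for the server to return
        if healthy(host):
            break
        time.sleep(10)
    else:
        raise RuntimeError(f"{host} did not come back after restart; stopping the roll")
```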
Test | Src | Dest | Thr [Gb/s] (single stream, naive iperf3)
---|---|---|---
Gateway ↔ SN | Xrootd01 [10.16.190.4] | *Ceph-sn1053 [130.246.177.167] | 11.8
Gateway ↔ SN | *Ceph-sn1053 [130.246.177.167] | Xrootd01 [10.16.190.4] | 19.6
Gateway ↔ SN | *Xrootd01 [10.16.190.4] | Ceph-sn1053 [130.246.177.167] | 11.3
Gateway ↔ SN | Ceph-sn1053 [130.246.177.167] | *Xrootd01 [10.16.190.4] | 23.0
Gateway ↔ SN | *Xrootd01 [10.16.190.4] | Ceph-sn1128 [130.246.178.98] | 14.1
Gateway ↔ SN | Ceph-sn1128 [130.246.178.98] | *Xrootd01 [10.16.190.4] | 23.1
SN ↔ SN | | | ~12 – 14
Jasmin gpuhosts (↔) | | | 20 – 25 (perhaps untuned 100 Gb/s links)
gpuhost ↔ xrootd01 | | | 50 (gpu → xrd), 25 (xrd → gpu)
SN → SN window scaling:
...
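For repeating or extending these measurements, iperf3 can report in JSON (the -J flag), which makes tabulating single-stream throughput straightforward. A small sketch, assuming an iperf3 server is already listening on each target host; the target list here is illustrative:

```python
# Collect single-stream iperf3 throughputs against a list of targets and
# print them in Gb/s, roughly matching the table above. The target list is
# illustrative; `iperf3 -s` must already be running on each target.
import json
import subprocess

TARGETS = ["130.246.177.167", "130.246.178.98"]  # e.g. ceph-sn1053, ceph-sn1128

for target in TARGETS:
    result = subprocess.run(
        ["iperf3", "-c", target, "-J", "-t", "10"],   # -J: JSON output, -t: 10 s test
        capture_output=True, text=True, check=True,
    )
    report = json.loads(result.stdout)
    sent_gbps = report["end"]["sum_sent"]["bits_per_second"] / 1e9
    recv_gbps = report["end"]["sum_received"]["bits_per_second"] / 1e9
    print(f"{target}: sent {sent_gbps:.1f} Gb/s, received {recv_gbps:.1f} Gb/s")
```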
Best practice document for Ceph configuration? (e.g. autoscaling features?)
on GGUS:
Site reports
Lancaster - Nothing exciting going on.
Glasgow
✅ Action items
⤴ Decisions
...