2023-09-14 Meeting Notes

 Date

Sep 14, 2023

 Participants

  •  

  •  

Apologies:

 

 Goals

  • List of Epics

  • New tickets

  • Consider new functionality / items

  • Detailed discussion of important topics

  • Site report activity

 

 Discussion topics

Current status of Echo Gateways / WNs testing

Recent sandboxes for review / deployments:

 

Item

Presenter

Notes

 

 

XRootD Releases

 

Still awaiting 5.6.2

 

Prefetch studies

Alex

Prefetch works, but needs an increased timeout set via an environment variable
(to be rolled back temporarily while the work on the batch farm WNs is ongoing)

 

Deletion studies through RDR

Ian

Most recent set of measurements:

RDR: Upload mean: 559, Max: 962. Delete mean: 26.7, Max: 118

DNS: Upload mean: 266, Max: 508. Delete mean: 4.9, Max: 23.1
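
To make the RDR / DNS comparison easier to read, the figures above can be turned into ratios. This is a minimal sketch: the numbers are copied from the notes, the units are not stated there, so only dimensionless ratios are computed, and the dictionary keys and the function name are illustrative assumptions.

```python
# Figures copied from the notes; units are not recorded there,
# so only RDR/DNS ratios are computed.
measurements = {
    "RDR": {"upload_mean": 559, "upload_max": 962, "delete_mean": 26.7, "delete_max": 118},
    "DNS": {"upload_mean": 266, "upload_max": 508, "delete_mean": 4.9, "delete_max": 23.1},
}


def rdr_to_dns_ratios(data):
    """Return the RDR/DNS ratio for each metric (hypothetical helper)."""
    return {metric: data["RDR"][metric] / data["DNS"][metric] for metric in data["RDR"]}


ratios = rdr_to_dns_ratios(measurements)
for metric, ratio in ratios.items():
    print(f"{metric}: RDR / DNS = {ratio:.2f}")
```

On these figures the RDR upload mean is roughly twice the DNS value, and the RDR delete mean roughly five times.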

 

CMSD rollout

 

https://stfc.atlassian.net/browse/XRD-41

Status:

webdav alias has been pruned

 

 

Current configuration of Echo Gateways

 

 

 

Future Architecture of Data access on WNs

 

Requested to prepare for a review / planning of the desired architecture for how WNs will read / write data to / from Echo.
All options to be discussed, e.g. use of XCache, WN containers, etc.
Need an initial meeting (and in what context)?

 

Gateways: observations

 

Worker node write traffic has been temporarily redirected to gateways on the new network. Results look promising: initial testing of 3 generations to one gateway resulted in 40k uploads over 1.5 days, with only 1 failure, due to an expired certificate proxy. This change will be reverted once external IPv6 is available on the new network, but future separation of job and FTS traffic seems sensible.
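
For scale, the figures quoted above translate into a failure rate and an average upload rate as follows. This is a rough sketch that treats "40k" as exactly 40,000 uploads and "1.5 days" as exactly 36 hours; both are assumptions, and the variable names are illustrative.

```python
# Assumed exact values for the rounded figures in the notes
uploads = 40_000   # "40k uploads"
failures = 1       # the single expired-proxy failure
days = 1.5         # test duration

failure_rate = failures / uploads
uploads_per_second = uploads / (days * 24 * 3600)

print(f"failure rate: {failure_rate:.4%}")
print(f"average rate: {uploads_per_second:.2f} uploads/s")
```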

 

CMSD outstanding items

 

Icinga / nags callout test changes: live and available

Improved load balancing / server failover triggering -

Better 'rolling server restart' script

Documentation; setup / configuration / operations / troubleshooting / testing

Review of Sandbox and deployment to prod:
- Initial review spotted a requirement to split the feature so that there is a non-CMSD version.

 

Tokens testing

 

NTR

 

AAA Gateways

 

Sandbox ready for review:

http://aquilon.gridpp.rl.ac.uk/sandboxes/diff.php?sandbox=jw-xrootd-aaa-5.5.4-3

 

SKA Gateway box

 

Now working, using the ska pool on ceph dev

Initial iperf3 tests (see table and plots below):

  • Actions

    • Ensure Xrootd01 is tuned correctly, according to the NVIDIA / Mellanox instructions

    • Repeat the iperf tests

  • Xrootd tests against:

    • dev-echo

    • cephfs (Deneb dev)

    • cephfs (openstack; permissions/routing issues)?

    • local disk / mem

  • Frontend routing is also being worked on
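
When the iperf tests listed in the actions above are repeated, a small wrapper can make the single-stream numbers easier to collect consistently. The following is only a sketch: it assumes `iperf3` is installed on the client and that a server (`iperf3 -s`) is already running on the target host; the function names and the sample report are hypothetical, and the sample value is not a real measurement.

```python
import json
import subprocess


def run_iperf3(server, seconds=10):
    """Run a single-stream iperf3 TCP client against `server`; return the parsed JSON report."""
    result = subprocess.run(
        ["iperf3", "-c", server, "-t", str(seconds), "-J"],  # -J requests JSON output
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def received_gbps(report):
    """Receiver-side throughput in Gb/s from an iperf3 TCP JSON report."""
    return report["end"]["sum_received"]["bits_per_second"] / 1e9


# Hand-made report fragment for illustration only (not a measurement)
sample = {"end": {"sum_received": {"bits_per_second": 10e9}}}
print(f"{received_gbps(sample):.1f} Gb/s")
```

Running each direction separately, for example `received_gbps(run_iperf3("<server host>"))` from each end in turn, would make it easier to compare like for like after the tuning changes.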

 

 

extra gateways deployment

 

 

ALICE WN gateways

 

(Birmingham using eos, Oxford no storage)

 


Test                  Src                             Dest                            Thr [Gb/s] (single stream, naive iperf3)
Gateway <-> SN        Xrootd01 [10.16.190.4]          *Ceph-sn1053 [130.246.177.167]  11.8
Gateway <-> SN        *Ceph-sn1053 [130.246.177.167]  Xrootd01 [10.16.190.4]          19.6
Gateway <-> SN        *Xrootd01 [10.16.190.4]         Ceph-sn1053 [130.246.177.167]   11.3
Gateway <-> SN        Ceph-sn1053 [130.246.177.167]   *Xrootd01 [10.16.190.4]         23.0
Gateway <-> SN        *Xrootd01 [10.16.190.4]         Ceph-sn1128 [130.246.178.98]    14.1
Gateway <-> SN        Ceph-sn1128 [130.246.178.98]    *Xrootd01 [10.16.190.4]         23.1
SN ↔ SN               -                               -                               ~ 12 – 14 Gb/s
Jasmin gpuhosts (↔)   -                               -                               20 – 25 Gb/s (perhaps untuned 100 Gb/s links)
gpuhost ↔ xrootd01    -                               -                               50 (gpu → xrd), 25 (xrd → gpu)

SN → SN window scaling:

 

on GGUS:

Site reports

Lancaster - Nothing exciting going on.

Glasgow

 Action items

  •  

 

 Decisions