2024-01-18 Meeting Notes


  • @Thomas, Jyothish (STFC,RAL,SC)

  • @Alexander Rogovskiy

  • @Thomas Byrne

  • Lancs: Steven, Gerard, Matt

  • Glasgow: Sam


Current status of Echo Gateways / WNs testing

Recent sandboxes for review / deployment:


Operational Issues

@Thomas, Jyothish (STFC,RAL,SC)

Packet loss on perfSONAR?




Gateways and WNs: current status and upcoming changes

@Thomas, Jyothish (STFC,RAL,SC)

Currently stable.

  • Tokens have been deployed for CMS/ATLAS (plus an additional patch restricting scope matching, so that a scope of foo rejects foobar).

  • Checksum library.

  • Prefetch off on WNs.
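A minimal sketch of the kind of path check the scope patch is meant to enforce, assuming the fix makes a token scope match only on whole path components (a scope of /foo grants /foo and /foo/... but not /foobar). The helper name scope_allows is hypothetical; this is not the XRootD implementation.

```python
import posixpath


def scope_allows(scope_path: str, requested_path: str) -> bool:
    """Return True if requested_path lies at or under scope_path.

    Hypothetical helper: compares whole path components, so a scope of
    '/foo' does not authorise '/foobar'.
    """
    # Normalise both paths (collapse repeated '/', resolve '.' and '..').
    scope = posixpath.normpath(scope_path)
    path = posixpath.normpath(requested_path)
    if scope == "/":
        # A root scope covers every absolute path.
        return path.startswith("/")
    # Exact match, or the scope followed by a path separator.
    return path == scope or path.startswith(scope + "/")
```

A naive requested_path.startswith("/foo") would also accept /foobar, which is the failure mode the patch targets; normalising first also stops /foo/../bar from slipping through.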

Revisit installing 5.6.4: before the break, one set of tests (TPC transfers) was failing against another site. To repeat the tests and see

Rocky 8 for the Gateways (@Thomas, Jyothish (STFC,RAL,SC) working on an initial setup).


Bugfix for calculating striper objects in direct reads:



Passed testing on gw8 and code reviewed.
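For context, a minimal sketch of how the set of striper objects touched by a direct read can be computed. It assumes a simple libradosstriper-style layout with stripe_count = 1 (so object k holds bytes [k * object_size, (k + 1) * object_size)) and object names of the form <name>.<16 hex digits>; both the layout and the naming are assumptions, and this is not the code of the actual bugfix.

```python
from __future__ import annotations


def striper_objects(name: str, offset: int, length: int, object_size: int) -> list[str]:
    """List the object names a read of [offset, offset + length) touches.

    Assumes stripe_count == 1 and stripe_unit == object_size (hypothetical
    layout for illustration).
    """
    if length <= 0:
        return []
    first = offset // object_size
    # Use the last byte actually read (offset + length - 1); using
    # offset + length would pull in an extra object when the read ends
    # exactly on an object boundary.
    last = (offset + length - 1) // object_size
    return [f"{name}.{k:016x}" for k in range(first, last + 1)]
```

Off-by-one errors at object boundaries are the typical bug class in this calculation, which is why the last index is derived from the final byte read.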


ECHO File transfer / throughput studies

@Katy Ellis

Tests of per-file transfer writes into Echo.
A new Jira is set up to track these changes: https://stfc.atlassian.net/browse/XRD-80
Updates presented at Liaison meeting yesterday.
Preliminary results from iperf3 testing:


Tests ongoing on svc20.


Checksums fixes

@Alexander Rogovskiy

Status and plans for improving checksumming work …


(Sandbox prepared and applied to GW8)


Prefetch studies and WN changes

@Alexander Rogovskiy

Sandbox ready and applied to one WN; pending an environment variable for the timeout increase.


Deletion studies through RDR

@Ian Johnson

Continuing, with mixed results.

Previous set was 500 files.

5000 files could not be uploaded; the upload had not completed after 20+ hours (this seems to have been a bad time: last Tuesday).

100 × 1 GB files deleted in 5 s.

Check with Alessandra on Rucio deletion concurrency (for DC24).

Ceph is performing better at the moment.


Tokens testing

@Thomas, Jyothish (STFC,RAL,SC) @Katy Ellis




Understanding CMSD Loadbalancing

@Thomas Byrne

Explore a different load-balancing scheme (weighted placement).

Testing in an internal cluster? How to measure improvements? Instrument the current version more, so there is a baseline to measure improvement against.

Things look ~OK at the moment, so this is lower priority.
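A minimal sketch of what "weighted placement" could mean, assuming it refers to choosing a server with probability proportional to a per-server weight (for example derived from free capacity or inverse load). The server names, weights and pick_server helper are hypothetical; this is not how cmsd currently selects servers.

```python
import random
from typing import Mapping


def pick_server(weights: Mapping[str, float], rng: random.Random) -> str:
    """Choose a server with probability proportional to its weight.

    Servers with a weight of zero or less are never chosen.
    """
    servers = [s for s, w in weights.items() if w > 0]
    if not servers:
        raise ValueError("no server has a positive weight")
    return rng.choices(servers, weights=[weights[s] for s in servers], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical weights: gw1 should receive roughly 3x the traffic of gw2.
    w = {"gw1": 3.0, "gw2": 1.0, "gw3": 0.0}
    picks = [pick_server(w, rng) for _ in range(10_000)]
    for s in w:
        print(s, picks.count(s))
```

Counting selections per server over many draws, as in the usage above, is one simple way to compare a candidate scheme against the instrumented current version.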


SKA Gateway box

@James Walder


Architectural review ‘hackathon’


Plan the process for the Architectural planning of XRootD across the External Gateways and WNs


2024 Planning


JW to prepare a summary of the plans for 2024



on GGUS:

Site reports

Lancaster - Unbalanced redirectors/load causing Ceph mounts to be dropped. Sam suggests network QoS to monitor traffic (the Ceph MDS kick-out timeout is 5 min, and they prefer not to increase it). Write locks from unresponsive clients can cause pile-ups on 'healthy' clients; making the MDS less aggressive would mitigate this but not solve the underlying issue (not recommended). Local reads from the CephFS mount. Mostly coincident with slow OSD ops (slow PGs > MDS slow ops on metadata > OSD slow ops > gateway issues). The osd perf output might have more info. Long SMART health check?

Glasgow - relatively stable, few network issues. OS/Ceph version update to do.




 Action items

  • @James Walder to schedule a ‘hackathon’ within a F2F to have a session on architectural planning.

  • @James Walder to prepare an outline of the expected roadmap for XRootD developments in 2024.


