2024-01-18 Meeting Notes
Date
Jan 18, 2024
Participants
@Thomas, Jyothish (STFC,RAL,SC)
@Alexander Rogovskiy
@Thomas Byrne
Lancs: Steven, Gerard, Matt
Glasgow: Sam
Apologies:
CC:
Goals
List of Epics
New tickets
Consider new functionality / items
Detailed discussion of important topics
Site report activity
Discussion topics
Current status of Echo Gateways / WNs testing
Recent sandboxes for review / deployments:
| Item | Presenter | Notes |
|---|---|---|
| Operational Issues | @Thomas, Jyothish (STFC,RAL,SC) | Packet loss on perfSONAR? |
| Gateways and WNs | @Thomas, Jyothish (STFC,RAL,SC) | Stable status currently. Holding off on installing 5.6.4: before the break, one set of tests (TPC transfers) was failing against another site. Tests to be repeated to see whether the failure persists. |
| Bugfix for calculating striper objects in direct reads | | Passed test on gw8 and code reviewed. |
| ECHO File transfer / throughput studies | @Katy Ellis | Tests of per-file transfer writes into Echo. Tests ongoing on svc20. |
| Checksums fixes | @Alexander Rogovskiy | Status and plans for improving Checksumming work … GitHub - alex-rg/xrd_ckslib (Sandbox prepared and applied to GW8) |
| Prefetch studies and WN changes | @Alexander Rogovskiy | Sandbox ready and applied to 1 WN, pending environment variable for timeout increase. |
| Deletion studies through RDR | @Ian Johnson | Continuing with mixed results. Previous set was 500 files. 5000 files could not be uploaded; the upload had not completed after 20+ hrs (seems to have been a bad time, last Tuesday). 100 x 1 GB deletion in 5 s. Check with Alessandra on Rucio deletion concurrency (for DC24). Ceph is performing better at the moment. |
| Tokens testing | @Thomas, Jyothish (STFC,RAL,SC) @Katy Ellis | https://stfc.atlassian.net/browse/XRD-63 ; "avoid duplicating basepath" by Jo-stfc · Pull Request #2151 · xrootd/xrootd |
| Understanding CMSD Loadbalancing | @Thomas Byrne | Explore a different load-balancing scheme (weighted placement). Testing in internal cluster? How to measure improvements? A more instrumented current version is needed to measure improvement. Things look ~OK at the moment, so lower in priority. |
| SKA Gateway box | @James Walder | |
| Architectural review ‘hackathon’ | All | Plan the process for the Architectural planning of XRootD across the External Gateways and WNs. |
| 2024 Planning | | JW to prepare a summary of the plans for 2024. |
on GGUS:
Site reports
Lancaster - Unbalanced redirectors/load are causing Ceph mounts to be dropped. Sam suggests network QoS to monitor traffic (the Ceph MDS kickout timeout is 5 min; prefer not to increase it). Write locks from unresponsive clients can cause pileups on 'healthy' clients; making the MDS less aggressive would mitigate this but not solve the underlying issue (not recommended). Local read from CephFS mount. Mostly coincident with slow OSD ops (PG slow > MDS slow ops on metadata > OSD slow ops > GW issues). OSD perf output might have more info. Long SMART healthcheck?
Glasgow - relatively stable, few network issues. OS/Ceph version update to do.
Action items
@James Walder to schedule a ‘hackathon’ within a F2F to have a session on architectural planning.
@James Walder to prepare an outline of the expected roadmap for XRootD developments in 2024.