2024-05-09 Meeting Notes
Date
May 9, 2024
Participants
@Thomas, Jyothish (STFC,RAL,SC)
Apologies:
@James Walder
CC:
Goals
List of Epics
New tickets
Consider new functionality / items
Detailed discussion of important topics
Site report activity
Discussion topics
Current status of Echo Gateways / WNs testing
Recent sandboxes for review / deployments:
Item | Presenter | Notes
---|---|---
Operational Issues | @Thomas, Jyothish (STFC,RAL,SC) | The 5.6.9 deployment failed, so 5.5.4 was deployed with case-insensitive headers. Sam: on SL7 it doesn't crash but is less stable; on Rocky 8/9 it crashes (glibc version used by xrdceph?).
CHEP Abstract ideas | @Thomas, Jyothish (STFC,RAL,SC) | Deadline extended to the 17th. Ideas: one CHEP paper on load balancing; one from Matt; Gerard/Lancaster on Ceph monitoring; an SKA high-throughput abstract.
XrootD Workshop plan | @Alastair Dewhurst | Registrations are open (pending prettification).
Rocky 8 and 9 migration planning | | CC passed. 4 gateways are planned to be deployed today; the remaining upgrades will be done after next week. The batch farm will be undergoing simultaneous updates. The preprod farm has been upgraded. The GridFTP gateways will stay until June 3rd.
Shoveller | @Katy Ellis |
Future development ideas / planning work | @Ian Johnson @Thomas, Jyothish (STFC,RAL,SC) |
Deletion studies through RDR | @Ian Johnson | Preliminary deletion-rate plot from DC24 "dip-stick" sampling (ATLAS VO); the plot covers data taken during the DC. Steven is looking at CephFS deletion studies at Lancaster: compare timestamps/file sizes vs. number of slow ops?
Deletions | | What's the theoretical limit?
Planning for ALICE CMSD redirection | @Thomas, Jyothish (STFC,RAL,SC) | The keepalived setup is working; 1 dev gateway is being added to this cluster.
Checksum fixes | @Alexander Rogovskiy @Thomas, Jyothish (STFC,RAL,SC) | The '21 generation is being rolled out. '21s are being swapped with '22 job types, as the '22s have separate OS and disk drives; additional HW for the '21s in a month. On the WNs, checksums are forwarded to the prod cluster; the preprod farm contributed ~5% of the total checksums.
Prefetch studies and WN changes | @Alexander Rogovskiy | Some more data from the overload event, namely efficiency and error rates on the 2021 generation (prefetch on). It would be interesting to compare this to the "prefetch off" configuration, but so far a meaningful comparison is not possible, since the number of WGProduction jobs is nowhere near the numbers during the overload event (23.04.2024). Stress test for WNs? Preprod CEs targeting the internal xrootd cluster?
Tokens Status | @Thomas, Jyothish (STFC,RAL,SC) @Katy Ellis |
CMSD Load balancing | @Thomas Byrne @Thomas, Jyothish (STFC,RAL,SC) | PR:
SKA Gateway box | @James Walder | https://stfc.atlassian.net/wiki/spaces/UK/pages/215941180 (2 new servers racked up; awaiting NetBox configuration)
Xrootd testing framework | @Mariam Demir |
On GGUS:
Site reports
Lancaster: a week of living on Reef hasn't yielded many operational issues. The only one of note was that Reef wasn't happy with the MDS "trimming settings"; the Pacific defaults needed to be cranked up. Otherwise smooth sailing. We are dealing with fallout from a lot of metrics being renamed and from changes to the logging, which have reduced our monitoring capabilities a bit, but that's a niggle. The next task is to update all the clients.
Glasgow:
Redirector black-holing (5.6.9); some crashes on the server.
Do we want to talk about Durham?
Having interesting Ceph problems (files with read locks): transfers hang and lock files after a while. Paul is switching to match the Lancaster and Glasgow CephFS setup to see if it fixes things.
Action items