Item

Presenter

Notes

Operational Issues
Gateways and WNs:
- Current status and upcoming changes

Thomas, Jyothish (STFC,RAL,SC)

XRootD 5.6.9 deployment failed;

deployed 5.5.4 with case-insensitive headers instead

Sam: SL7 doesn’t crash but is less stable; Rocky 8/9 crashes with XrdCeph

glibc version?

https://ggus.eu/index.php?mode=ticket_info&ticket_id=166729

CHEP Abstract ideas

Thomas, Jyothish (STFC,RAL,SC)

Deadline extended to the 17th.

One CHEP paper on load balancing, one from Matt, and one from Gerard/Lancaster on Ceph monitoring

SKA high-throughput abstract

Abstracts

XrootD Workshop plan

Alastair Dewhurst

Registrations are open (pending prettification)

Rocky 8 and 9 migration planning

CC passed; 4 gateways are planned to be deployed today

remaining upgrades to be done after next week

batch farm will be undergoing simultaneous updates

preprod farm upgraded

GridFTP gateways will stay until June 3rd

Shoveler

Katy Ellis

Shoveler installation and monitoring

Future development ideas and planning work

Ian Johnson; Thomas, Jyothish (STFC,RAL,SC)

https://stfc.atlassian.net/wiki/spaces/X/pages/459997229/Notes+from+planning+meeting+22-04-2024?atlOrigin=eyJpIjoiNDRmNDEwOWI3Y2NhNDg5MDg4ZmZiYTNhNTliOWUwNmUiLCJwIjoiYyJ9

Deletion studies through RDR

Ian Johnson

Preliminary deletion rate plot from DC24 “dip-stick” sampling (ATLAS VO):

deletion-rate.png

plot covers data taken during DC24

Steven is looking for CephFS deletion studies

compare Lancaster deletion timestamps/file sizes against the number of slow ops?
(are deletions causing slow ops?)

Deletions

Jira: XRD-83

what’s the theoretical limit?
are the bottlenecks in XRootD?
are async deletions required?
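The questions about the theoretical limit and async deletions can be framed with Little's law: a client with N deletions outstanding, each taking t seconds end to end, cannot exceed N / t deletions per second. The Python sketch below is a swapped-in illustration, not the XRootD or RDR tooling: it computes that bound and measures the achieved rate with a thread pool. `delete_one` is a hypothetical stand-in for a single-file delete, and a thread pool only approximates true asynchronous deletion.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Tuple


def littles_law_limit(in_flight: int, mean_latency_s: float) -> float:
    """Little's law upper bound on deletions/s: with `in_flight` requests
    outstanding, each taking `mean_latency_s` seconds, throughput cannot
    exceed in_flight / mean_latency_s."""
    return in_flight / mean_latency_s


def delete_all(paths: Iterable[str],
               delete_one: Callable[[str], None],
               in_flight: int = 1) -> Tuple[int, float]:
    """Delete every path, keeping up to `in_flight` requests outstanding.

    `delete_one` is a hypothetical placeholder for whatever removes a
    single file. Returns (files deleted, measured deletions/s).
    in_flight=1 behaves like synchronous deletion; larger values
    approximate asynchronous deletion.
    """
    paths = list(paths)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=in_flight) as pool:
        # Consuming the map iterator waits for every deletion and re-raises
        # the exception from the first failed call (in input order).
        for _ in pool.map(delete_one, paths):
            pass
    elapsed = time.perf_counter() - start
    return len(paths), len(paths) / elapsed
```

With an assumed 20 ms per deletion, `littles_law_limit(1, 0.02)` caps a synchronous client at about 50 deletions/s, while 16 outstanding requests raise the cap to about 800/s. If the rate measured by `delete_all` stops growing as `in_flight` increases, per-request latency is growing instead, which points at a server-side bottleneck rather than the client.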

Planning for ALICE CMSD redirection

Thomas, Jyothish (STFC,RAL,SC)

keepalived setup is working

1 dev gateway is being added to this cluster

Checksums fixes

Alexander Rogovskiy; Thomas, Jyothish (STFC,RAL,SC)

'21 generation is being rolled out

the '21s are being swapped with the '22 job types, as the '22s have separate OS and disk drives; additional hardware for the '21s is expected in a month

on the WNs the checksums are forwarded to the prod cluster

the preprod farm contributed ~5% of the total checksums

Prefetch studies and WN changes

Alexander Rogovskiy

Some more data from the overload event, namely efficiency and error rates on the 2021 generation (prefetch on):

pf_on_err.png, pf_on_eff.png

It would be interesting to compare this to the “prefetch off” configuration. So far, a meaningful comparison is not possible, since the number of WGProduction jobs is nowhere near the numbers during the overload event (23.04.2024):

16a845b82a8be898149f9fc630573654.png

stress test for WNs?

preprod CEs targeting the internal XRootD cluster

Tokens Status

Thomas, Jyothish (STFC,RAL,SC); Katy Ellis

CMSD Load balancing

Thomas Byrne; Thomas, Jyothish (STFC,RAL,SC)

PR:
https://github.com/stfc/xrootd/pull/8/files

SKA Gateway box

James Walder

/wiki/spaces/UK/pages/215941180

2 new servers racked up; awaiting NetBox configuration.
AQ configuration to be discussed with James A.

XRootD testing framework

Mariam Demir

...

Lancaster: a week of living on Reef hasn’t yielded many operational issues. The only one of note was that Reef wasn’t happy with the “trimming settings” for the MDS; the Pacific defaults needed to be cranked up. Otherwise, smooth sailing. We are dealing with fallout from a lot of metrics being renamed and from changes to the logging, which have reduced our monitoring capabilities a bit, but that’s a niggle. The next task is to update all the clients.

Glasgow:

redirector black-holing with 5.6.9; some crashes on the server

Do we want to talk about Durham?

having interesting Ceph problems (files with read locks): transfers hang and lock files after a while. Paul is switching to match the Lancaster and Glasgow CephFS setups to see if that fixes things.

✅ Action items

⤴ Decisions

...