🗓 Date
👥 Participants
Lancs: Steven, Gerard, Matt
Glasgow: Sam
Apologies:
CC:
🥅 Goals
List of Epics
New tickets
Consider new functionality / items
Detailed discussion of important topics
Site report activity
🗣 Discussion topics
Current status of Echo Gateways / WNs testing
Recent sandboxes for review / deployments:
Item | Presenter | Notes | |
---|---|---|---|
Operational Issues | |||
Gateways and WNs: | CMSD load-balancing changes: load is now reported every 3 s (decision cycle of 6 s). The resulting load pattern converged more, even under DC24-level loads. | ||
Data Challenge Status / observations | |||
Rocky 8 migration planning | Status and issues with Rocky 8 migration | ||
Summary from S&C week Discussions with Andy H and Brian B | XRootD workshop agenda to be confirmed. High system load is unexpected; try monitoring running processes during this period. One possibility might be disk IO locking. Suggested using the throttle plugin with the limits turned off to get the printouts. Officially supported OSes: el7-8, alma9. openssl3 was causing some issues, but the package is available. Load balancing: Brian uses static weighting with global sharing (each server gets a fixed share of total transfers). Another possibility is to use heartbeat skew as a metric. Checksums: in-flight checksums ongoing; Andy remarked these would not confirm the integrity of the file at destination. Tokens: token redaction ETA by spring (March). PR with bugfixes sent to xrootd, to be included in 5.7.0 ( https://github.com/xrootd/xrootd/pull/2152 ). Weird behaviour under investigation: HTTP stops being responsive under small bursts of high load. GWs randomly stop authenticating; the error seen was malformed CA. Next time it happens, check vomsdata::check_from_file in gdb to debug. Likely issues with loading CA chains. | ||
Planning for ALICE CMSD redirection | Redirection is needed for the 3 ALICE GWs, to move away from the current DNS round robin (RR). Options: replicating the general setup (keepalived 2-host redundancy), which would give 2 managers for 3 servers. | ||
ECHO File transfer / throughput studies | |||
Checksums fixes | The patch that ignores the stale checksum check performs similarly to the checksum library plugin. | ||
Prefetch studies and WN changes | |||
Deletion studies through RDR | Compared to the last measurements (two weeks ago), concurrent test deletions are taking longer during DC24 (as expected), with quite a variation in timings observed. Deletion rates decreased (all rates are for 1000 files with 10 deletion threads): 1 GiB: 3.1 Hz; 3 GiB: 7.6 Hz; 6 GiB: 1.4 Hz (yes, deletion rates vary depending on ECHO loading during DC24). [Plots normalised to the same time scale on the Y axis.] With the longest deletion times from the sample above being 42 s for 1 GiB files and 97 s for 6 GiB files, I expect to see many deletions timing out depending on the ECHO loading. | ||
Tokens Status | WLCG token create and modify scopes must include permissions to create and stat superfolders of that path (XRootD fix included in the stat permissions patch). 'Timeout' errors in token auth (permission denied-timeout was reached) were seen during DC24. These seem to be caused by overloading the IAM servers during token deserialization in scitokens-cpp (it fetches the public key too often). | ||
WLCG IAM testbed | WLCG IAM testbed appears quite ‘limited’ in some of the VO-based testing. | ||
Understanding CMSD Loadbalancing | |||
SKA Gateway box | |||
JW to prepare a summary of the plans for 2024 |
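As a rough cross-check of the deletion rates quoted in the RDR row, the sketch below converts each rate into a total run time and a mean time per deletion. It assumes the quoted rate is the aggregate over all 10 threads and that the threads stay busy throughout; neither is stated in the notes, and the helper name `summarise` is only illustrative.

```python
# Rates as recorded in the notes (1000 files, 10 deletion threads).
N_FILES = 1000
N_THREADS = 10
rates_hz = {"1 GiB": 3.1, "3 GiB": 7.6, "6 GiB": 1.4}


def summarise(rate_hz, n_files=N_FILES, n_threads=N_THREADS):
    """Return (total wall time in s, mean time per deletion in s).

    Assumption: rate_hz is the aggregate rate over all threads.
    """
    total_s = n_files / rate_hz
    # With n_threads deletions in flight at once, the mean time per
    # deletion is n_threads / rate (Little's law), if threads stay busy.
    mean_latency_s = n_threads / rate_hz
    return total_s, mean_latency_s


summary = {size: summarise(rate) for size, rate in rates_hz.items()}
for size, (total_s, latency_s) in summary.items():
    print(f"{size}: total {total_s:.0f} s, mean per deletion {latency_s:.1f} s")
```

Under those assumptions the longest deletions noted (42 and 97 s) sit roughly an order of magnitude above the mean, which is consistent with the expectation that some deletions will time out when ECHO is heavily loaded.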
on GGUS:
Site reports
Lancaster - Moved to 5.6.7, leaving fireflies on. No issues (touch wood). Added more gateways into the “xroot cluster” - up to 7 now. The notes about cms.sched/perf settings from January were very useful to fall back on so thanks!
Glasgow -
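For the load-balancing options in the S&C week summary, the idea of static weighting with global sharing (each server gets a fixed share of total transfers) could be sketched as below. This is only an illustration: it is not how CMSD or Brian's setup implements it, the deficit-based selection is a technique chosen here for simplicity, and the gateway names and weights are hypothetical.

```python
from collections import Counter


def pick_server(shares, assigned):
    """Pick the server furthest below its fixed share of all transfers.

    shares:   hypothetical static weight per server (need not sum to 1)
    assigned: Counter of transfers already handed to each server
    """
    total = sum(assigned.values()) + 1  # include the transfer being placed
    weight_sum = sum(shares.values())

    def deficit(server):
        target = total * shares[server] / weight_sum
        return target - assigned[server]

    return max(shares, key=deficit)


# Hypothetical gateways: gw1 is given twice the share of the others.
shares = {"gw1": 2, "gw2": 1, "gw3": 1}
assigned = Counter()
for _ in range(400):
    assigned[pick_server(shares, assigned)] += 1

print(dict(assigned))
```

With these weights the 400 transfers split 200/100/100. The selection ignores current load entirely, which is the main difference from a metric-driven approach such as load reporting or heartbeat skew.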
✅ Action items
James Walder to schedule a ‘hackathon’ within a F2F to have a session on architectural planning.
James Walder to prepare an outline of the expected roadmap for XRootD developments in 2024.