2024-9-19 Meeting Notes
Date
Sep 19, 2024
Participants
@Thomas, Jyothish (STFC,RAL,SC)
@Alexander Rogovskiy
@Thomas Byrne
@Alastair Dewhurst
@Brij Jashal
Lancs: Matt, Steven, Gerard
Glasgow:
Edinburgh: Rob C
Apologies:
@James Walder
CC:
Goals
List of Epics
New tickets
Consider new functionality / items
Detailed discussion of important topics
Site report activity
Discussion topics
Current status of Echo Gateways / WNs testing
Recent sandboxes for review / deployments:
Item | Presenter | Notes |
---|---|---|
Operational Issues (Gateway Auth failures) | @Thomas, Jyothish (STFC,RAL,SC) | Saturation on the external gateways, caused by above-average background load and multihop transfers to Antares. To investigate VO workflow provenance. A subset of gateways is on 5.7.1. |
Compilation and rollout status with XrdCeph and Rocky 8: 5.7.x | @Thomas, Jyothish (STFC,RAL,SC) | All external gateways are on 5.7.0 now and are being upgraded to 5.7.1. Batch is still on 5.5.4; to be upgraded to 5.7.1 (with the proxy cache patch) with Tom Birkett's time. 5.7.1 will have FD and memory fixes, as well as improvements for XCache. |
XRootD Workshop plan | @Alastair Dewhurst @Katy Ellis | XRootD and FTS Workshop @ STFC UK: discussed at the storage meeting. XRootD 6 is coming. XrdCeph is being merged back into core XRootD. |
Shoveler | @Katy Ellis | Shoveler installation and monitoring. All the batch farm WNs connected fine. The shoveler is meant to run on the same host as the gateway to avoid UDP packet loss. |
Deletion studies through RDR | @Ian Johnson | |
Deletions | | RAL deletions are currently within the allowed times for the ATLAS tests. |
WN changes | @Alexander Rogovskiy | Read requests fail for proxy+origin setup under heavy load · Issue #2308 · xrootd/xrootd: fix now in 5.7.1 (and in the patched 5.7.0). |
XRootD testing framework | | Testing VM to be set up and used for preprod testing. Rob C is working on the Kubernetes deployment component. Unit tests for XRootD (added since 5.7.0): ctest -VV -C Release -DCDASH=1 -DCOVERAGE=1 -S test.cmake |
XRootD gateway specs | | 100Gb NICs |
SKA Gateway box | @James Walder | https://stfc.atlassian.net/wiki/spaces/UK/pages/215941180 |
Future developments ideas planning work | @Ian Johnson @Thomas, Jyothish (STFC,RAL,SC) | |
Tokens Status | | To split this into operational aspects and any development / long-term planning aspects. The technical implementation to accept tokens is in place; issues seem to be due to VO use cases on scheduling and accounting. |
Checksums fixes | @Alexander Rogovskiy @Thomas, Jyothish (STFC,RAL,SC) | There is a GitHub issue open to merge this upstream: Make stale checksum check optional for ceph storage endpoints · Issue #2338 · xrootd/xrootd |
on GGUS:
Site reports
Lancaster: Updated to XRootD 5.7.1. Now using fewer file descriptors (~2/3 of the previous count).
Glasgow:
Oxford: Stageout failures to Echo with timeouts (https://bigpanda.cern.ch/job?pandaid=6338129522), with the exact error:
“Error description: pilot, 1151: File transfer timed out during stage-in: mc21_13p6TeV:EVNT.29070483._000105.pool.root.1 from RAL-LCG2-ECHO_DATADISK, copy command timed out: TimeoutException: Timeout reached, timeout=448 seconds')]:failed to transfer files using copytools=['rucio']”
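For triage, the fields in a pilot timeout message like the one above could be pulled out with a small script. This is only a sketch: the regular expression is inferred from this single example message (including the guess that stage-out failures use the same wording), and the names `parse_pilot_timeout`, `PILOT_TIMEOUT_RE` and the returned keys are made up for illustration; they are not part of the pilot or Rucio.

```python
import re

# Pattern inferred from the one example message in these notes.
# Assumption: this is not a documented pilot format and may need adjusting.
PILOT_TIMEOUT_RE = re.compile(
    r"pilot, (?P<code>\d+): File transfer timed out during "
    r"(?P<stage>stage-in|stage-out): "
    r"(?P<scope>[^:\s]+):(?P<lfn>\S+) from (?P<rse>[^,\s]+),"
    r".*?timeout=(?P<timeout>\d+) seconds"
)

# The error text quoted in the Oxford site report.
EXAMPLE = (
    "Error description: pilot, 1151: File transfer timed out during stage-in: "
    "mc21_13p6TeV:EVNT.29070483._000105.pool.root.1 from RAL-LCG2-ECHO_DATADISK, "
    "copy command timed out: TimeoutException: Timeout reached, "
    "timeout=448 seconds')]:failed to transfer files using copytools=['rucio']"
)


def parse_pilot_timeout(message):
    """Return the fields of a transfer-timeout message as a dict, or None."""
    match = PILOT_TIMEOUT_RE.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared or aggregated.
    fields["code"] = int(fields["code"])
    fields["timeout"] = int(fields["timeout"])
    return fields


if __name__ == "__main__":
    print(parse_pilot_timeout(EXAMPLE))
```

Grouping the parsed results by `rse` and `timeout` across many failed jobs would be one way to check whether the timeouts cluster on particular endpoints.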
Action items