2024-9-19 Meeting Notes

 Date

Sep 19, 2024

 Participants

 

  • @Thomas, Jyothish (STFC,RAL,SC)

  • @Alexander Rogovskiy

  • @Thomas Byrne

  • @Alastair Dewhurst

  • @Brij Jashal

  • Lancs: Matt, Steven, Gerard

  • Glasgow:

  • Edinburgh: Rob C

Apologies:

  • @James Walder


CC:

 

 

 Goals

  • List of Epics

  • New tickets

  • Consider new functionality / items

  • Detailed discussion of important topics

  • Site report activity

 

 Discussion topics

Current status of Echo Gateways / WNs testing

Recent sandboxes for review / deployments:

 

Item

Presenter

Notes

 

 

Operational Issues
Gateways and WNs:
- Current status and upcoming changes

(Gateway Auth failures)

@Thomas, Jyothish (STFC,RAL,SC)

 

Saturation on the external gateways, caused by above-average background load and multi-hop transfers to Antares. To investigate which VO workflows the load originates from.

A subset of gateways is on 5.7.1.

 

Compilation and rollout status with XrdCeph and Rocky 8: 5.7.x

@Thomas, Jyothish (STFC,RAL,SC)

All external gateways are on 5.7.0 now and are being upgraded to 5.7.1.

Batch is still on 5.5.4; to be upgraded to 5.7.1 (with the proxy cache patch), subject to Tom Birkett's time.

5.7.1 will have FD (file descriptor) and memory fixes, as well as improvements for XCache.

 

XrootD Workshop plan

@Alastair Dewhurst

@Katy Ellis

XRootD and FTS Workshop @ STFC UK

discussed at the storage meeting

XRootD 6 is coming.

XrdCeph is getting merged back into core XRootD.

 

Shoveler

@Katy Ellis

Shoveler installation and monitoring

All the batch farm WNs connected fine.

The shoveler is meant to run on the same host as the gateway to avoid UDP packet loss.
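As a rough illustration of why UDP loss matters for the monitoring stream, the sketch below estimates how many packets went missing from the sequence numbers of the packets that did arrive. It assumes each packet carries an 8-bit wrapping sequence counter (the XRootD monitoring header has such a field, pseq); the function name and the sample values are hypothetical, and this is not how the shoveler itself works.

```python
def count_lost_packets(sequence_numbers, modulus=256):
    """Estimate lost packets from a stream of wrapping sequence numbers.

    Assumption: packets arrive in order and the counter wraps at `modulus`.
    A gap of 0 (duplicate, or exactly `modulus` packets lost) cannot be
    told apart here and is counted as no loss.
    """
    lost = 0
    previous = None
    for seq in sequence_numbers:
        if previous is not None:
            gap = (seq - previous) % modulus
            if gap > 1:
                lost += gap - 1  # packets skipped between previous and seq
        previous = seq
    return lost


# Hypothetical sample: packets 3 and 4 never arrived.
print(count_lost_packets([1, 2, 5, 6]))
```

Reordered packets would show up as large gaps in this simple estimate, so it is only indicative.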

 

Deletion studies through RDR

@Ian Johnson

 

 

 

Deletions

https://stfc.atlassian.net/browse/XRD-83

RAL deletions are within allowed times for ATLAS tests currently

 

 

 

WN changes

@Alexander Rogovskiy

The fix for "Read requests fail for proxy+origin setup under heavy load" (Issue #2308, xrootd/xrootd) is now in 5.7.1 (and in the patched 5.7.0).
To be deployed over the first two weeks of October.

 

Xrootd testing framework

 

XRootD Site Testing Framework

A testing VM is to be set up and used for preprod testing.

Rob C is working on the Kubernetes deployment component.

Unit tests for xrootd (added since 5.7.0)

ctest -VV -C Release -DCDASH=1 -DCOVERAGE=1 -S test.cmake
The CDASH option sends the test results to the XRootD CDash server https://my.cdash.org/index.php?project=XRootD
Run as a non-root user.
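A minimal sketch of a wrapper around the ctest invocation above, which refuses to run as root and lets the CDash submission and coverage options be switched off. The function names are hypothetical, and it is an assumption that omitting -DCDASH=1 or -DCOVERAGE=1 simply disables the corresponding step in test.cmake.

```python
import os
import subprocess


def build_ctest_command(submit_to_cdash=True, coverage=True):
    """Build the ctest command line quoted in the notes as an argument list."""
    cmd = ["ctest", "-VV", "-C", "Release"]
    if submit_to_cdash:
        cmd.append("-DCDASH=1")     # assumption: omitting this skips submission
    if coverage:
        cmd.append("-DCOVERAGE=1")  # assumption: omitting this skips coverage
    cmd += ["-S", "test.cmake"]
    return cmd


def run_tests(source_dir, submit_to_cdash=True, coverage=True):
    """Run the tests from source_dir, refusing to run as root."""
    if hasattr(os, "geteuid") and os.geteuid() == 0:
        raise PermissionError("the XRootD tests should be run as a non-root user")
    cmd = build_ctest_command(submit_to_cdash, coverage)
    return subprocess.run(cmd, cwd=source_dir, check=False).returncode
```

For a local run without submission, something like run_tests("/path/to/xrootd", submit_to_cdash=False) could be used, where the path is a placeholder for the source checkout.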

Pull requests · stfc/xrootd-testing-framework

 

XrootD gateway specs

 

100Gb NICs
Current usage:
  • 25Gb NIC
  • memory use of xrootd process: ~25-35 GB
  • context switches: ~800 kHz
  • CPU load: ~10-20
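For scale, the NIC figures above can be converted into theoretical line rates. This is a back-of-the-envelope sketch, assuming "Gb" means gigabits per second, decimal units, and no protocol overhead; the function name is hypothetical.

```python
def line_rate_gbytes_per_s(nic_gbit_per_s):
    """Convert a NIC speed in Gbit/s to a theoretical maximum in GB/s."""
    return nic_gbit_per_s / 8.0


current = line_rate_gbytes_per_s(25)    # 25Gb NIC in current use
upgraded = line_rate_gbytes_per_s(100)  # 100Gb NIC in the spec

print(f"25Gb NIC:  {current} GB/s")     # 3.125 GB/s
print(f"100Gb NIC: {upgraded} GB/s")    # 12.5 GB/s
print(f"headroom factor: {upgraded / current}")  # 4.0
```

Achievable throughput will be lower than these figures once protocol overhead and the memory and CPU load of the xrootd process are taken into account.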

 

SKA Gateway box

@James Walder

https://stfc.atlassian.net/wiki/spaces/UK/pages/215941180

 

 

Future development ideas and planning work

@Ian Johnson @Thomas, Jyothish (STFC,RAL,SC)

Notes from planning meeting 22-04-2024

 

Tokens Status

 

To split this into Operational aspects and any development / long-term planning aspects.

The technical implementation to accept tokens is in place; the remaining issues seem to be due to VO use cases around scheduling and accounting.

 

Checksums fixes

@Alexander Rogovskiy @Thomas, Jyothish (STFC,RAL,SC)

There is a GitHub issue open to merge this upstream: "Make stale checksum check optional for ceph storage endpoints" (Issue #2338, xrootd/xrootd).

 

 

 

 

on GGUS:

Site reports

Lancaster: Updated to XRootD 5.7.1. It is using fewer file descriptors (~2/3 of the previous count).

 

 

image-20240905-121619.png
Updated to 5.7.1 at lunchtime on the 4th.
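A file descriptor comparison like the one in the Lancaster report can be spot-checked on a Linux host by counting the entries under /proc/<pid>/fd. This is a minimal sketch, not the method Lancaster used; the function names and the example counts are hypothetical.

```python
import os


def count_open_fds(pid="self"):
    """Count open file descriptors of a process via /proc (Linux only).

    Reading another user's process requires suitable permissions.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_ratio(before, after):
    """Ratio of the FD count after an upgrade to the count before it."""
    return after / before


# Hypothetical counts, only to illustrate a "~2/3 as before" result.
print(round(fd_ratio(before=3000, after=2000), 2))
```

In practice count_open_fds would be called with the PID of the xrootd process before and after the upgrade, under comparable load.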

 

Glasgow:

Ox: Stageout failures to Echo with timeouts, see https://bigpanda.cern.ch/job?pandaid=6338129522 with the exact error:
“Error description: pilot, 1151: File transfer timed out during stage-in: mc21_13p6TeV:EVNT.29070483._000105.pool.root.1 from RAL-LCG2-ECHO_DATADISK, copy command timed out: TimeoutException: Timeout reached, timeout=448 seconds')]:failed to transfer files using copytools=['rucio']”

 

image-20240919-124107.png

 

 

 

 

 Action items

 


 

 Decisions