2024-05-02 Meeting Notes

 Date

May 2, 2024

 Participants

  •  

  •  

Apologies:

 

CC:

 

 

 Goals

  • List of Epics

  • New tickets

  • Consider new functionality / items

  • Detailed discussion of important topics

  • Site report activity

 

 Discussion topics

Current status of Echo Gateways / WNs testing

Recent sandboxes for review / deployments:

 

Item

Presenter

Notes

 

 

Operational Issues
Gateways and WNs:
- Current status and upcoming changes

@Thomas, Jyothish (STFC,RAL,SC)

2024-04-23 LHCb WGprod Echo overload (further updates?)

Proxy memory has been increased to match the allocated memory. Things have been OK since then.

Rocky8 containers with checksum library to be deployed next week
"curl changing the case on el9" - caused some issues on FTS to CNAF

 

CHEP Abstract ideas

@Thomas, Jyothish (STFC,RAL,SC)

Inputs and thoughts on existing and possible abstracts
The XRootD software framework plays a pivotal role in data access at WLCG (Worldwide LHC Computing Grid) sites. However, when dealing with the Echo storage service, a Ceph-based erasure-coded object store at the RAL Tier-1, challenges arise due to the unique characteristics of Echo. In this paper, we describe the various improvements made to the service.
Topics to cover: improvements and studies on deletion rates, load balancing, and the throughput increase over DC24

 

XrootD Workshop plan

@Alastair Dewhurst

 

 

Rocky 8 and 9 migration planning

 

CC today for rocky8

 

Future developments ideas planning work

@Ian Johnson @Thomas, Jyothish (STFC,RAL,SC)

Notes from planning meeting 22-04-2024

 

Deletion studies through RDR

@Ian Johnson

 


 

Deletions

https://stfc.atlassian.net/browse/XRD-83

 

 

Planning for ALICE CMSD redirection

@Thomas, Jyothish (STFC,RAL,SC)

 

 

Checksums fixes

@Alexander Rogovskiy @Thomas, Jyothish (STFC,RAL,SC)

Rocky8 containers with checksum library currently in preprod, to be deployed next week

 

Prefetch studies and WN changes

@Alexander Rogovskiy

Last week LHCb submitted a large number of WGProduction jobs, which overloaded Echo. As a result, many jobs and downloads failed:

transfers.png
jobs.png

It seems the prefetch config had been rolled out to only part of the preprod farm.
After the incident, on 25.04, it was rolled out to the 2021 generation. The memory limit for the xrootd proxy was also raised recently.

 

Tokens Status

@Thomas, Jyothish (STFC,RAL,SC) @Katy Ellis

 

 

CMSD Load balancing

@Thomas Byrne @Thomas, Jyothish (STFC,RAL,SC)

PR:
revised load balancing algorithm - weighed random selection by Jo-stfc · Pull Request #8 · stfc/xrootd
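
For readers unfamiliar with the term in the PR title, weighted random selection can be sketched as below. This is only a generic illustration in Python, not the code from the pull request: the function name `pick_server`, the gateway names, and the use of "free capacity" as the weight are all assumptions made for the example.

```python
import random
from typing import Dict, Optional


def pick_server(free_capacity: Dict[str, float],
                rng: Optional[random.Random] = None) -> str:
    """Pick one server at random, with probability proportional to its weight.

    `free_capacity` maps a (hypothetical) server name to a non-negative
    weight. Servers with zero or negative weight are never selected.
    """
    rng = rng or random.Random()
    # Keep only servers that are allowed to receive new requests.
    servers = [name for name, weight in free_capacity.items() if weight > 0]
    if not servers:
        raise ValueError("no server with positive weight")
    weights = [free_capacity[name] for name in servers]
    # random.choices draws using the relative weights given.
    return rng.choices(servers, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical gateways: gw-a should be chosen about three times
    # as often as gw-b, and gw-c never.
    print(pick_server({"gw-a": 3.0, "gw-b": 1.0, "gw-c": 0.0}))
```

Compared with always choosing the least-loaded server, a weighted random choice spreads new requests over all eligible servers, so a single lightly loaded node is less likely to be hit by every client at once.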


 

 

SKA Gateway box

@James Walder

https://stfc.atlassian.net/wiki/spaces/UK/pages/215941180

4 Nodes awaiting installation:

2 for Exit pod (+ 1 existing)
1 for cloud
1 for Tier-1 usage

 

 

 

 

 

Xrootd testing framework

@Mariam Demir

 

 

 

on GGUS:

Site reports

Lancaster: Gerard embarked on a grand adventure in Ceph upgrades this week, first to the top of Pacific, then to Reef (all done whilst live). Nothing really important exploded, but the monitoring and dashboards have been finicky. In general smooth sailing though. In hindsight Gerard reckons he might well have skipped going to the top of Pacific first and just gone straight to Reef. On xroot, the LSST auth errors mentioned last week turned out to be on their side (they were using a defunct VOMS server to serve their proxies), so not an xroot issue per se, but the fact that this is impossible to debug from the xroot side is a huge niggle.

 

 

Glasgow: Sam built newer versions of xrootd on CC7 and 8 (anticipating 5.6.9), and is awaiting any updates for LB.

https://github.com/stfc/xrootd-ceph/tree/variableobjectcleanup

https://github.com/xrootd/xrootd/pull/2246

 

 

 Action items

 

  •  

  •  

 

 Decisions