2025-03-13 Meeting Notes

 Date

Mar 13, 2025

 Participants

 

  • @Thomas, Jyothish (STFC,RAL,SC)

  • @Alexander Rogovskiy

  • @James Walder

  • @Katy Ellis

  • @Thomas Byrne

  • Emmanuel

  • Lancs: Gerard, Matt, Steven

  • Glasgow:

  • @Ian Johnson

Apologies:

CC:

 

 

 Goals

  • List of Epics

  • New tickets

  • Consider new functionality / items

  • Detailed discussion of important topics

  • Site report activity

 

 Discussion topics

Current status of Echo Gateways / WNs testing

Recent sandboxes for review / deployments:

 

Item

Presenter

Notes

 


Operational Issues
Gateways and WNs:
- Current status and upcoming changes

 

 

Worker Node writable XCache fixed and deployed in lhcb nodes

Ceph sn upgrade to Quincy finished; gws to follow next week.

LSST voms authdb changes and DUNE token auth implemented.

svc20 got in an 'interesting state':

image-20250313-130202.png

 

Mini-DC Observations, Issues, outcomes

 

Final Observations

 

Compilation and rollout status of RAL XRootD versions

@Thomas, Jyothish (STFC,RAL,SC)

5.7.3 released (awaiting other changes to gateways)

streamed checksums to be deployed tomorrow.

 

XRootD collaboration Meeting

 

 

 

 

cms-aaa naming convention

@Thomas, Jyothish (STFC,RAL,SC)

cms-aaa is the only remaining personality to use proxy/ceph as the xrootd service names


A separate naming convention (main/supporting) would be more appropriate (not so urgent).

CC created, and sandbox is prepared and has been tested on a test host

 

 

cms-aaa jemalloc use

@Thomas, Jyothish (STFC,RAL,SC)

testing on svc20, some memory leak still present.

added memory limits, currently under observation

 

Shoveler

@Katy Ellis

 

 

On the fly Checksums
https://stfc.atlassian.net/browse/XRD-98

@Ian Johnson

 

Mismatched checksums during mini-DC: many of these were due to wrongly recording an uninitialised streamed checksum value (looking like “0*[24]*0*”) when a client requested to read a file. The code did not ensure that the checksum was calculated only when writing data. Hence, many streamed values for a pathname (recorded in plaintext) did not match the checksum stored in the extended attribute.
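A minimal sketch of how the recorded values could be triaged after the fact, separating placeholder-looking streamed values from genuine mismatches. It assumes the “0*[24]*0*” description above can be read as a regular expression that the whole value must match; the function names and labels are hypothetical and not taken from the XRootD code.

```python
import re

# Assumption: the pattern quoted in the notes is treated as a regex that
# must match the entire recorded value (zeros, then 2s/4s, then zeros).
UNINITIALISED = re.compile(r"0*[24]*0*")


def looks_uninitialised(streamed: str) -> bool:
    """True if a non-empty streamed value matches the placeholder pattern."""
    return bool(streamed) and UNINITIALISED.fullmatch(streamed) is not None


def classify(streamed: str, stored: str) -> str:
    """Compare a streamed checksum with the one held in the extended attribute."""
    if streamed.lower() == stored.lower():
        return "match"
    if looks_uninitialised(streamed):
        # Likely recorded on a read request, not a real checksum failure.
        return "uninitialised"
    return "mismatch"
```

Only values labelled "mismatch" would then need following up as possible real corruption.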

 

 

Deletions

https://stfc.atlassian.net/browse/XRD-83

To check deletion timing split between client/cluster response under DC saturation

 

 

XRootD Writable Workernode Gateway Hackathon

 

@Thomas, Jyothish (STFC,RAL,SC)

 

Being deployed this week. LHCb jobs from 2018-2021 gens are already writing their output data to ECHO via root (i.e. local gateways). Looks good so far.

0f5431649867bcd2ae81abab4eb1244f.png

 

Plan: file query system to summarize XRootD Logs

 

Plan to create a system that stores information from across all gateways, so that a filename can be searched to get its creation time, last write time, last successful stat and, in the case of ‘lost’ files, deletion time. Possible graduate side project.

Ian plans to extend the database schema from the deletion tests (capturing file write completions and deletions) into a more general event schema.
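One possible shape for such a general event schema, sketched with SQLite. The table name, column names and event labels are all hypothetical, and the existing deletion-test schema is not reproduced here; it is assumed that only successful stats are recorded as 'stat' events.

```python
import sqlite3

# Hypothetical schema: one row per event seen in a gateway's logs.
SCHEMA = """
CREATE TABLE IF NOT EXISTS file_events (
    path    TEXT NOT NULL,
    gateway TEXT NOT NULL,
    event   TEXT NOT NULL CHECK (event IN ('create', 'write', 'stat', 'delete')),
    ts      REAL NOT NULL  -- seconds since the epoch
);
CREATE INDEX IF NOT EXISTS idx_file_events_path ON file_events (path);
"""


def open_db(location=":memory:"):
    """Open (or create) the event database and ensure the schema exists."""
    conn = sqlite3.connect(location)
    conn.executescript(SCHEMA)
    return conn


def record(conn, path, gateway, event, ts):
    """Store one event parsed from a gateway log."""
    conn.execute(
        "INSERT INTO file_events (path, gateway, event, ts) VALUES (?, ?, ?, ?)",
        (path, gateway, event, ts),
    )


def summarise(conn, path):
    """Return creation, last write, last stat and deletion times for a path."""
    row = conn.execute(
        """
        SELECT MIN(CASE WHEN event = 'create' THEN ts END),
               MAX(CASE WHEN event = 'write'  THEN ts END),
               MAX(CASE WHEN event = 'stat'   THEN ts END),
               MAX(CASE WHEN event = 'delete' THEN ts END)
        FROM file_events
        WHERE path = ?
        """,
        (path,),
    ).fetchone()
    return dict(zip(("created", "last_write", "last_stat", "deleted"), row))
```

Keeping one row per event, rather than one row per file, means events from different gateways can be merged without coordination, and a path with no recorded events simply returns empty values.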

 

100 GbE Gateway testing:
SKA / Tier-1

@James Walder @Thomas, Jyothish (STFC,RAL,SC)

UKSRC - Acting as source for SRCNet verification tests; not being stressed so far …

Tier-1 .

 

 

 

UKSRC Storage Architecture

 

Tom B. is working on the CephAdm setup for the cluster. JW is attempting to reinstall the hosts.

 

Tokens Status

 

  • Operational

  • Technical

  • Accounting

 

 

test stress testing framework

 

Script ready, first test to be arranged
https://github.com/Jo-stfc/xrootd-utils/blob/main/ftsstresstest.sh

 

 

 

on GGUS:

Site reports

 

Lancaster:

Our Ceph seems to have recovered after our big outage. Shhh, don’t do anything to spook it.

(discussed earlier in the meeting) Whilst making some changes to our authdb for LSST, it seems that “lrw” permissions don’t allow a user to write, but setting the permission to “a” (all) for a path does. Waters are muddied by LSST trying to be posh with groups and roles (and not allowing me to get a certificate without a group attached).

Any known issues with this? It’s been a long time since I’ve looked at this, but I swear it used to work.

 

 

 

 

 


 

 

Glasgow -

 

 Action items

 

 

  •  

  •  

 

 Decisions
