2025-02-27 Meeting Notes


 Date

Feb 27, 2025

 Participants

 

  • @Thomas, Jyothish (STFC,RAL,SC)

  • @Ian Johnson

  • @Alexander Rogovskiy

  • Lancs: Steven, Gerard

  • Glasgow:

Apologies:

James Walder, Matt and Gerard as they’re still stuck in the machine room.

CC:

 

 

 Goals

  • List of Epics

  • New tickets

  • Consider new functionality / items

  • Detailed discussion of important topics

  • Site report activity

 

 Discussion topics

Current status of Echo Gateways / WNs testing

Recent sandboxes for review / deployments:

 

Item

Presenter

Notes

 

 

Operational Issues
Gateways and WNs:
- Current status and upcoming changes

 

 

Worker Node writable XCache fixed and deployed in lhcb nodes

Ceph upgrade ongoing - Quincy

 

Compilation and rollout status of RAL XRootD versions

@Thomas, Jyothish (STFC,RAL,SC)

5.7.3 released (awaiting other changes to gateways)

 

XRootD collaboration Meeting

 

https://indico.cern.ch/event/1510817/

Requirements gathering session.

Officially joining the collaboration requires pledging FTEs, but we can still submit patches and PRs without that.

6.0 features:

  • Improve CI/security/stability

  • Drop Python 2 support

  • Timeout changes that break ABI

  • Improve error handling

  • Review the long-term HTTP client in XrdCl

  • C++20

  • Cache fixes for reading replicas

  • Process metalink

  • Reflink file cloning (XRootD erasure coding)

https://github.com/orgs/xrootd/projects/1 for full list


general HTTP/davs improvements requested by various communities, with possibility of davix client being merged into xrootd

The use of xrd.network and the fallback manager by the ALICE analysis facilities looked interesting, as did their custom quota plugin. The plugin can set a quota per individual user, but they have used it to set a quota on each server.

general wish for improvements in logging and error messages.

I had a chat with Guilherme afterwards and proposed a mid/long-term plan for a GeoIP-based redirection algorithm. This would benefit us by allowing a CMSD cluster for the batch farm, and would also help with CMS AAA.
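As a rough illustration of the GeoIP idea only: once a client address has been mapped to coordinates (the GeoIP lookup itself is not shown), a redirector could prefer the geographically closest server. The function names, server labels and coordinates below are hypothetical and are not part of XRootD or CMSD.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_server(client, servers):
    """Return the name of the server closest to the client.

    `servers` maps a (hypothetical) server name to its (lat, lon).
    """
    return min(servers, key=lambda name: haversine_km(client, servers[name]))


# Hypothetical server locations, for illustration only.
servers = {"site-a": (51.57, -1.31), "site-b": (46.23, 6.05)}
choice = nearest_server((53.0, -2.0), servers)
```

A real redirection algorithm would presumably also weigh load and availability; distance alone is just the simplest starting point.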

 

 

 

cms-aaa naming convention

@Thomas, Jyothish (STFC,RAL,SC)

cms-aaa is the only remaining personality to use proxy/ceph as the xrootd service names


A separate naming convention (main/supporting) would be more appropriate, though this is not urgent.

CC created, and sandbox is prepared and has been tested on a test host

 

 

cms-aaa jemalloc use

@Thomas, Jyothish (STFC,RAL,SC)

testing on svc20, some memory leak still present

 

Shoveler

@Katy Ellis

Shoveler installation and monitoring

Had a discussion with Andy and Guilherme, as well as a new developer in the team. There is not much appetite to support it going forwards. One option that was floated: once the logging improvements are made, the logs could be passed to a standard log parser and pushed into Elasticsearch directly.

 

On the fly Checksums
https://stfc.atlassian.net/browse/XRD-98

@Ian Johnson

 

Logging of streamed and readback checksums is progressing. Config options now allow the new version of XrdCeph to:

  • Do nothing different to usual (default behaviour if no options selected),

  • Calculate the streamed checksum but do nothing with it (to allow measuring impact on CPU and memory utilisation),

  • Log the streamed and readback checksum (CSV for import into database),

  • Store the streamed checksum in an extended attribute (different attribute name to “XrdCks.adler32”?).

Would like to run one gateway during testing period with the “do nothing” option as a reference.
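For reference, a minimal sketch of what comparing a streamed checksum with a readback checksum looks like, using Python's zlib rather than the XrdCeph code itself. The helper names and the CSV column layout are assumptions made for illustration; the actual log format is not described in these notes.

```python
import zlib


def streamed_adler32(chunks):
    """Fold data chunks into a running Adler-32 as they pass through."""
    checksum = 1  # Adler-32 starts from 1
    for chunk in chunks:
        checksum = zlib.adler32(chunk, checksum)
    return checksum & 0xFFFFFFFF


def csv_row(path, streamed, readback):
    """Format one comparison as a CSV line (hypothetical column layout)."""
    status = "match" if streamed == readback else "MISMATCH"
    return f"{path},{streamed:08x},{readback:08x},{status}"


# The streamed value is built up chunk by chunk during the write;
# the readback value is computed over the whole object afterwards.
streamed = streamed_adler32([b"Wiki", b"pedia"])
readback = zlib.adler32(b"Wikipedia") & 0xFFFFFFFF
row = csv_row("/example/file", streamed, readback)
```

Because Adler-32 can be updated incrementally, the streamed value needs no extra pass over the data, which is why its CPU and memory cost can be measured separately from the logging step.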


Action: check tomorrow for test readiness, test on Friday afternoon, and possibly do a limited deployment during the mini-DC.

 

 

Deletions

https://stfc.atlassian.net/browse/XRD-83

NTR, apart from revising SQL queries from previous deletion report scripts.

 

XRootD Writable Workernode Gateway Hackathon

 

@Thomas, Jyothish (STFC,RAL,SC)

 

Rolled back due to memory issues related to buffer size.

The Xrd-ceph version with write-only buffering is deployed on the LHCb-only WN (lcg2345). LHCb jobs are again writing data from the preprod farm to ECHO after a short break.



 

Plan: file query system to summarize XRootD Logs

 

Plan to create a system that stores information from all gateways, so that a filename can be searched to get its creation time, last write time, last successful stat and deletion time. This is aimed at investigating ‘lost’ files. Possible graduate side project.

Ian plans to extend the database schema from the deletion tests (capturing file write completions and deletions) into a more general event schema.
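A possible shape for such an event store, sketched with SQLite. The table name, column names and event types are hypothetical and are not the schema from the deletion tests; timestamps are assumed to be ISO 8601 strings in UTC so that they sort correctly as text.

```python
import sqlite3

# Hypothetical general event schema: one row per file event per gateway.
SCHEMA = """
CREATE TABLE IF NOT EXISTS file_events (
    gateway TEXT NOT NULL,
    path    TEXT NOT NULL,
    event   TEXT NOT NULL CHECK (event IN ('create', 'write', 'stat', 'delete')),
    ts      TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_file_events_path ON file_events (path);
"""


def open_db(location=":memory:"):
    """Open (or create) the event database and make sure the schema exists."""
    conn = sqlite3.connect(location)
    conn.executescript(SCHEMA)
    return conn


def record(conn, gateway, path, event, ts):
    """Store one event parsed from a gateway log."""
    conn.execute("INSERT INTO file_events VALUES (?, ?, ?, ?)",
                 (gateway, path, event, ts))


def summarize(conn, path):
    """Return creation, last write, last stat and deletion times for a path."""
    row = conn.execute(
        """
        SELECT MIN(CASE WHEN event = 'create' THEN ts END),
               MAX(CASE WHEN event = 'write'  THEN ts END),
               MAX(CASE WHEN event = 'stat'   THEN ts END),
               MAX(CASE WHEN event = 'delete' THEN ts END)
        FROM file_events WHERE path = ?
        """,
        (path,),
    ).fetchone()
    return dict(zip(("created", "last_write", "last_stat", "deleted"), row))
```

A field that comes back as None means no such event was seen on any gateway, for example a file that was written but never deleted.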

 

100 GbE Gateway testing:
SKA / Tier-1

@James Walder @Thomas, Jyothish (STFC,RAL,SC)

UKSRC - Acting as source for SRCNet verification tests; not being stressed so far …

Tier-1:

 

 

 

UKSRC Storage Architecture

 

Tom B. is working on the CephAdm setup for the cluster. JW is attempting to reinstall the hosts.

 

Tokens Status

 

  • Operational

  • Technical

  • Accounting

 

 

 

 

on GGUS:

Site reports

 

Lancaster:

So we found out the hard way that running Ceph with 25% of your servers on 1Gb NICs just doesn’t work for any load of any significance. Luckily the replacement 25Gb NICs have started arriving.

 

 

 

 


 

 

Glasgow -

 

 Action items

 

 


 

 Decisions
