2025-01-23 Meeting Notes

 Date

Jan 23, 2025

 Participants

 

  • @Thomas, Jyothish (STFC,RAL,SC)

  • @Ian Johnson

  • @Alexander Rogovskiy

  • Lancs: Matt, Steven, Gerard

  • Glasgow: Sam

Apologies:

 

CC:

 

 

 Goals

  • List of Epics

  • New tickets

  • Consider new functionality / items

  • Detailed discussion of important topics

  • Site report activity

 

 Discussion topics

Current status of Echo Gateways / WNs testing

Recent sandboxes for review / deployments:

 

Item

Presenter

Notes

 

 

Operational Issues
Gateways and WNs:
- Current status and upcoming changes

 

 

mitigations have been communicated to ATLAS for the jobs using 5.6.0 clients

reboot campaign

 

Checksums issue with an ATLAS file

 

[XrdCks] Checksum request during transfer locks partial file checksum into metadata for Ceph · Issue #2388 · xrootd/xrootd

GGUS /login

A checksum was requested before the whole file had been written. Ceph has no way to check for a stale checksum, so the original (partial-file) checksum ‘sticks’ to the file.

Fix in place on the RAL side: checksums are cleared after a write completes.
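
The failure mode and the mitigation noted above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyObjectStore`, `close_after_write` and `demo` are hypothetical names invented for illustration, the store is an in-memory dictionary, and nothing here reflects the actual XrdCeph or XrdCks code.

```python
import zlib


class ToyObjectStore:
    """Hypothetical in-memory store that caches checksums in metadata
    and has no way to tell whether a cached checksum is stale."""

    def __init__(self):
        self.data = {}        # object name -> bytes written so far
        self.cached_cks = {}  # object name -> checksum stored as metadata

    def append(self, name, chunk):
        self.data[name] = self.data.get(name, b"") + chunk

    def checksum(self, name):
        # A cached value is always trusted, because staleness cannot be detected.
        if name not in self.cached_cks:
            self.cached_cks[name] = zlib.adler32(self.data[name]) & 0xFFFFFFFF
        return self.cached_cks[name]

    def close_after_write(self, name, clear_checksum):
        # Mitigation: drop any cached checksum once the write has completed.
        if clear_checksum:
            self.cached_cks.pop(name, None)


def demo(clear_checksum):
    """Return True if the checksum reported after the transfer matches the data."""
    store = ToyObjectStore()
    store.append("f", b"first half, ")
    store.checksum("f")               # checksum request arrives mid-transfer
    store.append("f", b"second half")
    store.close_after_write("f", clear_checksum)
    actual = zlib.adler32(store.data["f"]) & 0xFFFFFFFF
    return store.checksum("f") == actual
```

With `clear_checksum=False` the mid-transfer checksum is returned for the finished file; with `clear_checksum=True` the checksum is recomputed over the complete data.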

 

cms-aaa naming convention

 

cms-aaa is the only remaining personality to use proxy/ceph as the xrootd service names


A separate naming convention (main/supporting) would be more appropriate, though this is not urgent.

CC created, and sandbox is prepared and has been tested on a test host

 

 

XRootD Managers De-VMWareification

@Thomas, Jyothish (STFC,RAL,SC)

Option 2 preferred for efficiency, but Option 1 decided on

Option 1 would be simpler to implement for a temporary fix, as the move would be reversed

Antares TPC nodes to be moved to an Echo leafsw; IPv4 real estate to be confirmed with James.
lfsw30 (UPS room) decided on as the destination.

Hosts moved to rack, renamed and IP assigned. Pending DI advertisement.

 

Compilation and rollout status with XrdCeph and rocky 8: 5.7.x

@Thomas, Jyothish (STFC,RAL,SC)

5.7.2 published.
Investigating xrootd.redirect for write operations.

5.7.2 skipped on the farm due to a pfc bug.

Possible RAL release (5.7.3 equivalent) with a fix for that bug and for 5.6.0 client compatibility.

 

Shoveler

@Katy Ellis

Shoveler installation and monitoring

 

 

On the fly Checksums
https://stfc.atlassian.net/browse/XRD-98

@Ian Johnson

 

Added configuration to PoC: option to turn on/off Adler32 on-the-fly calculation.

Proved ability to set XrdCks.adler32 attribute from “standalone” code (running from the command line), will incorporate this into PoC code next. (Wasted time looking for attribute in wrong file…)

Also to measure: throughput pattern (does this replicate the double throughput currently seen on the first checksum request?)

Discussed possible implementation as a plugin or in base xrootd.

crc32 also implemented here; noted that any new communities should use straight crc32 variants.
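
As a rough illustration of the on-the-fly idea, the sketch below updates a running Adler32 as each chunk is written, with an on/off switch like the configuration option mentioned above. It is not the PoC code: `StreamingAdler32` and `write_stream` are hypothetical names, the sink is any object with a `write` method, and storing the result in the `XrdCks.adler32` attribute is not shown.

```python
import io
import zlib


class StreamingAdler32:
    """Running Adler32 that can be switched off via `enabled` (hypothetical helper)."""

    def __init__(self, enabled=True):
        self.enabled = enabled
        self._value = 1  # Adler-32 starting value

    def update(self, chunk):
        if self.enabled:
            # zlib.adler32 accepts the previous value to continue a running checksum.
            self._value = zlib.adler32(chunk, self._value)

    def hexdigest(self):
        if not self.enabled:
            return None
        return format(self._value & 0xFFFFFFFF, "08x")


def write_stream(chunks, sink, enabled=True):
    """Write chunks to sink, checksumming each chunk as it passes through."""
    cks = StreamingAdler32(enabled)
    for chunk in chunks:
        sink.write(chunk)
        cks.update(chunk)
    return cks.hexdigest()


if __name__ == "__main__":
    sink = io.BytesIO()
    print(write_stream([b"abc", b"def"], sink))
```

Because the checksum is accumulated during the write, the data does not need to be read back for the first checksum request, which is the throughput effect the measurement above is meant to examine.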

 

Deletions

https://stfc.atlassian.net/browse/XRD-83

NTR

 

XRootD Writable Workernode Gateway Hackathon

 

@Thomas, Jyothish (STFC,RAL,SC)

XRootD Writable Workernode Gateway Hackathon (XWWGH)


Hackathon writeable workernode

Sandbox with fixes is present and has been tested on an LHCb workernode. Reading works fine as is; writes still need testing to let jobs only write on that WN.

https://stfc.atlassian.net/browse/GSTSM-284

 

Xrootd testing framework

 

XRootD Site Testing Framework

Discussion in the Storage Meeting on how to integrate the various testing structures within the UK. Container with the testing framework TBD.

 

100 GbE Gateway testing:
SKA / Tier-1

@James Walder @Thomas, Jyothish (STFC,RAL,SC)

UKSRC - XRootD used for SRCNet testing

Tier-1 cabled, but awaiting some work to progress on the switch.

 

 

 

UKSRC Storage Architecture

 

 

 

Tokens Status

 

  • Operational

  • Technical

  • Accounting

 

 

 

 

on GGUS:

Site reports

 

Lancaster: Following on from last week, we were looking at the load reported by the (default) cmsd load reporting scripts, and they didn’t seem to match up to any numbers we could pull from the servers. We got distracted by other things before we could dive deeper.

LSST is planning to use small files for reads/writes and to remove TLS on pure xrootd. These seem to be intermediate files, but might need to be made available for quality control? Possibly looking at an object store route (S3 for internal use).

A combination of uid/host-based auth resulted in the following error on curl:

unknown.2:28@comp21-04.private.dns.zone Unable to open /cephfs/grid/dteam/curltest; permission denied

 


 

 

Glasgow: Brief failures to authenticate internally; some of the lsc files for ATLAS IAM were out of date despite using RPM (possible issue with a cron job). Looking forward to the streamed checksums.

 

 Action items

How to replace the original functionality of fstream monitoring, now that OpenSearch has replaced the existing solutions.

 

 

 Decisions