...

Apologies:

CC:

🥅 Goals

...

Item

Presenter

Notes

Operational Issues
Gateways and WNs:
- Current status and upcoming changes

Brief hiccup on Icinga due to IPv6 issues.
IPv6 preference might be determined by something outside XRootD…

XRootD Managers De-VMWareification

(Moving to physical hosts)

Thomas, Jyothish (STFC,RAL,SC)

/wiki/spaces/GRIDPP/pages/872644647

XRootD Cluster Shuffle

Attachment: Redirector de-VMWareification.pptx

CC due today, as a better but slightly more complicated Aquilon procedure is being used for the first time.

Release of 5.7.3

(May expect a 5.8.X prior to 6.X?)

https://github.com/xrootd/xrootd/releases/tag/v5.7.3

  • Major bug fixes
    [Seckrb5] Avoid null pointer dereference (#2385)
    [XrdPfc] Fix file descriptor leak when reading file size from cinfo file (#2392)

  • Minor bug fixes
    [Protocol] do_WriteSpan() - Add written bytes in file statistics (#2368)
    [XrdHttp] Correct response code for PUT (from 200 to 201) (#2382)
    [XrdHttp] Set oss.asize if object size is known (#2378)
    [XrdOfs] Correct forward declaration of XrdSfsFSctl (#2405)

  • Miscellaneous
    [CI] Drop CentOS 7 builds from GitHub and GitLab CI
    [CI] Move macOS GitHub Actions workflow to macOS 15
    [Docker] Add Dockerfile for Alpine Linux
    [Docker] Remove Dockerfile to build on CentOS 7
    [Docker] Update docker/ subdirectory setup and xrd-docker script
    [Misc] Fix compilation with GCC 15 (#2411)
    [Tests] Fix check for running process to prevent setup failures
    [XrdCl] Improve checking of logging format strings (#2380)
    [XrdSciTokens] Add tests for token-based authorization (#2381)

  • RAL branch will be updated (5.7.3patched on stfc fork)

Checksums issue with an ATLAS file

https://github.com/xrootd/xrootd/issues/2388

https://ggus.eu/index.php?mode=ticket_info&ticket_id=169360

Checksum requested before whole file is updated. No ability to do stale checksum check in ceph, so original checksum ‘sticks’ to the file.

Fix in place on the RAL side: checksums are cleared after a write is complete.
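To illustrate the failure mode and the RAL-side fix, here is a minimal toy model. It is only a sketch: `ToyStore`, its method names and the use of Adler-32 are assumptions for illustration and do not correspond to the XRootD or Ceph APIs.

```python
import zlib


class ToyStore:
    """Toy object store that caches a checksum as file metadata (hypothetical)."""

    def __init__(self, clear_on_close):
        self.data = {}        # file name -> bytearray of contents
        self.checksums = {}   # file name -> cached Adler-32 hex string
        self.clear_on_close = clear_on_close

    def append(self, name, chunk):
        # Data arrives in pieces; the file is incomplete until close_write().
        self.data.setdefault(name, bytearray()).extend(chunk)

    def close_write(self, name):
        # The fix: drop any cached checksum once the write is complete.
        if self.clear_on_close:
            self.checksums.pop(name, None)

    def checksum(self, name):
        # No staleness check: a cached value is returned as-is.
        if name not in self.checksums:
            value = zlib.adler32(bytes(self.data[name]))
            self.checksums[name] = f"{value:08x}"
        return self.checksums[name]


def run(clear_on_close):
    store = ToyStore(clear_on_close)
    store.append("file", b"part-one")
    store.checksum("file")            # requested before the file is complete
    store.append("file", b"part-two")
    store.close_write("file")
    return store.checksum("file")


expected = f"{zlib.adler32(b'part-onepart-two'):08x}"
print("without fix matches:", run(False) == expected)
print("with fix matches:   ", run(True) == expected)
```

Without the clearing step, the early checksum stays attached to the file; with it, the next request recomputes over the full contents.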

cms-aaa naming convention

Thomas, Jyothish (STFC,RAL,SC)

cms-aaa is the only remaining personality to use proxy/ceph as the xrootd service names


Separate naming convention would be more appropriate, to have main/supporting

(not so urgent).

CC created, and sandbox is prepared and has been tested on a test host

cms-aaa jemalloc use

Thomas, Jyothish (STFC,RAL,SC)

testing on svc20, some memory leak still present

Compilation and rollout status of RAL XRootD versions

Thomas, Jyothish (STFC,RAL,SC)

5.7.2 published.
Investigating xrootd.redirect for write operations.

5.7.2 skipped on farm due to pfc bug.

5.7.3 to be released soon.

Shoveler

Katy Ellis

Shoveler installation and monitoring

Katy Ellis to feed back Lancaster (slow rate) observations to shoveler / CERN devs (possibly impacted by the infrastructure behind the Collector).

To consider mitigations if unable to progress.

On the fly Checksums

Jira: XRD-98

Ian Johnson

Integrated checksum attribute storage into PoC.

Measured time to transfer 10x3GiB files in parallel to a dev gateway with xrdcp verifying the source checksum.

(lower bar is from the on-the-fly checksumming)

10x3GiBfilesInParallel.png

Next steps: add in optional CRC32C calculation. Conduct larger-scale performance tests, ideally against a gateway machine which is more representative of production GWs.

Sufficient testing will be critical. To be discussed.
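As a rough illustration of the on-the-fly idea with an optional CRC32C, the sketch below updates the checksums chunk by chunk while the data is copied, so no second pass over the file is needed. It is not the PoC code: the function names, the chunk size and the choice of Adler-32 as the default checksum are assumptions, and the pure-Python CRC32C is for clarity only (far too slow for production use).

```python
import io
import zlib


def _make_crc32c_table():
    # CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c_update(crc, chunk):
    """Fold one chunk into a running CRC32C value (start with crc=0)."""
    crc ^= 0xFFFFFFFF
    for b in chunk:
        crc = _CRC32C_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def copy_with_checksums(src, dst, chunk_size=1 << 20, with_crc32c=False):
    """Copy src to dst, computing checksums as the chunks pass through."""
    adler = 1   # Adler-32 initial value
    crc = 0     # running CRC32C, only updated when requested
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        adler = zlib.adler32(chunk, adler)
        if with_crc32c:
            crc = crc32c_update(crc, chunk)
        total += len(chunk)
    result = {"bytes": total, "adler32": f"{adler:08x}"}
    if with_crc32c:
        result["crc32c"] = f"{crc:08x}"
    return result


if __name__ == "__main__":
    payload = b"example payload " * 1000
    print(copy_with_checksums(io.BytesIO(payload), io.BytesIO(),
                              chunk_size=4096, with_crc32c=True))
```

The resulting values could then be stored as file attributes at close time, in line with the checksum attribute storage already integrated into the PoC.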

Deletions

Jira: XRD-83

NTR

XRootD Writable Workernode Gateway Hackathon

Thomas, Jyothish (STFC,RAL,SC)

XRootD Writable Workernode Gateway Hackathon (XWWGH)


Hackathon writeable workernode

Sandbox with fixes is in place and has been tested on an LHCb workernode. Reading works fine as is; writes still need testing, to let jobs write only on that WN.

Jira: GSTSM-284

First write completed!
Proper DNS poisoning for the gateway was added yesterday, after the LHCb configuration change. So, pilots that started on the test WN before the poisoning will try to upload data via the local gateway, but fail. The failures should disappear once all “old” pilots are gone. For now we have (the plot shows uploads via root protocol to ECHO from RAL WNs):

ad5f9976df92463991b68504a9b2d419.png

Xrootd testing framework

XRootD Site Testing Framework

Discussion in the Storage Meeting on how to integrate the various testing structures within the UK. Container with the testing framework TBD.

Plan: file query system to summarize XRootD Logs

Plan to create a system that stores info from across all gateways, so that a filename can be searched to get its creation time, last write time, last successful stat and deletion time in the case of ‘lost’ files. Possible graduate side project.
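A minimal sketch of what such a per-file summary could look like is given below. It assumes the gateway logs have already been parsed into simple event records; the `Event` fields and the operation names ("create", "write", "stat", "delete") are hypothetical and do not reflect the actual XRootD log format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Event:
    """One parsed log record (hypothetical schema)."""
    time: datetime
    gateway: str
    op: str          # "create", "write", "stat" or "delete"
    path: str
    ok: bool = True  # whether the operation succeeded


@dataclass
class FileSummary:
    created: Optional[datetime] = None
    last_write: Optional[datetime] = None
    last_successful_stat: Optional[datetime] = None
    deleted: Optional[datetime] = None


def summarize(events: Iterable[Event]) -> Dict[str, FileSummary]:
    """Merge events from all gateways into one summary per file path."""
    index: Dict[str, FileSummary] = {}
    for ev in sorted(events, key=lambda e: e.time):
        if not ev.ok:
            continue  # only successful operations update the summary
        summary = index.setdefault(ev.path, FileSummary())
        if ev.op == "create" and summary.created is None:
            summary.created = ev.time
        elif ev.op == "write":
            summary.last_write = ev.time
        elif ev.op == "stat":
            summary.last_successful_stat = ev.time
        elif ev.op == "delete":
            summary.deleted = ev.time
    return index


if __name__ == "__main__":
    # Gateway names and paths here are made up for the example.
    demo = [
        Event(datetime(2025, 1, 1, 10, 0), "gw-a", "create", "/store/f1"),
        Event(datetime(2025, 1, 1, 10, 5), "gw-b", "write", "/store/f1"),
        Event(datetime(2025, 1, 2, 9, 0), "gw-a", "stat", "/store/f1"),
        Event(datetime(2025, 1, 3, 8, 0), "gw-c", "delete", "/store/f1"),
    ]
    print(summarize(demo)["/store/f1"])
```

In practice the index would live in a database rather than in memory, but the lookup for a ‘lost’ file would return the same four timestamps.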

100 GbE Gateway testing:
SKA / Tier-1

James Walder; Thomas, Jyothish (STFC,RAL,SC)

UKSRC - Acting as source for SRCNet verification tests; not being stressed so far …

Tier-1.

UKSRC Storage Architecture

Following discussions, we need to change the DNS entries for the data and mgmt interfaces, update NetBox and reconfigure in AQ. The data network will be used (exclusively) for the DTN / data traffic, and mgmt for ancillary needs (Icinga, AQ). The host will be known via its mgmt DNS name (the canonical name).

Tokens Status

  • Operational

  • Technical

  • Accounting

...

We have one host on 5.7.3, nothing exploded. Will roll out to the rest of our machines shortly.

We’ve had a bunch of issues with Shoveler not keeping up.

image-20250130-132005.png

...

Glasgow - Brief failures to authenticate internally: some of the lsc files for ATLAS IAM were out of date despite using the RPM (possible issue with a cron job). Looking forward to the on-the-fly (streamed) checksums.

...