🗓 Date

👥 Participants

Apologies:

CC:

🥅 Goals

  • List of Epics

  • New tickets

  • Consider new functionality / items

  • Detailed discussion of important topics

  • Site report activity

🗣 Discussion topics

Current status of Echo Gateways / WNs testing

Recent sandboxes for review / deployments:

Item

Presenter

Notes

Operational Issues
Gateways and WNs:
- Current status and upcoming changes

Checksums issue with an ATLAS file

https://github.com/xrootd/xrootd/issues/2388

https://ggus.eu/index.php?mode=ticket_info&ticket_id=169360

A checksum was requested before the whole file had been written. Ceph has no way to detect a stale checksum, so the original checksum ‘sticks’ to the file.

Fix in place on the RAL side: checksums are cleared after a write completes.
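
The failure mode and the fix can be illustrated with a toy model. This is only a sketch under assumptions: `ToyObjectStore`, `close_after_write` and the in-memory dictionaries are hypothetical stand-ins for illustration, not the XrdCeph or Ceph API. It shows a checksum cached as file metadata going stale when it is requested mid-write, and clearing it on write completion forcing a recompute.

```python
import zlib


class ToyObjectStore:
    """Hypothetical in-memory store that caches a checksum as file metadata."""

    def __init__(self):
        self.data = {}             # file name -> bytearray of contents
        self.cached_checksum = {}  # file name -> cached Adler32 value

    def write(self, name, offset, chunk):
        # Grow the buffer if needed, then place the chunk at the given offset.
        buf = self.data.setdefault(name, bytearray())
        end = offset + len(chunk)
        if len(buf) < end:
            buf.extend(b"\0" * (end - len(buf)))
        buf[offset:end] = chunk

    def checksum(self, name):
        # The cached value is returned without any staleness check,
        # mirroring the behaviour described in the notes.
        if name not in self.cached_checksum:
            value = zlib.adler32(bytes(self.data[name])) & 0xFFFFFFFF
            self.cached_checksum[name] = value
        return self.cached_checksum[name]

    def close_after_write(self, name, clear_checksum=True):
        # The fix: drop any cached checksum once the write is complete.
        if clear_checksum:
            self.cached_checksum.pop(name, None)


store = ToyObjectStore()
store.write("f", 0, b"partial")
early = store.checksum("f")          # requested before the file is complete
store.write("f", 7, b" rest")

store.close_after_write("f", clear_checksum=False)
stale = store.checksum("f")          # still the early value: it has 'stuck'

store.close_after_write("f", clear_checksum=True)
fresh = store.checksum("f")          # recomputed over the full contents
```

Here `stale` equals `early` even though the file has changed, while `fresh` matches the Adler32 of the complete contents.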

cms-aaa naming convention

cms-aaa is the only remaining personality to use proxy/ceph as the XRootD service names.

A separate naming convention (main/supporting) would be more appropriate, though this is not urgent.

CC created, and sandbox is prepared.

XRootD Managers De-VMWareification

Thomas, Jyothish (STFC,RAL,SC)

Attached file: Redirector de-VMWareification.pptx

Option 2 is preferred for efficiency, but Option 1 was decided on.

Option 1 would be simpler to implement as a temporary fix, as the move would be reversed.

Antares TPC nodes to be moved to an Echo leaf switch; IPv4 address space to be confirmed with James.
lfsw30 (UPS room) decided on as the destination.

Hosts moved to rack.

Compilation and rollout status with XrdCeph and Rocky 8: 5.7.x

Thomas, Jyothish (STFC,RAL,SC)

5.7.2 published.
Investigating xrootd.redirect for write operations.

5.7.2 skipped on the farm due to a pfc bug.

Possible RAL release, equivalent to 5.7.3, with a fix for that bug and 5.6.0 client compatibility.

Shoveler

Katy Ellis

Shoveler installation and monitoring

On the fly Checksums

Jira: XRD-98

Ian Johnson

Simple PoC calculating Adler32 in the XrdCeph plugin is mostly working. Negligible reduction in write rate compared to not calculating Adler32 on the fly.
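
As a rough illustration of the on-the-fly approach (not the XrdCeph PoC itself), the sketch below folds each written chunk into a running Adler32 using Python's `zlib.adler32`, which accepts a running value as its second argument. The `RunningAdler32` class and its sequential-write check are assumptions made for the example; a real plugin would also need a policy for out-of-order or overlapping writes.

```python
import zlib


class RunningAdler32:
    """Hypothetical helper: accumulates Adler32 as sequential chunks arrive."""

    def __init__(self):
        self.value = 1        # Adler32 starts from 1, not 0
        self.next_offset = 0  # offset at which the next chunk must begin

    def update(self, offset, chunk):
        # A running Adler32 is only valid if bytes arrive in file order.
        if offset != self.next_offset:
            raise ValueError("non-sequential write; fall back to a full read")
        self.value = zlib.adler32(chunk, self.value)
        self.next_offset += len(chunk)

    def hexdigest(self):
        return f"{self.value & 0xFFFFFFFF:08x}"


payload = bytes(range(256)) * 100
running = RunningAdler32()
chunk_size = 4096
for start in range(0, len(payload), chunk_size):
    running.update(start, payload[start:start + chunk_size])

# The chunked result should agree with a single pass over the whole payload.
one_shot = f"{zlib.adler32(payload) & 0xFFFFFFFF:08x}"
```

Because the checksum is updated as data passes through, no second read of the file is needed once the write completes, provided the writes were sequential.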

Deletions

Jira: XRD-83

NTR

XRootD Writable Workernode Gateway Hackathon

Thomas, Jyothish (STFC,RAL,SC)

XRootD Writable Workernode Gateway Hackathon (XWWGH)

Hackathon writeable workernode

Sandbox with fixes present, ready for testing.

Jira: GSTSM-284

XRootD testing framework

XRootD Site Testing Framework

Discussion in the Storage Meeting on how to integrate the various testing structures within the UK.

100 GbE Gateway testing:
SKA / Tier-1

James Walder; Thomas, Jyothish (STFC,RAL,SC)

UKSRC - XRootD used for SRCNet testing

Tier-1 cabled, but awaiting some work to progress on the switch.

UKSRC Storage Architecture

Tokens Status

  • Operational

  • Technical

  • Accounting

 

on GGUS:

Site reports

Lancaster: On this week’s Lancaster Rant: We had a period of storage sadness last night. ATLAS deleted ~20k files in the space of about 30 minutes, and whilst Ceph was recovering, LSST jobs came from behind and gave the storage a wedgie with high IOPS. CephFS got slow, xrootd servers got sad, some fell over, CephFS got more unhappy. It was a whole thing, and Gerard spent the morning restarting xrootd servers with his new scripts.

...

How to replace the original functionality of fstream monitoring, now that OpenSearch has replaced the existing solutions.

⤴ Decisions