2024-02-29 Meeting Notes

 Date

Feb 29, 2024

 Participants

  • @James Walder

  • @Thomas, Jyothish (STFC,RAL,SC)

  • @Katy Ellis

  • @Alexander Rogovskiy

  • @Thomas Byrne

  • Lancs: Matt, Gerard

  • Glasgow:

Apologies:

CC:

 

 

 Goals

  • List of Epics

  • New tickets

  • Consider new functionality / items

  • Detailed discussion of important topics

  • Site report activity

 

 Discussion topics

Current status of Echo Gateways / WNs testing

Recent sandboxes for review / deployments:

 

Item

Presenter

Notes

 

Item

Presenter

Notes

 

Operational Issues
Gateways and WNs:
- Current status and upcoming changes

@Thomas, Jyothish (STFC,RAL,SC)

 

 

Further observations from Data Challenge

Future testing plans (RAL-initiated / VO-initiated)?

All

DC24 observations

Still to improve: LB (load balancing), checksums (in-flight + metadata).

 

[Image: image-20240229-134100.png]

RR with failsafe hotspotting

 

Rocky 8 migration planning

 

 

 

Deletion studies through RDR

@Ian Johnson

 

The deletion times reported previously were measured end-to-end from the client side. I am now analysing the times for individual ceph_posix_unlink calls, and will then look at deletion request times within XRootD itself, i.e. how long a deletion request spends travelling through the layers of XRootD code.

I’ve noted some strange outliers in an initial sample of 6808 ceph_posix_unlink calls taken on a single gateway on 22nd Feb (y-axis times in ms; the last tick should read 80000 ms). There appears to be an outlier for ceph_posix_unlink at ~90 s. The standard deviation for the plot below is 4476.

[Plot: atlas-unlink-times.png, ceph_posix_unlink times (ms) on a single gateway]
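To make outliers like the ~90 s call easy to spot across gateways and days, the samples could be summarised as below. This is a minimal sketch, assuming the ceph_posix_unlink durations have already been extracted from the gateway logs as a list of numbers in ms; the function name and the 3-sigma outlier threshold are hypothetical choices, not part of the existing analysis.

```python
import statistics


def summarise_unlink_times(durations_ms, sigma=3.0):
    """Summarise ceph_posix_unlink durations (ms) and flag outliers.

    Hypothetical helper: an outlier is any sample more than `sigma`
    sample standard deviations above the mean.
    """
    if len(durations_ms) < 2:
        raise ValueError("need at least two samples")
    mean = statistics.fmean(durations_ms)
    stdev = statistics.stdev(durations_ms)  # sample standard deviation
    # quantiles(n=100) returns the 99 cut points P1..P99
    percentiles = statistics.quantiles(durations_ms, n=100)
    cutoff = mean + sigma * stdev
    return {
        "count": len(durations_ms),
        "mean_ms": mean,
        "stdev_ms": stdev,
        "median_ms": statistics.median(durations_ms),
        "p99_ms": percentiles[98],
        "max_ms": max(durations_ms),
        "outliers_ms": sorted(d for d in durations_ms if d > cutoff),
    }
```

A single very large sample inflates the standard deviation, so a mean + k·sigma cut is only a first pass; the median and 99th percentile are reported alongside it because they are much less sensitive to a handful of extreme calls.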

 

Deletions

 

https://stfc.atlassian.net/browse/XRD-83

Next steps:
Collate previous information to define the problem

Identify potential solutions.

 

Planning for ALICE CMSD redirection

@Thomas, Jyothish (STFC,RAL,SC)

INC-163994 - DNS IP additions

 

Checksum fixes

@Alexander Rogovskiy @Thomas, Jyothish (STFC,RAL,SC)

Noscript checksum by Jo-stfc · Pull Request #9 · stfc/xrootd
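For context on the "in-flight + metadata" checksum item from DC24, the idea of computing a checksum incrementally as data streams through and comparing it with a value stored as metadata can be sketched as below. This assumes Adler-32 (the checksum type commonly used with XRootD in WLCG); it is not the code in PR #9, and the helper names and the metadata source are hypothetical.

```python
import zlib


def adler32_hex(value: int) -> str:
    """Format an Adler-32 value as an 8-digit lowercase hex string."""
    return f"{value & 0xFFFFFFFF:08x}"


def inflight_adler32(chunks) -> str:
    """Update an Adler-32 checksum chunk by chunk, as data arrives.

    Hypothetical helper: no second read of the data is needed, because
    zlib.adler32 accepts the running value from the previous chunk.
    """
    value = 1  # Adler-32 starting value
    for chunk in chunks:
        value = zlib.adler32(chunk, value)
    return adler32_hex(value)


def verify_against_metadata(chunks, stored_checksum: str) -> bool:
    """Compare the in-flight checksum with one stored as object metadata."""
    return inflight_adler32(chunks) == stored_checksum.lower()
```

Because Adler-32 can be updated incrementally, the checksum for a write can be ready when the last chunk lands, rather than requiring the object to be read back from Ceph afterwards.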

 

Prefetch studies and WN changes

@Alexander Rogovskiy

 

 

Tokens Status

@Thomas, Jyothish (STFC,RAL,SC) @Katy Ellis

 

 

CMSD Load balancing

@Thomas Byrne @Thomas, Jyothish (STFC,RAL,SC)

PR:
revised load balancing algorithm - weighed random selection by Jo-stfc · Pull Request #8 · stfc/xrootd

Addresses the issue where the current selByLoad algorithm leads to load hotspotting and a coarse-grained load distribution.

It replaces the current algorithm in XRootD. Testing of the simulated behaviour is in progress.
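Why a deterministic "pick the least loaded" rule hotspots, while weighted random selection spreads requests, can be shown with a toy simulation. This is a sketch only, not the code in PR #8: the load scale (0-100), the weighting (spare capacity = max_load - load) and all function names are hypothetical.

```python
import random
from collections import Counter


def least_loaded(loads, rng=None):
    """Deterministic choice: always the server with the lowest reported load."""
    return min(range(len(loads)), key=lambda i: loads[i])


def weighted_random(loads, rng, max_load=100):
    """Pick a server with probability proportional to its spare capacity.

    Hypothetical weighting: weight = max_load - load, so a fully loaded
    server gets weight 0 and is never chosen.
    """
    weights = [max(max_load - load, 0) for load in loads]
    if sum(weights) == 0:
        raise RuntimeError("all servers are fully loaded")
    return rng.choices(range(len(loads)), weights=weights, k=1)[0]


def simulate(selector, loads, requests=1000, seed=0):
    """Count selections while the load report stays fixed.

    This models the window between two load updates, which is when
    a deterministic selector sends every request to the same server.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(requests):
        counts[selector(loads, rng)] += 1
    return counts
```

For example, `simulate(least_loaded, [10, 20, 30, 100])` sends all 1000 requests to server 0, whereas `simulate(weighted_random, [10, 20, 30, 100])` splits them roughly 90:70:60 across servers 0-2 and never picks the saturated server 3.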

 

 

 

SKA Gateway box

@James Walder

https://stfc.atlassian.net/wiki/spaces/UK/pages/215941180

 

5.6.x root TPC issue

 

root:// TPC transfer fail with xrootd 5.6.x · Issue #2202 · xrootd/xrootd

Issue open, under investigation.

 

 

on GGUS:

Site reports

Lancaster: As mentioned in storage, DUNE root:// TPC transfers at Lancs are hit by root:// TPC transfer fail with xrootd 5.6.x · Issue #2202 · xrootd/xrootd.

One slow OSD caused a lot of issues for ~12 hours this week. It coincided with a period of heavy Ceph traffic and caused many CephFS mounts to be dropped.

Putting the first new storage nodes in a long time into production next week; any tips?

  • Tom: better control of the number of placement groups that can be moving at once.
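One common way to limit how many placement groups move at once is to bring new OSDs in at CRUSH weight 0 (e.g. via the `osd_crush_initial_weight` option) and raise their weight in steps. The sketch below only generates such a plan as `ceph osd crush reweight` commands; it is an illustration rather than Tom's specific suggestion, and the OSD IDs, target weight and step count are hypothetical. In practice each step would be applied only after the backfill from the previous step has finished.

```python
def crush_reweight_plan(osd_ids, target_weight, steps):
    """Return one list of `ceph osd crush reweight` commands per step.

    Hypothetical helper: raises the given OSDs from CRUSH weight 0 to
    target_weight in equal increments, so each step moves only a
    fraction of the placement groups that would move all at once.
    """
    if steps < 1:
        raise ValueError("steps must be >= 1")
    plan = []
    for step in range(1, steps + 1):
        weight = round(target_weight * step / steps, 4)
        plan.append(
            [f"ceph osd crush reweight osd.{osd} {weight}" for osd in osd_ids]
        )
    return plan
```

More steps mean less data movement per step but a longer overall rollout; the right trade-off depends on how much backfill the cluster can absorb alongside production traffic.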

Glasgow

 

 

 

 Action items

  • @James Walder to schedule a ‘hackathon’ during an F2F meeting, including a session on architectural planning.

  • @James Walder to prepare an outline of the expected roadmap for XRootD developments in 2024.

  •  

 

 Decisions