2024-02-29 Meeting Notes
Date
Feb 29, 2024
Participants
@James Walder
@Thomas, Jyothish (STFC,RAL,SC)
@Katy Ellis
@Alexander Rogovskiy
@Thomas Byrne
Lancs: Matt, Gerard
Glasgow:
Apologies:
CC:
Goals
List of Epics
New tickets
Consider new functionality / items
Detailed discussion of important topics
Site report activity
Discussion topics
Current status of Echo Gateways / WNs testing
Recent sandboxes for review / deployments:
Item | Presenter | Notes |
---|---|---|
Operational Issues | @Thomas, Jyothish (STFC,RAL,SC) | |
Further observations from Data Challenge. Future testing plans (RAL initiated / VO initiated)? | All | To improve still: LB, checksums (inflight + metadata), RR with failsafe hotspotting |
Rocky 8 migration planning | | |
Deletion studies through RDR | @Ian Johnson | Previous deletion times reported were from the client side, end-to-end. Now analysing times for individual ceph_posix_unlink calls, and will look at deletion request times within XRootD itself, that is, the amount of time a deletion request takes travelling through the layers of XRootD code. I’ve noted some strange outliers in an initial sample of 6808 deletions taken from a single gateway on 22nd Feb (plot not reproduced here; y-axis times in ms, last tick should read 80000 ms). Outlier for ceph_posix_unlink at 90s? Stddev of 4476 for the plotted sample. |
Deletions | | https://stfc.atlassian.net/browse/XRD-83 Next steps: identify potential solutions. |
Planning for ALICE CMSD redirection | @Thomas, Jyothish (STFC,RAL,SC) | INC-163994 - DNS IP additions |
Checksums fixes | @Alexander Rogovskiy @Thomas, Jyothish (STFC,RAL,SC) | Noscript checksum by Jo-stfc · Pull Request #9 · stfc/xrootd |
Prefetch studies and WN changes | @Alexander Rogovskiy | |
Tokens Status | @Thomas, Jyothish (STFC,RAL,SC) @Katy Ellis | |
CMSD Load balancing | @Thomas Byrne @Thomas, Jyothish (STFC,RAL,SC) | PR: replaces the current algorithm in XRootD. Testing of simulated behaviour in progress. |
SKA Gateway box | @James Walder | |
5.6.x root TPC issue | | root:// TPC transfer fail with xrootd 5.6.x · Issue #2202 · xrootd/xrootd. Issue open, under investigation. |
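For the deletion studies item, a minimal sketch of how per-call timings might be summarised and outliers flagged. This is not the analysis actually used: the function name, the threshold and the sample values are hypothetical, and it assumes the ceph_posix_unlink times have already been extracted into a list of milliseconds. It uses the median and the median absolute deviation (MAD) to flag outliers, because a single very slow call (such as one near 90 s) inflates the mean and standard deviation.

```python
import statistics


def summarise_latencies(times_ms, mad_threshold=10.0):
    """Summarise latencies (in ms) and flag slow outliers.

    A value is flagged when it exceeds median + mad_threshold * MAD.
    Both the rule and the default threshold are illustrative assumptions.
    If the MAD is zero, every value above the median is flagged.
    """
    if len(times_ms) < 2:
        raise ValueError("need at least two samples")
    median = statistics.median(times_ms)
    # Median absolute deviation: robust measure of spread
    mad = statistics.median([abs(t - median) for t in times_ms])
    cutoff = median + mad_threshold * mad
    return {
        "count": len(times_ms),
        "mean": statistics.fmean(times_ms),
        "stddev": statistics.stdev(times_ms),
        "median": median,
        "mad": mad,
        "outliers": [t for t in times_ms if t > cutoff],
    }


# Hypothetical sample, not real gateway data
sample_ms = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.5, 90000.0]
summary = summarise_latencies(sample_ms)
print(summary["median"], summary["stddev"], summary["outliers"])
```

Comparing the median with the mean and standard deviation in the output shows how strongly one slow call skews the non-robust statistics.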
on GGUS:
Site reports
Lancaster: As mentioned in storage, DUNE root TPC at Lancs was hit by "root:// TPC transfer fail with xrootd 5.6.x" (Issue #2202, xrootd/xrootd).
One slow OSD caused a lot of issues for ~12 hours this week. It coincided with a period of heavy Ceph traffic and caused a lot of CephFS mounts to be dropped.
Putting the first new storage nodes in a long time into production next week; any tips?
Tom: keep better control of the number of placement groups that could be moving.
Glasgow
Action items
@James Walder to schedule a ‘hackathon’ within a F2F to have a session on architectural planning.
@James Walder to prepare an outline of the expected roadmap for XRootD developments in 2024.