2022-09-21 Meeting notes
Date
Sep 21, 2022
Participants
@James Walder
@Alastair Dewhurst
@Emmanuel Bejide
@Thomas Byrne
Manchester: Alessandra
Glasgow: Sam
Lancs: Steven
Goals
List of Epics
New tickets
Consider new functionality / items
Detailed discussion of important topics
Site report activity
Discussion topics
Item | Presenter | Notes |
---|---|---|
5.5.0 | | Echo gateways: ~1/2 with 5.5.0. Sandbox: awaiting approval (deployed). |
‘unified’ config | | Do we have a better name than ‘unified’? |
Stats | | gfal2 CLI is slower than the API (the API creates a context once for each multiprocessing thread). |
CHEP abstracts | | Anything planned for CHEP? (Abstract deadline: 17 Nov.) |
Auto-restarter | | Currently disabled; to be reviewed when Will returns (with a short post-mortem). |
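The Stats note attributes the gfal2 CLI slowdown to context creation: each CLI invocation presumably has to build its own context, whereas the API path builds one per worker and reuses it. A minimal Python sketch of that difference, assuming a hypothetical `ExpensiveContext` class as a stand-in for a gfal2 context (this is not the gfal2 API) and using threads with thread-local storage as the reuse mechanism:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ExpensiveContext:
    """Hypothetical stand-in for a costly client context (NOT the gfal2 API)."""

    created = 0
    _lock = threading.Lock()

    def __init__(self):
        # Count constructions so the two strategies can be compared.
        with ExpensiveContext._lock:
            ExpensiveContext.created += 1

    def stat(self, url):
        # Placeholder for a remote operation; just returns the URL length.
        return len(url)


_local = threading.local()


def get_context():
    """Return this thread's context, creating it on first use only."""
    ctx = getattr(_local, "ctx", None)
    if ctx is None:
        ctx = _local.ctx = ExpensiveContext()
    return ctx


def stat_per_call(url):
    # CLI-like pattern: a fresh context for every single operation.
    return ExpensiveContext().stat(url)


def stat_reused(url):
    # API-like pattern: one context per worker thread, reused across calls.
    return get_context().stat(url)


def run(fn, urls, workers=4):
    """Apply fn to every URL on a thread pool; return (results, contexts created)."""
    ExpensiveContext.created = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fn, urls))
    return results, ExpensiveContext.created
```

With 100 URLs and 4 workers, `run(stat_per_call, urls)` constructs 100 contexts while `run(stat_reused, urls)` constructs at most 4; the per-operation setup cost is what the CLI pays repeatedly.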
Possible new time slot:
Time slot | OK | Highly inconvenient |
---|---|---|
Monday 11-12 | 2 | |
Wednesday 16-17 | | |
Thursday 13-14 | 1 | |
Friday 14-15 | | |
Site reports
21 Sept 2022
Glasgow
Glasgow is now on davs / xrootd 5.5.0. As mentioned to James, we saw 20 Gbit/s rates through the internal gateway before the xrootd service "livelocked" (up, but apparently politely ignoring requests). Fixed by a restart. We *do* see some packet discards on that link at the times we were at 20 Gbit/s, so the discards may be associated with the "ceph/rados ctx issues" leading to silent failures, as noticed by Jyothish more generally for xrdceph failures not being reported back up the xrootd chain.
We need to move to the redirector infrastructure now anyway (which I was holding off on whilst understanding the above), and that should also help reliability.
Also doing some development work on a fork of XrdCeph to add streaming (on-the-fly) checksums to the module, which will hopefully reduce the memory and I/O impact of checksumming significantly.
(We also need to investigate the discards at the NIC configuration level.)
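On-the-fly checksumming means folding each chunk into a running checksum as it is written, instead of reading the whole object back afterwards. A minimal Python sketch of the idea, assuming adler32 (a checksum widely used with XRootD; the algorithm in the fork is not stated here) and using in-memory streams as hypothetical stand-ins for the client stream and the Ceph backend. The real change lives in the XrdCeph module itself:

```python
import io
import zlib


def adler32_hex(value):
    # Adler-32 values are conventionally shown as 8 zero-padded hex digits.
    return f"{value & 0xFFFFFFFF:08x}"


def streaming_write(src, sink, chunk_size=4 * 1024 * 1024):
    """Copy src to sink chunk by chunk, updating an Adler-32 as data passes.

    Only one chunk is held in memory at a time, and the stored data never
    has to be read back to compute the checksum.
    """
    checksum = zlib.adler32(b"")  # Adler-32 starting value (1)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        sink.write(chunk)  # stand-in for the write to the Ceph/RADOS backend
        checksum = zlib.adler32(chunk, checksum)  # fold this chunk into the running value
    return adler32_hex(checksum)


def readback_checksum(stored):
    # The pattern being avoided: a second, full pass over the stored object.
    return adler32_hex(zlib.adler32(stored.getvalue()))


if __name__ == "__main__":
    data = bytes(range(256)) * 4096  # 1 MiB of sample data
    sink = io.BytesIO()
    on_the_fly = streaming_write(io.BytesIO(data), sink, chunk_size=64 * 1024)
    print(on_the_fly, on_the_fly == readback_checksum(sink))
```

Both functions yield the same value; the streaming version simply avoids the extra read pass and the need to buffer the whole object.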
Lancaster redirection balancing:
Bands indicate the contribution of each server, stacked to 100% (left-hand scale). The thick line (right-hand scale) is the standard deviation of the percentages; higher values indicate greater imbalance. User+system time is as reported by xrootd, which documents it as generated from getrusage.
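The imbalance line can be reproduced from per-server totals. A minimal Python sketch, assuming the population (not sample) standard deviation and hypothetical server names; the plot's exact definition is not given beyond the caption:

```python
import math
import statistics


def balance_metric(load):
    """Return (per-server % of total, population stddev of those percentages).

    `load` maps server name -> its contribution (e.g. user+system CPU seconds
    or request count). 0 means perfectly even; larger means more imbalance.
    """
    total = sum(load.values())
    if total <= 0:
        raise ValueError("no load recorded")
    pct = {server: 100.0 * value / total for server, value in load.items()}
    return pct, statistics.pstdev(list(pct.values()))


def worst_case(n_servers):
    # Stddev when one of n servers takes 100%: 100 * sqrt(n - 1) / n.
    return 100.0 * math.sqrt(n_servers - 1) / n_servers
```

The scale depends on the number of servers: with four servers, an even split gives 0, while all load on one server gives `worst_case(4)`, about 43.3, so comparisons across differently sized pools need that ceiling in mind.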
Action items
@James Walder prepare 5.5.1 RPMs
@James Walder propose Thursday 13-14 for new meetings
Decisions
- Thursday 13-14 for new meetings