2022-09-21 Meeting notes

 Date

Sep 21, 2022

 Participants

  • @James Walder

  • @Alastair Dewhurst

  • @Emmanuel Bejide

  • @Thomas Byrne

  • Manchester: Alessandra

  • Glasgow: Sam

  • Lancs: Steven

 Goals

  • List of Epics

  • New tickets

  • Consider new functionality / items

  • Detailed discussion of important topics

  • Site report activity

 

 Discussion topics

https://stfc.atlassian.net/jira/software/c/projects/XRD/boards/26/roadmap

Item

Presenter

Notes

 

5.5.0

 

Echo gateways: ~1/2 running the 5.5.0 sandbox.

Awaiting approval of the sandbox (deployed).
WN XRootD containers then need to be addressed (CentOS 7 / EL8?); to discuss explicitly with @Thomas Birkett, along with automated builds.

 

‘unified’ config

 

Is there a better name than ‘unified’?
This will be configured to run on the webdav alias hosts.
root TPC transfers are to be redirected to the xrootd aliased hosts (or could just fail…)

 

Stats

 

 

gfal2 CLI slower than API

(The API creates a context only once for each multiprocessing thread.)

 

CHEP abstracts

 

Anything planned for CHEP? (Abstract deadline: 17 Nov)

  • SE (XRootD + POSIX), and dev-lead ideas?

 

Auto-restarter

 

Currently disabled; to be reviewed when Will returns (together with a short post-mortem).

 

 

Possible new time slot:

Time slot          Ok    Highly inconvenient
Monday 11-12       2
Wednesday 16-17
Thursday 13-14     1
Friday 14-15

 

 

Site reports

21 Sept 2022

Glasgow

Glasgow is now on davs / xrootd 5.5.0. As mentioned to James, we saw 20 Gbit/s rates through the internal gateway before the xrootd service "livelocked" (up, but apparently politely ignoring requests). Fixed by a restart. We *do* see some packet discards on that link at the time we were at 20 Gbit/s, so the discards may be associated with the "ceph/rados ctx issues" -> silent failures that Jyothish has noticed in general, where xrdceph failures are not reported back up the xrootd chain.

 

We need to move to the redirector infrastructure now anyway (I was holding off on this whilst understanding the above), so that should also help reliability.

Also doing some dev work on a fork of XrdCeph to add streaming (on-the-fly) checksums to the module, which would hopefully reduce the memory and I/O impact of checksums significantly.
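
To make the on-the-fly idea concrete, here is a minimal sketch in Python. It is not the XrdCeph code: the function name `copy_with_checksum`, the chunk size, and the choice of adler32 are all assumptions made for illustration. The point is only that the checksum is updated as each chunk passes through, so the data is never re-read and never held in memory as a whole.

```python
import io
import zlib


def copy_with_checksum(src, dst, chunk_size=4 * 1024 * 1024):
    """Copy src to dst in chunks, updating an adler32 checksum as data flows."""
    checksum = 1  # adler32 starting value
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # Fold this chunk into the running checksum, then pass it on.
        checksum = zlib.adler32(chunk, checksum)
        dst.write(chunk)
    return f"{checksum & 0xFFFFFFFF:08x}"


# Usage with in-memory streams standing in for the real source and sink.
payload = b"example payload " * 1024
sink = io.BytesIO()
digest = copy_with_checksum(io.BytesIO(payload), sink, chunk_size=4096)
```

Memory use is bounded by the chunk size, and the only extra I/O compared with a plain copy is none at all, since the checksum rides along with the write.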

 

(We also need to investigate the discards at the NIC configuration level.)

 

Lancaster redirection balancing:

Bands show the contribution of each server, stacked to 100% (left-hand scale). The thick line (right-hand scale) is the standard deviation of the percentages; higher values indicate greater imbalance. User+system time is as reported by xrootd, which documents it as generated from getrusage.
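
The imbalance measure described above can be sketched as follows. This is an illustration, not the script behind the plot: the function name `imbalance`, the server names in the test data, and the use of the population standard deviation are assumptions.

```python
from statistics import pstdev


def imbalance(per_server):
    """Return each server's percentage share and the stddev of those shares.

    per_server maps a server name to its measured quantity for one time bin
    (for example user+system CPU seconds).
    """
    total = sum(per_server.values())
    if total == 0:
        # No activity in this bin: treat it as perfectly balanced.
        return {name: 0.0 for name in per_server}, 0.0
    shares = {name: 100.0 * value / total for name, value in per_server.items()}
    # Population standard deviation (an assumption; the plot may use sample stddev).
    return shares, pstdev(list(shares.values()))
```

With two servers splitting the load 75% / 25% this gives a standard deviation of 25, and an even split gives 0, matching the reading that higher values mean greater imbalance.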

 

 Action items

@James Walder to prepare 5.5.1 RPMs

@James Walder to propose Thursday 13-14 for new meetings

 Decisions

  1. Thursday 13-14 for new meetings