2022-12-01 Meeting Notes
Date
Oct 27, 2022
Participants
@James Walder
@Ian Johnson
Glasgow: Sam
Lancs: Steven, Matt, Gerard
Goals
List of Epics
New tickets
Consider new functionality / items
Detailed discussion of important topics
Site report activity
Discussion topics
Item | Presenter | Notes |
---|---|---|
Combining xrootd and webdav aliases gateways | | Change made on Monday; gw4 and gw5 kept as single-use hosts. Next step is to slowly release the FTS limits for ATLAS. |
ALICE space token development updates | @Ian Johnson | Code review found the need for better error reporting when the amount of disk space allocated to a pool cannot be retrieved from the extended attribute on the object “pool:__spaceinfo__” in target pools. This is now in place. The StatLS code also needs to convert the “raw” figures for disk space allocation and usage into VO-relevant figures by accounting for the erasure coding overhead. This is assumed to be approx. 8/11 for the ECHO and DEV clusters for now, but a config file setting will allow for other cluster layouts. Costin from ALICE looked at the output of ‘xrdfs query space’ two weeks ago and appears keen to assist with the trials, e.g. when the functionality is moved to ceph-dev-gw2 (an ALICE “testbed” for some purposes). |
Slow stats | | As previously mentioned, the ‘slow stats’ issue from LHCb appears to be more related to slow checksums. [Plot: time for 100 files to run through the ‘lhcb’ stat (+checksum) code, comparing SARA and RAL.] James constructed proof-of-concept client-server checksum (metadata) tooling. Server: Client: On the ceph-dev instance the code runs in ~15 ms, and in < 1 s for 100 checksum retrievals. Todo: |
Deletes https://stfc.atlassian.net/browse/XRD-52 | | Alex notes there are still some cases of long deletes (e.g. beyond the 20 s timeout). Additional macaroon logging has been added to the prod hosts: https://elog.gridpp.rl.ac.uk/Tier1/10679 Spotted one case where a macaroon was generated but there was no further evidence of a connection from the client, leading to a timeout on the client side … No time yet to investigate properly, but this should be considered urgent. |
Vector reads | | Alex’s suggestion of restricting (from the server side) the client’s ability to send large numbers of readv segments in a single request appears to work, but without some (X)caching, performance is slow. Tested adding a buffer behind the readv requests (i.e. readv reads from a buffered read of data from Ceph). This is still ‘just’ a mitigation; underlying code improvements could still be made. |
Tokens testbed | | Updated tests mean that RAL no longer passes all tests. Some failures are likely to need XRootD updates, others might be due to the tests performing directory-level operations, and some might need configuration updates. Intend to try adding the SKA issuer for some functional tests. |
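
The raw-to-VO conversion described in the ALICE space token item can be illustrated with a minimal sketch. It is not the StatLS code: the function names `raw_to_vo_bytes` and `vo_space_report` are hypothetical, and the only figure taken from the notes is the approximate 8/11 usable fraction assumed for the ECHO and DEV clusters. The `usable_fraction` parameter stands in for the proposed config file setting.

```python
from fractions import Fraction

# Assumption taken from the notes: roughly 8/11 of the raw capacity is usable
# on the ECHO and DEV clusters. Other cluster layouts would supply their own
# value, e.g. from a config file setting.
DEFAULT_USABLE_FRACTION = Fraction(8, 11)


def raw_to_vo_bytes(raw_bytes, usable_fraction=DEFAULT_USABLE_FRACTION):
    """Scale a raw (on-disk) byte count to the figure a VO would see."""
    if raw_bytes < 0:
        raise ValueError("raw_bytes must be non-negative")
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    # Fraction keeps the arithmetic exact; int() truncates to whole bytes.
    return int(raw_bytes * usable_fraction)


def vo_space_report(raw_allocated, raw_used,
                    usable_fraction=DEFAULT_USABLE_FRACTION):
    """Return VO-relevant total/used/free figures from raw pool figures."""
    total = raw_to_vo_bytes(raw_allocated, usable_fraction)
    used = raw_to_vo_bytes(raw_used, usable_fraction)
    return {"total": total, "used": used, "free": total - used}
```

For example, `vo_space_report(11_000, 5_500)` gives a total of 8000, used 4000 and free 4000 bytes. A different layout could be expressed by passing, say, `Fraction("1/2")` as `usable_fraction`.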
GGUS:
Site reports
Action items
@Thomas Byrne to be made aware of the big red warning on https://docs.ceph.com/en/quincy/cephadm/upgrade/ (but is hopefully fixed now).