2023-05-18 Meeting Notes

🗓 Date

18 May 2023

👥 Participants

🥅 Goals

List of Epics
New tickets
Consider new functionality / items
Detailed discussion of important topics
Site report activity

🗣 Discussion topics

Current status of Echo Gateways / WNs testing

Recent sandboxes for review / deployments:

Item	Presenter	Notes
Impact of Vector Read update to Echo		https://stfc.atlassian.net/wiki/spaces/GRIDPP/pages/137265343/Non-striper+read+v+implementation+for+WN+s+xrootd+gateways
		https://stfc.atlassian.net/wiki/spaces/X/pages/edit-v2/143786029
		https://github.com/stfc/xrootd-ceph/pull/37/files
		12–16 May 2023 Echo instability following readV rollout
		Timeline on Batch farm
		5th May: `wn-2020-xma - wn-2022-lenovo` will be set to drain.
		9th May: Merge the sandbox http://aquilon.gridpp.rl.ac.uk/sandboxes/diff.php?sandbox=update_cvmfs_client into production.
		    Drained workers will have the package `egi-cvmfs` manually removed.
		    Manually remove the `xrootd-proxy.service` (/etc/systemd/system/xrootd-proxy.service).
		    Manage workers into ~~http://aquilon.gridpp.rl.ac.uk/sandboxes/diff.php?sandbox=xrootd-patch~~ ~~(will include cherry-picked commit for the Docker patch)~~ http://aquilon.gridpp.rl.ac.uk/sandboxes/diff.php?sandbox=docker-cvmfs-xrootd-combo
		    This sandbox combines all changes and gives a clear view of all changes (DO NOT DEPLOY).
		    Confirm workers have successfully compiled:
		        Check: `healthcheck_wn_condor` outputs healthy status.
		        Check: `xrootd-gateway.service` outputs healthy Docker status.
		    Bring workers online a tranche at a time, confirming with the data team that xrootd is working as expected.
		10-11th May: Let the updated workers run for a few days.
		12th May: `wn-2017-dell (all 2017’s) - wn-2019-dell` will be set to drain.
		15th May: Repeat above process for second half of workers.
		16th May: Merge all required sandboxes into prod and manage farm back into `prod_batch` in AQ.
Next steps for WN deployment		Possible options for short term WN status:
		Current configuration: 5.3.3-2 (core) + 5.3.4-1 (xroot-ceph) for proxy and gateway.
		Move to 5.5.4-2 (core) + 5.5.4-3 (xroot-ceph-buffered):
		    (a) Fixes the Xcache “Filename too long” issue (to be confirmed)!
		    (b) Provides buffering on ‘gateway’ for passed-through reads.
		    (c) Allows non-striper reads and readV requests (i.e. Alex updates) (also for passed-through read(v)).
		    (d) (b) and (c) are configurable within the xrootd-xxx.cfg configuration files.
		    (e) Paged reads / (writes) would be enabled; probably only between Xcache and gateway (TBC).
		    (f) General fixes from the 5.5.X series.
		    (g) 5.5.4 is currently being tested on lcg2268 (2017 dell, ml) (not exactly in this configuration, however).
		5.3.3-x (core) + 5.3.3-6 (xroot-ceph-buffered):
		    Needs an additional patch for the “filename too long” issue, resulting in different (core) xrootd rpms for proxy and ceph (or a more detailed patch).
		    We ‘understand’ 5.3.3 as a working and stable release.
		    Most testing on WNs was done under this configuration.
		(Not for initial consideration) The proxy can be configured as a disk-caching proxy (XCache) or to ‘forward / passthrough’ the requests to the gateway, without the need for draining the farm.
EBUSY in readV requests		Observed during the Echo problem period: -EBUSY errors from Ceph, which are caught in the BufferedIO Read calls (5 attempts, then an -EIO error is returned).
		We should ensure that readV requests also catch -EBUSY errors correctly, and do not pass them back to core xrootd.
		James Walder to create Jira.
Discussion on merging bufferedIO into master. Also to discuss pushing changes to “upstream” (xrootd/xroot-ceph)		https://github.com/stfc/xrootd-ceph/pull/44
		Needs testing for ‘correctness’.
		Also some discussion ongoing on xrootd “issues” on the xrootd-ceph sub-module: https://github.com/xrootd/xrootd/pull/2008
SEGV investigations with -S multi-stream flags		XRD-53
		Fix is in ‘master’ of core xrootd; not yet added to a tagged xrootd release; to follow up with Ian Johnson.
CMSD status		XRD-41
		CC document: https://stfc.atlassian.net/wiki/spaces/GRIDPP/pages/136446019/High-level+XrootD+redirection+for+Echo?focusedCommentId=136118425
Transfers of 0-byte files		XRD-62
		Observed Dune transfer failures using 0-byte files.
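
Illustration for the "EBUSY in readV requests" item above: a minimal Python sketch of the behaviour described in the notes (retry on -EBUSY up to 5 times, then return -EIO), applied to every chunk of a vector read. This is not the xrootd-ceph C++ code. The names `read_with_ebusy_retry`, `readv_with_ebusy_retry` and the `read_chunk` callback are hypothetical, and the convention of returning bytes read or a negative errno is an assumption made for the sketch.

```python
import errno
from typing import Callable, Iterable, Tuple

MAX_ATTEMPTS = 5  # the notes mention 5 attempts for BufferedIO reads


def read_with_ebusy_retry(read_once: Callable[[], int],
                          max_attempts: int = MAX_ATTEMPTS) -> int:
    """Call read_once() until it returns something other than -EBUSY.

    read_once is a hypothetical callback returning the number of bytes
    read (>= 0) or a negative errno value. If every attempt returns
    -EBUSY, -EIO is returned instead, so -EBUSY never reaches the caller.
    """
    for _ in range(max_attempts):
        rc = read_once()
        if rc != -errno.EBUSY:
            return rc  # success, or an error other than -EBUSY
    return -errno.EIO


def readv_with_ebusy_retry(chunks: Iterable[Tuple[int, int]],
                           read_chunk: Callable[[int, int], int],
                           max_attempts: int = MAX_ATTEMPTS) -> int:
    """Read each (offset, length) chunk with the same -EBUSY handling.

    Returns the total number of bytes read, or the first negative errno.
    """
    total = 0
    for offset, length in chunks:
        rc = read_with_ebusy_retry(
            lambda o=offset, n=length: read_chunk(o, n), max_attempts)
        if rc < 0:
            return rc  # stop at the first failing chunk
        total += rc
    return total
```

In this sketch, a chunk that stays busy for all attempts makes the whole vector read fail with -EIO, matching the single-read behaviour in the notes. Whether a real readV should fail as a whole or retry per chunk is a design decision for the Jira ticket, not something the notes settle.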

GGUS:

Deletion problem at RAL

Slow stat calls at RAL

Problem accessing some LHCb files at RAL

Site reports

✅ Action items

Create Jira for Checksumming updates for 3.7+ (especially for Rocky 9 releases).
James Walder to review and approve the PR for the vector read work.
James Walder to identify and discuss with Dune representatives the 0-byte file failures, and whether this is an issue / understood.
