...

Apologies:
Alison Packer, Alastair Dewhurst

🥅 Goals

  • List of Epics

  • New tickets

  • Consider new functionality / items

  • Detailed discussion of important topics

  • Site report activity

...

Item

Presenter

Notes

Impact of Vector Read update to Echo

Vector Read

https://stfc.atlassian.net/wiki/spaces/GRIDPP/pages/137265343/Non-striper+read+v+implementation+for+WN+s+xrootd+gateways

https://stfc.atlassian.net/wiki/spaces/X/pages/edit-v2/143786029

https://github.com/stfc/xrootd-ceph/pull/37/files

12–16 May 2023 Echo instability following readV rollout

Timeline on Batch farm

  • 5th May: `wn-2020-xma - wn-2022-lenovo` will be set to drain.

  • 9th May:

  • 10-11th May: Let the updated workers run for a few days.

  • 12th May: `wn-2017-dell (all 2017’s) - wn-2019-dell` will be set to drain.

  • 15th May: Repeat above process for second half of workers.

  • 16th May: Merge all required sandboxes into prod and manage farm back into `prod_batch` in AQ

Next steps for WN deployment

Possible options for short term WN status:

  • Current configuration: 5.3.3-2 (core) + 5.3.4-1 (xroot-ceph) for proxy and gateway

  1. Move to 5.5.4-2 (core) + 5.5.4-3 (xroot-ceph-buffered):

    1. Fixes the XCache “Filename too long” issue (to be confirmed).

    2. Provides buffering on the ‘gateway’ for passed-through reads

    3. Allows non-striper reads and readV requests (i.e. Alex’s updates), including for passed-through read(v) requests

    4. Items 2 and 3 are both configurable within the xrootd-xxx.cfg configuration files

    5. Paged reads (and writes) would be enabled, probably only between XCache and gateway (TBC)

    6. General fixes from 5.5.X series

    7. 5.5.4 is currently being tested on lcg2268 (2017 dell, ml), although not in exactly this configuration.

  2. 5.3.3-x (core) + 5.3.3-6 (xroot-ceph-buffered).

    1. Needs an additional patch for the “filename too long” issue,

      1. resulting in different (core) xrootd rpms for proxy and ceph (or a more detailed patch).

    2. We ‘understand’ 5.3.3 as a working and stable release

    3. Most testing on WNs done under this configuration

  3. (for the future) Make the proxy pass through all readV requests to the gateway

  4. (not for initial consideration) The proxy can be configured as a disk-caching proxy (XCache) or to ‘forward / passthrough’ requests to the gateway, without the need to drain the farm.

EBUSY in readV requests

Observed during the Echo problem period: -EBUSY errors from Ceph, which are caught in the BufferedIO Read calls (5 attempts, then an -EIO error is returned).
We should ensure that readV requests also catch -EBUSY errors correctly and do not pass them back to core xrootd.
James Walder to create a Jira ticket.
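
A minimal sketch of the retry behaviour described above, written in Python for brevity rather than in the C++ of xrootd-ceph. Only the "5 attempts, then -EIO" behaviour comes from these notes. The function names (`read_with_retry`, `readv_with_retry`), the `read_chunk` callable, and the return convention (byte count on success, negative errno on failure) are assumptions for illustration and are not the actual BufferedIO or readV code.

```python
import errno


def read_with_retry(read_once, max_attempts=5):
    """Call read_once() until it returns something other than -EBUSY.

    read_once is a hypothetical callable returning a byte count (>= 0)
    or a negative errno value. After max_attempts consecutive -EBUSY
    results, -EIO is returned so that -EBUSY never reaches the caller.
    """
    for _ in range(max_attempts):
        rc = read_once()
        if rc != -errno.EBUSY:
            return rc  # success, or an error other than -EBUSY
    return -errno.EIO


def readv_with_retry(read_chunk, chunks, max_attempts=5):
    """Serve a vector read as a sequence of retried single reads.

    chunks is a list of (offset, length) pairs. Returns the total number
    of bytes read, or the first negative errno encountered.
    """
    total = 0
    for offset, length in chunks:
        rc = read_with_retry(lambda: read_chunk(offset, length), max_attempts)
        if rc < 0:
            return rc  # abort the whole vector read on the first failure
        total += rc
    return total
```

Applying the same wrapper to every chunk of a readV request is one way to keep the single-read and vector-read paths consistent; whether the real implementation retries per chunk or per request is an open design question for the Jira ticket.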

Discussion on merging bufferedIO into master.
Also to discuss pushing changes to “upstream” (xrootd/xroot-ceph).

https://github.com/stfc/xrootd-ceph/pull/44

Needs testing for ‘correctness’

There is also ongoing discussion in the xrootd “issues” about the xrootd-ceph sub-module:
https://github.com/xrootd/xrootd/pull/2008

SEGV investigations with -S multi-stream flags

Jira: XRD-53

Fix is in ‘master’ of core xrootd but not yet in a tagged xrootd release; to follow up with Ian Johnson.

CMSD status

Jira: XRD-41

CC document

https://stfc.atlassian.net/wiki/spaces/GRIDPP/pages/136446019/High-level+XrootD+redirection+for+Echo?focusedCommentId=136118425

Transfers of 0-byte files

Jira: XRD-62

Observed Dune transfer failures using 0-byte files

...

  • Create Jira for Checksumming updates for 3.7+ (especially for Rocky 9 releases).

  • James Walder to review and approve the PR for the vector read work

  • James Walder to identify the 0-byte file failures and discuss with Dune representatives whether this is a real issue or already understood

...

  • Begin the testing process on the WN test node and aim to push to the farm in a timely manner

  • Continue investigations into readV methods that will enable the XCache to be removed (and therefore allow writable WNs)

⤴ Decisions

  • Gateway configuration following Option 1 is preferred: “Move to 5.5.4-2 (core) + 5.5.4-3 (xroot-ceph-buffered)”