• Rough draft
  • 2023-05-18 Meeting Notes

     Date

    May 18, 2023

     Participants

    • @James Walder

    • @Thomas, Jyothish (STFC,RAL,SC)

    • @Alexander Rogovskiy

    • Lancaster: Gerard, Matt, Steven

    Apologies:
    @Alison Packer, @Alastair Dewhurst

     Goals

    • List of Epics

    • New tickets

    • Consider new functionality / items

    • Detailed discussion of important topics

    • Site report activity

     

     Discussion topics

    Current status of Echo Gateways / WNs testing

    Recent sandboxes for review / deployment:

     

    Item

    Presenter

    Notes

     

    Item

    Presenter

    Notes

     

    Impact of the vector read (readV) update on Echo

     

    https://stfc.atlassian.net/wiki/spaces/GRIDPP/pages/137265343/Non-striper+read+v+implementation+for+WN+s+xrootd+gateways

    XRootD: Code review for ReadV implementation in XrdCeph (March 2023)

    https://github.com/stfc/xrootd-ceph/pull/37/files
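    A non-striper readV has to map each (offset, length) chunk of a vector read onto the underlying Ceph objects directly, without going through libradosstriper. The Python sketch below models that mapping under loudly stated assumptions. It assumes the simplest striping layout (stripe unit equal to object size, stripe count 1). It also assumes libradosstriper-style object names of the form `<name>.<16 hex digits>`. The function names `split_readv` and `object_name` are hypothetical, and `object_size` is a parameter rather than Echo's actual setting. It is not the XrdCeph implementation in the PR above, which is C++.

```python
from typing import List, Tuple

# One piece of a readV chunk: (chunk number, object index, offset within object, length)
Piece = Tuple[int, int, int, int]


def split_readv(chunks: List[Tuple[int, int]], object_size: int) -> List[Piece]:
    """Split readV (offset, length) chunks at object boundaries.

    Assumes stripe unit == object size and stripe count == 1, so byte `pos`
    of the file lives in object pos // object_size at offset pos % object_size.
    """
    pieces: List[Piece] = []
    for chunk_no, (offset, length) in enumerate(chunks):
        done = 0
        while done < length:
            pos = offset + done
            index, obj_off = divmod(pos, object_size)
            # Never read past the end of the current object.
            n = min(length - done, object_size - obj_off)
            pieces.append((chunk_no, index, obj_off, n))
            done += n
    return pieces


def object_name(base: str, index: int) -> str:
    """Assumed libradosstriper-style object name: '<base>.<16 hex digits>'."""
    return f"{base}.{index:016x}"


# Example: the second chunk crosses the boundary between objects 0 and 1.
print(split_readv([(10, 20), (60, 10)], object_size=64))
print(object_name("file", 1))
```

    Pieces that land in the same object could then be grouped into one per-object request. Whether the actual implementation does this is not stated in these notes.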

    12–16 May 2023 Echo instability following readV rollout

    Timeline on the batch farm

    • 5th May: `wn-2020-xma` - `wn-2022-lenovo` will be set to drain.

    • 9th May:

    • 10-11th May: Let the updated workers run for a few days.

    • 12th May: `wn-2017-dell` (all 2017s) - `wn-2019-dell` will be set to drain.

    • 15th May: Repeat above process for second half of workers.

    • 16th May: Merge all required sandboxes into prod and manage the farm back into `prod_batch` in AQ.

     

     

     

    Next steps for WN deployment

     

    Possible options for the short-term WN configuration:

    • Current configuration: 5.3.3-2 (core) + 5.3.4-1 (xroot-ceph) for proxy and gateway

    1. Move to 5.5.4-2 (core) + 5.5.4-3 (xroot-ceph-buffered):

      1. Fixes the XCache “filename too long” issue (to be confirmed).

      2. Provides buffering on the gateway for passed-through reads.

      3. Allows non-striper reads and readV requests (i.e. Alex's updates), including passed-through read(v) requests.

      4. Items 2 and 3 are both configurable within the xrootd-xxx.cfg configuration files.

      5. Paged reads (and possibly writes) would be enabled, probably only between XCache and the gateway (TBC).

      6. General fixes from the 5.5.x series.

      7. 5.5.4 is currently being tested on lcg2268 (2017 Dell, ml), although not in exactly this configuration.

    2. 5.3.3-x (core) + 5.3.3-6 (xroot-ceph-buffered).

      1. Needs an additional patch for the “filename too long” issue,

        1. resulting in different (core) xrootd RPMs for the proxy and ceph (or a more detailed patch).

      2. We ‘understand’ 5.3.3 to be a working and stable release.

      3. Most testing on WNs has been done under this configuration.

    3. (For the future) Make the proxy pass through all readV requests to the gateway.

    4. (Not for initial consideration) The proxy can be configured either as a disk-caching proxy (XCache) or to ‘forward / passthrough’ requests to the gateway, without needing to drain the farm.
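    The gateway buffering in option 1 (buffering for passed-through reads) can be pictured as serving many small client reads from one larger, block-aligned backend read. The Python model below is hypothetical. The class name `BufferedReader`, the `block_size` parameter and the single-buffer policy are assumptions for illustration, not the XrdCeph BufferedIO code, which is C++ and whose buffer size is not recorded here.

```python
from typing import Callable


class BufferedReader:
    """Hypothetical model: serve reads from one block-aligned buffer."""

    def __init__(self, backend_read: Callable[[int, int], bytes], block_size: int):
        self._read = backend_read      # backend_read(offset, length) -> bytes
        self._bs = block_size          # assumed buffer size, not the real XrdCeph value
        self._start = -1               # file offset of the buffered block (-1 = empty)
        self._data = b""
        self.backend_calls = 0         # counts backend reads, to show the saving

    def _fill(self, offset: int) -> None:
        # Load the aligned block containing `offset`, unless it is already buffered.
        start = (offset // self._bs) * self._bs
        if start != self._start:
            self._data = self._read(start, self._bs)
            self._start = start
            self.backend_calls += 1

    def read(self, offset: int, length: int) -> bytes:
        out = bytearray()
        while length > 0:
            self._fill(offset)
            rel = offset - self._start
            piece = self._data[rel:rel + length]
            if not piece:              # short block: end of file reached
                break
            out += piece
            offset += len(piece)
            length -= len(piece)
        return bytes(out)


data = bytes(range(256))
reader = BufferedReader(lambda off, n: data[off:off + n], block_size=64)
reader.read(0, 10)
reader.read(10, 10)
print(reader.backend_calls)  # two small client reads, one backend read
```

    The trade-off is memory per open file against fewer, larger Ceph reads. Reads that jump between blocks gain nothing from a single buffer.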

     

     

     

    EBUSY in readV requests

     

    Observed during the Echo problem period: -EBUSY errors returned from Ceph, which are caught in the BufferedIO read calls (5 attempts, after which an -EIO error is returned).
    We should ensure that readV requests also catch -EBUSY errors correctly and do not pass them back to core xrootd.
    @James Walder to create a Jira ticket.
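    The retry behaviour described above (retry on -EBUSY, give up after 5 attempts with -EIO) and its extension to readV can be sketched as follows. This is a hypothetical Python model, not the XrdCeph C++ code. The helper names `retry_on_ebusy` and `read_v` and the optional backoff are assumptions, and the notes do not say whether the real code waits between attempts.

```python
import errno
import functools
import time
from typing import Callable, List, Tuple

# From the notes: BufferedIO retries on -EBUSY, 5 attempts, then returns -EIO.
MAX_EBUSY_ATTEMPTS = 5


def retry_on_ebusy(op: Callable[[], int],
                   max_attempts: int = MAX_EBUSY_ATTEMPTS,
                   backoff_s: float = 0.0) -> int:
    """Run op(), which returns bytes read (>= 0) or a negative errno.

    -EBUSY is retried up to max_attempts times in total; if every attempt
    returns -EBUSY, -EIO is returned so -EBUSY never reaches core xrootd.
    Success or any other error is returned immediately.
    """
    for attempt in range(max_attempts):
        rc = op()
        if rc != -errno.EBUSY:
            return rc
        if backoff_s > 0 and attempt + 1 < max_attempts:
            # Hypothetical exponential backoff; the real delay (if any) is not recorded.
            time.sleep(backoff_s * (2 ** attempt))
    return -errno.EIO


def read_v(chunks: List[Tuple[int, int]],
           read_chunk: Callable[[int, int], int]) -> int:
    """Hypothetical readV: apply the same -EBUSY handling to every chunk."""
    total = 0
    for offset, length in chunks:
        rc = retry_on_ebusy(functools.partial(read_chunk, offset, length))
        if rc < 0:
            return rc  # -EIO or another error, never -EBUSY
        total += rc
    return total
```

    Wrapping each chunk separately means one busy object only costs retries for that chunk. Retrying the whole readV call would be simpler but would re-read chunks that had already succeeded.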

     

    Discussion on merging bufferedIO into master.
    Also to discuss pushing changes “upstream” (xrootd/xrootd-ceph).

     

    https://github.com/stfc/xrootd-ceph/pull/44

    Needs testing for ‘correctness’

    There is also ongoing discussion on the xrootd GitHub repository about the xrootd-ceph submodule:
    Remove xrootd-ceph git submodule and bring the code back to the main repository by amadio · Pull Request #2008 · xrootd/xrootd

     

    SEGV investigations with -S multi-stream flags

     

    https://stfc.atlassian.net/browse/XRD-53

    The fix is in ‘master’ of core xrootd but has not yet been included in a tagged xrootd release; to follow up with @Ian Johnson.

     

    CMSD status

     

    https://stfc.atlassian.net/browse/XRD-41

    CC document

    https://stfc.atlassian.net/wiki/spaces/GRIDPP/pages/136446019/High-level+XrootD+redirection+for+Echo?focusedCommentId=136118425


     

    Transfers of 0-byte files

     

    https://stfc.atlassian.net/browse/XRD-62

    Observed DUNE transfer failures involving 0-byte files.

     

     

     

     

     

    GGUS:

    Deletion problem at RAL

    Slow stat calls at RAL

    Problem accessing some LHCb files at RAL

    Site reports

     Action items

    • Create a Jira ticket for checksumming updates for 3.7+ (especially for Rocky 9 releases).

    • Begin the testing process on the WN test node and aim to push to the farm in a timely manner.

    • Continue investigating readV methods that would allow the XCache to be removed (and therefore allow writable WNs).

     

     Decisions

    1. Gateway configuration following Option 1 is preferred: “Move to 5.5.4-2 (core) + 5.5.4-3 (xroot-ceph-buffered)”