• Rough draft
  • 2023-04-27 Meeting Notes

     Date

    Apr 27, 2023

     Participants

    • @James Walder

    • @Ian Johnson

    • @Alexander Rogovskiy

    • @Thomas, Jyothish (STFC,RAL,SC)

    • Lancs: Matt, Steven

    • Man: Alessandra

    • Glasgow: Sam

     Goals

    • List of Epics

    • New tickets

    • Consider new functionality / items

    • Detailed discussion of important topics

    • Site report activity

     

     Discussion topics

    Current status of Echo Gateways / WNs testing

    Recent sandboxes for review / deployments:

     

    Item

    Presenter

    Notes

     

     

    Vector Read

     

    https://stfc.atlassian.net/wiki/spaces/GRIDPP/pages/137265343/Non-striper+read+v+implementation+for+WN+s+xrootd+gateways

    XRootD: Code review for ReadV implementation in XrdCeph (March 2023)

    feat: improve (vector) read implementation by alex-rg · Pull Request #37 · stfc/xrootd-ceph

    Code review largely complete; no additional meetings expected.
    • Would like to resolve any residual comments as soon as possible.

    Planned actions:

    • Complete the Code review, to allow:

      • Merge PR into master branch (and merge into bufferedIO)

    • Build RPMs for the 5.3.3 and 5.5.4 releases

      • Need to ensure the 5.3.3 core XRootD keeps the DH key length patch

      • Request client side timeout ENV in docker job containers

    • (Supply Glasgow with the appropriate tag / commits in GitHub to build their RPMs)

    • Arrange testing on Echo prod, AAA and Alice gateways:

      • AAA testing with various buffer sizes would be interesting.
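    To illustrate why the buffer size matters for the AAA tests, here is a minimal sketch (not the XrdCeph implementation) that counts how many buffer-sized blocks a vector read request would touch. The function name `blocks_touched` and the example chunk list are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Tuple


def blocks_touched(chunks: Iterable[Tuple[int, int]], buffer_size: int) -> List[int]:
    """Return the sorted indices of buffer-sized blocks covered by the chunks.

    Each chunk is an (offset, length) pair, as in a vector read request.
    This is an illustrative model only, not how XrdCeph serves ReadV.
    """
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    touched = set()
    for offset, length in chunks:
        if length <= 0:
            continue  # an empty chunk touches no block
        first = offset // buffer_size
        last = (offset + length - 1) // buffer_size
        touched.update(range(first, last + 1))
    return sorted(touched)


if __name__ == "__main__":
    # Hypothetical request: three small chunks, the last one crossing offset 4096.
    request = [(0, 10), (100, 10), (4090, 20)]
    for size in (64, 4096):
        print(size, blocks_touched(request, size))
```

    In this model a larger buffer means fewer, larger backend reads for sparse requests, at the cost of reading more unused bytes. That trade-off is what the buffer-size tests would measure.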

     

    SEGV investigations with -S multi-stream flags

     

    https://stfc.atlassian.net/browse/XRD-53

    The SEGV has not occurred when using the Ceph plugin with XRootD v5.5.4.post257 (local compile). The reason for this is unknown; we have raised a query about this in segfault from XrdXrootdProtocol::do_OffloadIO · Issue #1821 · xrootd/xrootd

    Continuing to search for the code change(s) that now allow multiple streams to work correctly. The Jira issue has been updated with a table of behaviour across different releases of the XRootD server.

     

    Fix for paged writes when misaligned to end of buffer

     

    Bug fix for writes with bufferedIO when extending over buffer range. by snafus · Pull Request #40 · stfc/xrootd-ceph

    Sandbox on gw7; ready to be deployed? Aim for Tuesday rollout to the Gateways.
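    For readers unfamiliar with the failure mode, the following is a minimal sketch of the boundary case, assuming a fixed-size write buffer. It is not the code from the PR. The helper `split_write` is hypothetical and only shows how a write that extends past the end of a buffer can be split at buffer boundaries.

```python
from typing import List, Tuple


def split_write(offset: int, length: int, buffer_size: int) -> List[Tuple[int, int]]:
    """Split a write into (offset, length) pieces that never cross a buffer boundary.

    Illustrative only: the real bufferedIO code in stfc/xrootd-ceph may differ.
    """
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    pieces = []
    end = offset + length
    while offset < end:
        # Next multiple of buffer_size strictly above the current offset.
        boundary = (offset // buffer_size + 1) * buffer_size
        piece_end = min(boundary, end)
        pieces.append((offset, piece_end - offset))
        offset = piece_end
    return pieces


if __name__ == "__main__":
    # A 10-byte write starting 2 bytes before the end of an 8-byte buffer.
    print(split_write(6, 10, 8))
```

    A write that starts near the end of one buffer therefore becomes a short piece that fills the current buffer, followed by pieces that start on a buffer boundary.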

     

    CMSD status

     

    https://stfc.atlassian.net/browse/XRD-41

    CC document

    https://stfc.atlassian.net/wiki/spaces/GRIDPP/pages/136446019/High-level+XrootD+redirection+for+Echo?focusedCommentId=136118425

    • Observed quick round-robin behaviour for IPv4; slow (according to TTL) for IPv6. James A. states this is fine, and different clients should get directed to different managers anyway.

    • Failover behaviour appears to work (when manually stopping xrootd or cmsd). What is the best command for spotting a broken xrootd service?

    • AQ configuration exists, but should be refactored to add relevant “service” level functionality.

    • ATLAS FTs are running against it, and some simple ‘stress’ test transfers have been run; looking OK so far.
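    The round-robin observation can be repeated with a small script. This is a sketch under assumptions: the function name `first_addresses` is hypothetical, port 1094 is assumed as the XRootD port, and local resolver caching may hide the rotation, so results depend on where it is run.

```python
import socket
from collections import Counter


def first_addresses(hostname, family, samples=5, resolve=socket.getaddrinfo, port=1094):
    """Count which address comes first in repeated lookups of an alias.

    family is socket.AF_INET (IPv4) or socket.AF_INET6 (IPv6).
    resolve is injectable so the function can be exercised without a network.
    """
    firsts = []
    for _ in range(samples):
        results = resolve(hostname, port, family, socket.SOCK_STREAM)
        if results:
            # Each result is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the address string for both IPv4 and IPv6.
            firsts.append(results[0][4][0])
    return Counter(firsts)
```

    For example, `first_addresses("redirector.example.org", socket.AF_INET6, samples=20)` (with a placeholder hostname) would show a single address if the IPv6 answer only rotates when the TTL expires, and an even spread if rotation is quick.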

    To do:

    • Complete the AQ setup.

    • Deploy to all Echo gateways (still in the largely ‘passive’ mode).

    • Define and agree an agenda / schedule for moving VOs (and their various activities) over.

      • Make final switch of webdav and xrootd aliases to the redirector address

      • This will need new certs for the two manager hosts

     

    Transfers of 0-byte files

     

    https://stfc.atlassian.net/browse/XRD-62

    Observed DUNE transfer failures using 0-byte files.

     

    GGUS:

    Deletion problem at RAL

    Slow stat calls at RAL

    Problem accessing some LHCb files at RAL

    Site reports

     Action items

    • Create Jira for Checksumming updates for 3.7+ (especially for Rocky 9 releases).

    • @James Walder To review and approve the PR for the vector read work

    • @James Walder To identify DUNE representatives and discuss the 0-byte file failures with them, and whether this is an issue / understood

     

     Decisions