• Rough draft
  • 2022-11-3 Meeting Notes

     Date

    Nov 3, 2022

     Participants

    • @James Walder

    • @Emmanuel Bejide

    • Lancs: Apologies (Gerard, Matt), Steven

     Goals

    • List of Epics

    • New tickets

    • Consider new functionality / items

    • Detailed discussion of important topics

    • Site report activity

     

     Discussion topics

    https://stfc.atlassian.net/jira/software/c/projects/XRD/boards/26/roadmap

    Item

    Presenter

    Notes

     


    5.5.1 released

     

    Reports of significant issues (yet to check if these are on GitHub as issues)

     

    Xcache 5.5.X problems

     

    Stuck xroot transfers with gfal2 and XCache (xrootd/xrootd issue #1808): no update so far.

     

    Thoughts on combining the xrootd and webdav aliased hosts?

     

    Is there interest in changing the DNS alias (next week?) so that xrootd and webdav use a common set of hosts?

     

    Unified Sandbox

     

    Initial review with Tom; some cleanup needed and a couple of missed config settings (due to the tpc instance that is to be added).

     

    CMSD

     

    Trying to find someone to talk to about RAL VMware

     

    Slow stats

     

    Ongoing study.

    Separate out the “unhappy” gateway scenario from the other items?
    With Alex, would like to confirm whether there are any pool-dependent differences.

     

    Vector Read requests on Echo Gateways

     

    Discussing with Ian and Alex.

    Alex able to reproduce with Rob’s scripts; some testing ideas.

    • I’m preparing an updated simple range-coalescence configuration to test against
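    For reference, a minimal sketch of what "simple range coalescence" means for a vector read: adjacent or nearly adjacent (offset, length) chunks are merged into fewer, larger reads. This is illustrative only and is not the XrdCeph implementation; the function name `coalesce_ranges` and the `max_gap` parameter are hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (offset, length) in bytes


def coalesce_ranges(ranges: List[Range], max_gap: int = 0) -> List[Range]:
    """Merge ranges that overlap or lie within max_gap bytes of each other.

    Hypothetical helper for illustration; not taken from XRootD or XrdCeph.
    """
    merged: List[Range] = []
    for offset, length in sorted(ranges):
        if length <= 0:
            continue  # ignore empty or invalid chunks
        if merged:
            prev_offset, prev_length = merged[-1]
            prev_end = prev_offset + prev_length
            if offset <= prev_end + max_gap:
                # Extend the previous range to cover this chunk as well.
                new_end = max(prev_end, offset + length)
                merged[-1] = (prev_offset, new_end - prev_offset)
                continue
        merged.append((offset, length))
    return merged


if __name__ == "__main__":
    chunks = [(0, 100), (100, 50), (400, 10)]
    print(coalesce_ranges(chunks))               # contiguous chunks merged
    print(coalesce_ranges(chunks, max_gap=250))  # small gaps read through
```

    A larger `max_gap` trades extra bytes read for fewer requests, which is the kind of setting worth varying in the tests.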

       

     

    Slow deletes

    https://ggus.eu/index.php?mode=ticket_info&ticket_id=159395

     

     

    Here’s my timeline for the file: /lhcb/MC/2017/SIM/00170176/0001/00170176_00011436_1.sim

    (on the webdav aliased hosts)

     

    Svc02: Initial Write

    221101 03:56:38 File descriptor 133666 associated to file /lhcb:buffer/lhcb/MC/2017/SIM/00170176/0001/00170176_00011436_1.sim opened in write mode

    221101 03:57:02 ceph_close: closed fd 133666 for file buffer/lhcb/MC/2017/SIM/00170176/0001/00170176_00011436_1.sim, read ops count 0, write ops count 30, async write ops 0/0, async pending write bytes 0, async read ops 0/0, bytes written/max offset 501311973/501311972, longest async write 0.000000, longest callback invocation 0.000000, last async op age 0.000000

     

    svc01: Checksum
    CEPHSUM-2022-11-01 03:57:08,906-1593397-INFO-Result:Done, pool:lhcb, path:/lhcb:buffer/lhcb/MC/2017/SIM/00170176/0001/00170176_00011436_1.sim, checksum:ae4f33c4, time_s:5.067184,  filesize_bytes:501311973, source:file, exit_code:0, srccks:N/A

     

    svc99: Unlink

    221101 05:05:37 ceph_stat: /lhcb:buffer/lhcb/MC/2017/SIM/00170176/0001/00170176_00011436_1.sim

    221101 05:05:37 ceph_posix_unlink : /lhcb:buffer/lhcb/MC/2017/SIM/00170176/0001/00170176_00011436_1.sim
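    The intervals in the timeline above can be worked out with a short script. This is a sketch only: the timestamps and sizes are copied from the log lines quoted above, and it assumes the three hosts have synchronised clocks and that the `YYMMDD HH:MM:SS` prefix is local time on each host. The helper name `parse_ts` is hypothetical.

```python
from datetime import datetime

LOG_TS_FORMAT = "%y%m%d %H:%M:%S"  # e.g. "221101 03:56:38"


def parse_ts(stamp: str) -> datetime:
    """Parse the YYMMDD HH:MM:SS prefix used in the quoted log lines."""
    return datetime.strptime(stamp, LOG_TS_FORMAT)


# Values copied from the log excerpts in these notes.
opened = parse_ts("221101 03:56:38")    # opened in write mode
closed = parse_ts("221101 03:57:02")    # ceph_close
unlinked = parse_ts("221101 05:05:37")  # ceph_posix_unlink
bytes_written = 501311973
checksum_seconds = 5.067184             # time_s from the CEPHSUM line

write_seconds = (closed - opened).total_seconds()
lifetime_seconds = (unlinked - closed).total_seconds()

# Rates in MB/s (1 MB = 10**6 bytes).
write_rate_mb_s = bytes_written / write_seconds / 1e6
checksum_rate_mb_s = bytes_written / checksum_seconds / 1e6

if __name__ == "__main__":
    print(f"write duration:     {write_seconds:.0f} s")
    print(f"write rate:         {write_rate_mb_s:.1f} MB/s")
    print(f"checksum rate:      {checksum_rate_mb_s:.1f} MB/s")
    print(f"close to unlink:    {lifetime_seconds:.0f} s")
```

    The file existed for a little over an hour between close and unlink, and the unlink entry itself carries only a start timestamp, so the duration of the delete cannot be read from these lines alone.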

     

    https://stfc.atlassian.net/browse/XRD-52

     

     

    GGUS:

     

     

    Site reports

    Glasgow

    Sam notes that “unhappy” OSDs might be the cause of stalled operations; restarting xrootd ‘fixes’ things
    * How to make a dev OSD unhappy
    * How to manage this in ceph.conf, or XrdCeph.

     

    Lancaster:

     

    ECDF:

    Manchester:

     Action items

    JW to create a ticket with the Storage teams for VMware-based CMSD testing
    Consider combining xrootd and webdav aliased hosts once Sandbox is deployed to prod.
    Preparation of an abstract for CHEP on xrootd-related work should indeed happen.
    Location of Tom’s Xcache vs memcache material: https://wiki.e-science.cclrc.ac.uk/web1/bin/view/EScienceInternal/XRootDVectorReadTestProgram
    Plan how to use ceph dev to test scenarios where an OSD is problematic but not yet marked as out of the cluster; vary the tuning parameters and characterise performance.

     

     

     Decisions