• Rough draft
  • 2022-12-15 Meeting Notes

     Date

    Dec 15, 2022

     Participants

    • @James Walder

    • @Ian Johnson

    • Glasgow: Sam

    • Lancs: Steven, Matt, Gerard

     Goals

    • List of Epics

    • New tickets

    • Consider new functionality / items

    • Detailed discussion of important topics

    • Site report activity

     

     Discussion topics

    Major themes of 2022 → 2023:

    (non-exhaustive list …)

    • Bug fixing

    • New release testing

    • ReadV improvements (writeV ?)

      • Attempted removal of locking for read operations

      • buffering / circumvention (rewriting) of parts of libradosstriper

    • Read/Write (buffering) improvements

      • Added basic IO buffering

      • Additional functionality to be added

    • Deletion improvements

      • unified configuration allowed for concurrent deletes

      • need for offloading of deletes for asynchronous support ?

    • Stat improvements

      • Hypothesis: time spent gathering “entropy” during key creation explains the slow stats seen from the root CLI

      • No actual xrootd improvements needed?

      • changed from st_dev =0

      • Susceptible to heavy-IO ceph workloads (e.g. adding in SNs)

      • Susceptible to ‘bad’ behaving Gateways

    • Checksum improvements

      • Checksum requests:

        • Python script creates a new rados client connection for each request, which makes metadata retrievals slow

          • client/server model ~ 8x faster

      • Checksum calculation:

        • Done post-transfer, single read thread

          • Demonstrated ‘concurrent’ per-file checksumming

        • Work ongoing (Glasgow) on in-flight checksumming

        • OSD-based checksumming would remain an interesting topic (could deletions also be a use case?).

    • Spaceinfo:

      • Work ongoing to allow spaceinfo reporting (used / quota space)

    • Token support:

      • Macaroons supported

      • WLCG profile tested and (mostly) passing

        • Still to be enabled in Prod (and for real VOs)

    • Configuration:

      • External gateways: replaced memory cache with internal buffering

        • Fixed namelib code in XrdCeph

        • Created Unified configuration

          • Added TPC xrootd (to allow for xrootd writes)

      • Combined the xrootd and webdav aliased hosts into a (mostly) common set.

      • CMSD to be deployed

        • Testing on prod on-going

        • Need VM hosts

      • WNs:

        • Move to more updated version

        • ‘fix’ readVs

        • Need the Xcache still ?

        • Other plans ?
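
    As an illustration of the in-flight checksumming idea listed under "Checksum calculation" above, here is a minimal sketch that updates the checksum while the data is being copied, so no separate post-transfer read pass is needed. It assumes Adler-32 as the checksum type and plain file-like objects for source and destination; the function name and chunk size are hypothetical and do not reflect the actual XrdCeph or Glasgow implementation.

```python
import io
import zlib


def copy_with_adler32(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst and return the Adler-32 of the data as 8 hex digits.

    The checksum is updated chunk by chunk during the copy, so the data
    is read only once (hypothetical helper, for illustration only).
    """
    checksum = 1  # Adler-32 starting value
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        # Feed the running value back in to continue the same checksum.
        checksum = zlib.adler32(chunk, checksum)
    return format(checksum & 0xFFFFFFFF, "08x")


if __name__ == "__main__":
    payload = b"example payload " * 4096
    source, sink = io.BytesIO(payload), io.BytesIO()
    print(copy_with_adler32(source, sink, chunk_size=8192))
```

    The running value matches a single-pass checksum over the whole file, which is why the post-transfer read could in principle be dropped once the in-flight value is stored as metadata.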

     

    Open Epics


    Jira Issues not within an Epic


     

    https://stfc.atlassian.net/jira/software/c/projects/XRD/boards/26/roadmap

    Item

    Presenter

    Notes

     


     

    Combining the xrootd and webdav aliased gateways

     

    LHCb started an ~O(500) TB data challenge from CERN EOS to Antares; almost all data is going via Echo. HTTPS is used for each transfer link.

    “External view” of RAL incoming / outgoing connections

    Snapshot of the “best period” of EOS → Echo transfers

    ~900 concurrent transfers at ~3.26 GB/s

     

    “Best plot” for Echo → Antares: reached ~7.4 GB/s, with a much more “bursty” set of transfers.
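
    For scale, the figures above can be turned into a per-transfer average and a rough duration. This is a back-of-envelope sketch only: it assumes decimal units (1 GB = 1000 MB, 1 TB = 1000 GB), takes the O(500) TB volume at face value, and assumes the best-period rate is sustained, which the notes do not claim. The function names are hypothetical.

```python
def per_transfer_rate_mb_s(aggregate_gb_s, concurrent_transfers):
    """Average rate per transfer in MB/s (decimal units assumed)."""
    return aggregate_gb_s * 1000 / concurrent_transfers


def hours_to_move(volume_tb, aggregate_gb_s):
    """Hours needed to move volume_tb at a constant aggregate rate."""
    seconds = volume_tb * 1000 / aggregate_gb_s
    return seconds / 3600


if __name__ == "__main__":
    # EOS -> Echo best period: ~900 transfers at ~3.26 GB/s
    print(round(per_transfer_rate_mb_s(3.26, 900), 1), "MB/s per transfer")
    # Nominal 500 TB at that same rate
    print(round(hours_to_move(500, 3.26), 1), "hours")
```

    At these numbers each transfer averages only a few MB/s, so the aggregate rate comes from concurrency rather than from fast individual streams.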

     

    ATLAS has a backlog of ~130k FTS writes into Echo

    svc97

     

     

    Alice space token development updates

    https://stfc365-my.sharepoint.com/:w:/r/personal/ian_johnson_stfc_ac_uk/_layouts/15/Doc.aspx?sourcedoc=%7B132F553F-27B6-45AB-9020-41504A5F957B%7D&file=ECHO%20Disk%20Space%20Reporting%20Requirements.docx&wdLOR=cCFF7C51D-DE6D-D448-B9AC-07488D1976A8&action=default&mobileredirect=true

     

    Deletes

    https://stfc.atlassian.net/browse/XRD-52

     

     

     

     

    https://stfc.atlassian.net/browse/XRD-53

    https://stfc.atlassian.net/browse/XRD-50

     

    No time yet to properly investigate, but should be considered urgent.

     

     

    Vector reads

     

     

     

    tokens testbed

     

    Lost dev-gw2 to Alice GW testing of space tokens.

     

     

     

     

     

     

     

     

    Spaceinfo reporting: current requirements document

    • R1 (priority: High): VO user view

      • Criteria: Report VO user view of storage space allocated, used, and free in their VO pool

      • Implementation: Support the “xrdfs query space” command; convert from “raw” disk space to “VO user” disk space (figures returned as key-value pairs)

    • R2 (priority: High): Ceph admin view

      • Criteria: Allow Ceph admins to set the amount of raw disk space allocated to a VO pool

      • Implementation: Provide instructions on how a Ceph admin can assign the nominal space allocation to a VO pool

    • R3 (priority: Med/Low): Restrict VO view

      • Criteria: Ensure that VO users can only see details of their VO pool

      • Implementation: Check that the user's VO matches the VO specified in the disk space request

    • R4 (priority: Med): Another VO user view

      • Criteria: Report VO user view of storage space allocated, used, and free in their VO pool with pretty printing

      • Implementation: Interim: provide an Awk script to pretty-print the output of “xrdfs query space”. Later: support “xrdfs spaceinfo”

    • R5 (priority: Low): Accommodate different EC factors

      • Criteria: Remove hard-coding of the EC k:m or r = k/(k+m) value used to convert from “raw” disk space to “VO user” disk space

      • Implementation: Allow the XRootD configuration file to contain k:m or r values per-cluster or per-pool

    • R6 (priority: Low): Allow raw disk allocation figure in arbitrary object

      • Criteria: Alter location of per-pool object containing space allocation

      • Implementation: Remove hard-coding of the per-pool object name “__spaceinfo__” and extended attribute name “total_space” and place them into the XRootD configuration file; apply per-cluster or per-pool
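
    The raw to “VO user” conversion in R1 and R5 can be sketched as below, using r = k/(k+m) from the requirements. This is a minimal sketch under stated assumptions: the k=8, m=3 profile is only an example value, and the function names and the report keys ("total", "used", "free") are hypothetical, not the actual output of “xrdfs query space”.

```python
def usable_fraction(k, m):
    """Fraction r = k / (k + m) of raw space that holds user data."""
    if k <= 0 or m < 0:
        raise ValueError("need k > 0 and m >= 0")
    return k / (k + m)


def raw_to_user_bytes(raw_bytes, k, m):
    """Convert raw bytes to VO user bytes using integer arithmetic."""
    if k <= 0 or m < 0:
        raise ValueError("need k > 0 and m >= 0")
    return raw_bytes * k // (k + m)


def vo_space_report(raw_total, raw_used, k, m):
    """Return user-view figures as key-value pairs (hypothetical keys)."""
    total = raw_to_user_bytes(raw_total, k, m)
    used = raw_to_user_bytes(raw_used, k, m)
    return {"total": total, "used": used, "free": total - used}


if __name__ == "__main__":
    # Example only: 11 PB raw allocated, 5.5 PB raw used, EC profile 8+3.
    print(vo_space_report(11 * 10**15, 55 * 10**14, k=8, m=3))
```

    Passing k and m as arguments rather than constants is the point of R5: the values could then come from a per-cluster or per-pool configuration entry.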

     

     

    GGUS:

    Deletion problem at RAL

    Slow stat calls at RAL

    Problem accessing some LHCb files at RAL

    Site reports

    • Enabled the memory cache on the internal gateway (to attempt to mitigate simultaneous reads against the same file).

     Action items

     

     Decisions

    1. Next meeting January 12th 2023