2024-04-25 Meeting Notes
Date
Apr 25, 2024
Participants
@Thomas, Jyothish (STFC,RAL,SC)
@Ian Johnson
@Alastair Dewhurst
@Mariam Demir
@James Walder
Lancs: Gerard, Steven, Matt
Glasgow: Sam
Apologies:
CC:
Goals
List of Epics
New tickets
Consider new functionality / items
Detailed discussion of important topics
Site report activity
Discussion topics
Current status of Echo Gateways / WNs testing
Recent sandboxes for review / deployments:
Item | Presenter | Notes |
---|---|---|
Operational Issues | @Thomas, Jyothish (STFC,RAL,SC) | 2024-04-23: LHCb WGprod Echo overload. More memory on the XCache: talk with Tom Birkett. Theoretical limits of the cluster: ~500 GB/s, ~100k IOPS; actual IOPS are bloated by Ceph (RocksDB, EC write/read). SSD use for storage survey: check wear level/lifetime of SSDs; dell-19s are SSDs used in the batch farm. Spinning disks have lower IOPS than SSDs, so XCache buffering is vital for the future. |
XrootD Workshop plan | @Alastair Dewhurst | TBA; feedback on the registration payment page welcome. |
Rocky 8 and 9 migration planning | | gridftp to be decommissioned (CMS notified); 4 gateways to be introduced in prod. |
Future developments ideas planning work | @Ian Johnson @Thomas, Jyothish (STFC,RAL,SC) | |
Deletion studies through RDR | @Ian Johnson | Tidying awk/sqlite scripts to process logfile data, e.g. from DC24. |
Deletions | | Looking into the “rados bencher” clean_up routine, which fires off several async rm calls, and comparing this with XrdCeph unlink, which calls striper::remove. The latter waits for the async rm to complete, so blocking may be the cause of RAL’s insufficient deletion rates. Is the deletion fully parallel? A 100 TB Castor migration dataset is available for deletion. |
Planning for ALICE CMSD redirection | @Thomas, Jyothish (STFC,RAL,SC) | |
Checksums fixes | @Alexander Rogovskiy @Thomas, Jyothish (STFC,RAL,SC) | Done. Test in the batch farm? |
Prefetch studies and WN changes | @Alexander Rogovskiy | |
Tokens Status | @Thomas, Jyothish (STFC,RAL,SC) @Katy Ellis | |
CMSD Load balancing | @Thomas Byrne @Thomas, Jyothish (STFC,RAL,SC) | |
SKA Gateway box | @James Walder | https://stfc.atlassian.net/wiki/spaces/UK/pages/215941180 4 nodes awaiting installation: 2 for Exit pod (+ 1 existing) |
Xrootd testing framework | @Mariam Demir | |
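
The Deletions item in the table above contrasts firing off several asynchronous removes at once with waiting for each remove to complete before starting the next. A minimal sketch of why that can matter for deletion rates follows. It assumes a fixed latency per remove, and `fake_remove` is a hypothetical stand-in that only sleeps; it is not the librados or XrdCeph API, and the numbers are illustrative only.

```python
import time
from concurrent.futures import ThreadPoolExecutor

REMOVE_LATENCY_S = 0.05  # assumed per-object latency, purely illustrative


def fake_remove(name: str) -> str:
    """Hypothetical stand-in for a storage remove call; it only sleeps."""
    time.sleep(REMOVE_LATENCY_S)
    return name


def delete_blocking(names):
    """Issue one remove at a time, waiting for each to finish."""
    start = time.perf_counter()
    done = [fake_remove(n) for n in names]
    return done, time.perf_counter() - start


def delete_concurrent(names, in_flight=8):
    """Keep up to `in_flight` removes outstanding at once."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=in_flight) as pool:
        done = list(pool.map(fake_remove, names))
    return done, time.perf_counter() - start


if __name__ == "__main__":
    objects = [f"obj.{i:04d}" for i in range(16)]
    _, t_block = delete_blocking(objects)
    _, t_conc = delete_concurrent(objects)
    print(f"blocking:   {t_block:.2f} s")
    print(f"concurrent: {t_conc:.2f} s")
```

With a fixed latency, the blocking variant scales with the number of objects, while the concurrent variant scales roughly with the number of objects divided by `in_flight`. Whether the real unlink path behaves like either variant is exactly the open question in the notes.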
on GGUS:
Site reports
Lancaster: Ceph - seems fairly happy at the moment; Gerard is investigating some issues with scrubbing. XRootD - a set of LSST functional tests is failing with permission denied whilst making their directories. The xroot logs are worse than useless for debugging auth issues. There shouldn’t be any access issues, and access works for me and Tim. However, it reminded me of issues Glasgow had with LHCb (mkdir over http getting permission denied). Any tips for teasing out more information on xroot auth decisions?
Jyothish - LHCb functional tests needed additional entries in the authdb to stat root folders, e.g. lhcb:user rl in addition to lhcb:user/ a
Similarly, is there any good way of logging deletions server-side?
Logging needs improvement across XRootD: one event per line, machine-parsable, less clutter, uniform format. Possibly bring up at the workshop?
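
To illustrate the "one event per line, machine-parsable, uniform format" idea, here is a minimal sketch that writes each event as a single JSON object. The field names (`ts`, `event`, `path`, `user`, `result`) and the helpers `format_event` and `parse_event` are hypothetical, chosen only for illustration; this is not an existing XRootD log format.

```python
import json
from datetime import datetime, timezone


def format_event(event, path, user, result, when=None):
    """Render one event as a single JSON line with a fixed field order."""
    when = when or datetime.now(timezone.utc)
    record = {
        "ts": when.isoformat(timespec="seconds"),
        "event": event,    # e.g. "mkdir", "unlink"
        "path": path,
        "user": user,
        "result": result,  # e.g. "ok", "denied"
    }
    # JSON escapes embedded newlines, so each record stays on one line.
    return json.dumps(record, separators=(",", ":"))


def parse_event(line):
    """Parse a line produced by format_event back into a dict."""
    return json.loads(line)


if __name__ == "__main__":
    line = format_event("unlink", "/example/path/file.root", "exampleuser", "ok")
    print(line)
    print(parse_event(line)["event"])
```

A format along these lines would let deletions or auth decisions be picked out with standard tools, without per-message parsing rules.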
Glasgow: Sam built newer versions of XRootD on CC7 and 8 (anticipating 5.6.9), and is awaiting any updates for LB.
GitHub - stfc/xrootd-ceph at variableobjectcleanup
Action items
@James Walder to schedule a ‘hackathon’ within an F2F to have a session on architectural planning.
@James Walder to prepare an outline of the expected roadmap for XRootD developments in 2024.