2022-12-15 Meeting Notes
Date
Dec 15, 2022
Participants
@James Walder
@Ian Johnson
Glasgow: Sam
Lancs: Steven, Matt, Gerard
Goals
List of Epics
New tickets
Consider new functionality / items
Detailed discussion of important topics
Site report activity
Discussion topics
Major themes of 2022 → 2023:
(non-exhaustive list …)
Bug fixing
New release testing
ReadV improvements (writeV?)
Attempted removal of locking for read operations
buffering / circumvention (rewriting) of parts of libradosstriper
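Some context on why vector reads interact with the striper: a client readV is a list of (offset, length) chunks, while the data in Ceph lives in fixed-size objects. Below is a minimal Python sketch of mapping such chunks onto per-object reads, so that each object could be read in one batched request instead of chunk by chunk. It is a generic illustration, not the XrdCeph code. It assumes a simple layout with a single stripe whose unit equals the object size, a hypothetical 4 MiB object size, and a `<file>.<16 hex digits>` object-naming scheme. All three should be checked against the real striper configuration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout: one stripe, stripe unit == object size, so a file
# byte offset maps linearly onto consecutive objects. Illustrative only.
OBJECT_SIZE = 4 * 1024 * 1024  # 4 MiB


def object_name(file_name: str, index: int) -> str:
    # Assumed naming convention "<file>.<16 hex digits>"; verify before use.
    return f"{file_name}.{index:016x}"


def split_readv(file_name: str, chunks: List[Tuple[int, int]],
                object_size: int = OBJECT_SIZE) -> Dict[str, List[Tuple[int, int]]]:
    """Map (offset, length) readV chunks to per-object (offset, length) reads."""
    per_object: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for offset, length in chunks:
        while length > 0:
            index, obj_off = divmod(offset, object_size)
            # A chunk that crosses an object boundary is split at the boundary.
            n = min(length, object_size - obj_off)
            per_object[object_name(file_name, index)].append((obj_off, n))
            offset += n
            length -= n
    return dict(per_object)
```

For example, `split_readv("f", [(0, 100), (90, 20)], object_size=100)` returns `{"f.0000000000000000": [(0, 100), (90, 10)], "f.0000000000000001": [(0, 10)]}`. The second chunk straddles an object boundary, so it is split in two.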
Read/Write (buffering) improvements
Added basic IO buffering
Additional functionality to be added
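Basic IO buffering essentially turns many small client writes into fewer, larger backend writes. Below is a minimal Python sketch of that idea. It assumes sequential writes are the common case. `sink` is a hypothetical stand-in for the real backend write call, and the block size is left to the caller.

```python
from typing import Callable


class BufferedWriter:
    """Collects small sequential writes and passes them on in block_size pieces.

    `sink(offset, data)` is a hypothetical placeholder for the backend write.
    """

    def __init__(self, sink: Callable[[int, bytes], None], block_size: int) -> None:
        self._sink = sink
        self._block_size = block_size
        self._buf = bytearray()
        self._buf_offset = 0  # file offset of the first byte held in _buf

    def write(self, offset: int, data: bytes) -> None:
        if offset != self._buf_offset + len(self._buf):
            # Non-contiguous write: flush what is pending, restart buffer here.
            self.flush()
            self._buf_offset = offset
        self._buf += data
        # Emit every complete block as soon as it is available.
        while len(self._buf) >= self._block_size:
            self._sink(self._buf_offset, bytes(self._buf[:self._block_size]))
            del self._buf[:self._block_size]
            self._buf_offset += self._block_size

    def flush(self) -> None:
        # Write out any partial block (e.g. at close or before a seek).
        if self._buf:
            self._sink(self._buf_offset, bytes(self._buf))
            self._buf_offset += len(self._buf)
            self._buf.clear()
```

With `block_size=8`, writes of 3 and 5 bytes at offsets 0 and 3 reach the sink as one 8-byte write at offset 0. A later write at a non-contiguous offset flushes whatever is pending first.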
Deletion improvements
unified configuration allowed for concurrent deletes
Need to offload deletes to support asynchronous operation?
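On the open question of offloading deletes, here is one possible approach sketched in Python. It assumes the caller only needs to know that the delete was accepted. Each delete is handed to a small worker pool and a future is returned immediately. `remove` is a hypothetical placeholder for the actual (slow) delete of a striped file. The sketch does not address how a real gateway would report completion or errors back to the client.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


class DeleteOffloader:
    """Queues deletes onto a background pool so the caller does not wait.

    `remove(path)` is a hypothetical stand-in for the real backend delete.
    """

    def __init__(self, remove: Callable[[str], None], max_workers: int = 4) -> None:
        self._remove = remove
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def submit(self, path: str) -> Future:
        # Returns immediately; the delete itself runs on a worker thread.
        return self._pool.submit(self._remove, path)

    def shutdown(self) -> None:
        # Block until all queued deletes have finished.
        self._pool.shutdown(wait=True)
```

Calling `shutdown()` waits for the queued deletes to finish. Any exception raised by `remove` surfaces through the returned future (`future.exception()`), so errors are not lost even though the caller did not wait.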
Stat improvements
Time spent on “entropy” in key creation hypothesised as the cause of slow root CLI stats
No actual xrootd improvements needed?
Changed from st_dev = 0
Susceptible to heavy-IO Ceph workloads (e.g. adding in SNs)
Susceptible to badly behaving gateways
Checksum improvements
Checksum requests:
Python script creates a new rados client connection for each request, so metadata retrieval is slow
A client/server model is ~8x faster
Checksum calculation:
Done post-transfer, single read thread
Demonstrated ‘concurrent’ per-file checksumming
Work ongoing (Glasgow) on in-flight checksumming
OSD-based checksumming would remain an interesting topic (could deletions also be a use case?).
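With in-flight checksumming, each chunk is folded into the checksum as it is written. The value is then ready when the transfer closes, and no second read pass is needed. Below is a minimal Python sketch. It assumes Adler-32 (the checksum commonly used for WLCG transfers) and chunks arriving in file order. Out-of-order chunks would need the post-transfer path, which is not shown.

```python
import zlib


class InFlightAdler32:
    """Updates an Adler-32 checksum as each chunk is written, in file order."""

    def __init__(self) -> None:
        self._value = 1        # Adler-32 initial value
        self._next_offset = 0  # only the next contiguous chunk can be folded in

    def update(self, offset: int, data: bytes) -> None:
        if offset != self._next_offset:
            raise ValueError("out-of-order chunk: fall back to post-transfer checksum")
        self._value = zlib.adler32(data, self._value)
        self._next_offset += len(data)

    def hexdigest(self) -> str:
        # Zero-padded 8-digit lowercase hex.
        return f"{self._value:08x}"
```

The result matches a single post-transfer pass. Feeding `b"Wikipedia"` in any number of sequential pieces gives `11e60398`, the same value as `zlib.adler32(b"Wikipedia")`.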
Spaceinfo:
Work ongoing to allow spaceinfo reporting (used / quota)
Token support:
Macaroons supported
WLCG profile tested and (mostly) passing
Still to be enabled in Prod (and for real VOs)
Configuration:
External gateways: replaced memory cache with internal buffering
Fixed namelib code in XrdCeph
Created Unified configuration
Added TPC xrootd (to allow for xrootd writes)
Combined the xrootd and webdav aliased hosts into a (mostly) common set.
CMSD to be deployed
Testing on prod on-going
Need VM hosts
WNs:
Move to a more recent version
‘fix’ readVs
Is XCache still needed?
Other plans?
Open Epics
Jira Issues not within an Epic
| Item | Presenter | Notes |
|---|---|---|
| Combining xrootd and webdav aliased gateways | | LHCb started an ~O(500) TB data challenge from CERN EOS to Antares; almost all data goes via Echo, with HTTPS used for each transfer link. “External view” of RAL incoming / outgoing connections. Snapshot of the “best period” of EOS → Echo transfers: ~900 concurrent transfers at ~3.26 GB/s. “Best plot” for Echo → Antares reached ~7.4 GB/s, with a much more “bursty” set of transfers. ATLAS has a ~130k FTS backlog of writes into Echo svc97. |
| Alice space token development updates | | |
| Deletes https://stfc.atlassian.net/browse/XRD-52 | | No time yet to investigate properly, but should be considered urgent. |
| Vector reads | | |
| Tokens testbed | | Lost dev-gw2 to Alice GW testing of space tokens. |
| Spaceinfo reporting | | Current requirements document: |

| Req | Priority | Title | Criteria | Implementation |
|---|---|---|---|---|
| R1 | High | VO user view | Report VO user view of storage space allocated, used, and free in their VO pool | Support “xrdfs query space” command; convert from “raw” disk space to “VO user” disk space (figures returned as key-value pairs) |
| R2 | High | Ceph admin view | Allow Ceph admins to set the amount of raw disk space allocated to a VO pool | Provide instructions on how a Ceph admin can assign the nominal space allocation to a VO pool |
| R3 | Med / Low | Restrict VO view | Ensure that VO users can only see details of their own VO pool | Check that the user's VO matches the VO specified in the disk space request |
| R4 | Med | Another VO user view | Report VO user view of storage space allocated, used, and free in their VO pool, with pretty printing | Interim: provide an Awk script to pretty-print the output of “xrdfs query space”. Later: support “xrdfs spaceinfo” |
| R5 | Low | Accommodate different EC factors | Remove hard-coding of the EC k:m or r = k/(k+m) value used to convert from “raw” disk space to “VO user” disk space | Allow the XRootD configuration file to contain k:m or r values per-cluster or per-pool |
| R6 | Low | Allow raw disk allocation figure in arbitrary object | Alter location of per-pool object containing space allocation | Remove hard-coding of per-pool object name “__spaceinfo__” and extended attribute name “total_space”, and place them in the XRootD configuration file; apply per-cluster or per-pool |
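Requirements R1, R4 and R5 together involve some simple arithmetic and formatting. The Python sketch below covers both parts. `raw_to_user` applies the ratio r = k/(k+m) from R5 to convert raw Ceph bytes into VO-usable bytes. `pretty_print` stands in for the interim Awk script from R4. The sketch assumes the “xrdfs query space” reply is an '&'-separated list of key=value pairs using the keys `oss.space`, `oss.used` and `oss.free`. Those key names and their units are assumptions and should be checked against what the XrdCeph plugin actually returns.

```python
from typing import Dict

UNITS = ("B", "KiB", "MiB", "GiB", "TiB", "PiB")


def raw_to_user(raw_bytes: int, k: int, m: int) -> int:
    """Convert raw bytes to VO-usable bytes for an EC k:m pool (r = k/(k+m))."""
    if k <= 0 or m < 0:
        raise ValueError("need k > 0 and m >= 0")
    return raw_bytes * k // (k + m)


def parse_space_reply(reply: str) -> Dict[str, str]:
    """Split an '&'-separated key=value reply into a dict (assumed format)."""
    pairs = (item.split("=", 1) for item in reply.strip().split("&") if "=" in item)
    return {key: value for key, value in pairs}


def human(n: int) -> str:
    """Format a byte count using binary-prefixed units."""
    size = float(n)
    unit = 0
    while size >= 1024 and unit < len(UNITS) - 1:
        size /= 1024
        unit += 1
    return f"{size:.2f} {UNITS[unit]}"


def pretty_print(reply: str) -> str:
    """Render the (assumed) space keys as aligned, human-readable lines."""
    fields = parse_space_reply(reply)
    lines = []
    for key, label in (("oss.space", "Allocated"), ("oss.used", "Used"), ("oss.free", "Free")):
        if key in fields:
            lines.append(f"{label:<10} {human(int(fields[key]))}")
    return "\n".join(lines)
```

As an illustration, take an 8+3 EC pool (k = 8, m = 3, chosen only as an example). `raw_to_user(11 * 10**12, 8, 3)` returns `8 * 10**12`, so 11 TB of raw space corresponds to 8 TB of VO-usable space.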
GGUS:
Problem accessing some LHCb files at RAL
Site reports
Enabled the memory cache on the internal gateway (to attempt to mitigate simultaneous reads against the same file).
Action items
Decisions
- Next meeting January 12th 2023