2022-12-08 Meeting Notes

Date

Dec 8, 2022

Participants

@James Walder
Glasgow: Sam
Lancs: Steven, Matt, Gerard

Goals

List of Epics
New tickets
Consider new functionality / items
Detailed discussion of important topics
Site report activity

Discussion topics

https://stfc.atlassian.net/jira/software/c/projects/XRD/boards/26/roadmap

Item	Presenter	Notes

Combining xrootd and webdav gateway aliases		Change made on Mon Nov 28; gw4,5 kept as single-use hosts. Next step: slowly release/optimise the FTS limits for ATLAS (back to nominal levels).
Alice space token development updates		In progress
Slow stats		‘slow checksums’
Test of functional cephsum client / server code:
Alex’s/LHCb code; stat + checksum; 100 files => ~ 100 ms / file
Current implementation is O(80) s for 100 files
RTT (from lxplus) ~ 20 ms
`Timestamp, Execution time [s]
1670238676.49,14.4777889252,0
1670238709.95,12.2421181202,0
1670238722.25,11.8010079861,0
1670238734.12,11.6060519218,0
1670238745.79,11.357049942,0
1670238757.22,11.6894021034,0
1670238768.98,11.5300111771,0
1670238780.59,11.1914060116,0
1670238791.85,10.7543468475,0
1670238802.69,11.7126610279,0
1670238814.48,11.1290979385,0`
https://github.com/snafus/cephsum-client
https://github.com/snafus/cephsum-server
Plan to continue tests with this implementation.
Large checksums		CMS transferring a handful of > 50 GB files. FTS failed the transfers due to timeouts on the checksum side: RAL checksums @ 10 s/GB; FTS nominally should have an 1800 s timeout; however, the HTTP client library was applying a 5 min timeout. FTS devs managed to find a way to override that timeout, and transfers are now succeeding.
James had observed a difference in his own tests between lxplus- and RAL-initiated checksum requests; would like to confirm whether the same or a different set of timeouts applies.
Also discussed with XRootD devs, as FTS devs brought up the (non-)ability to use the 100-continue header. Reply from the XRootD devs:
Yes, XRootD does support "Expect: 100-continue" headers but this is used for a very limited purpose. When the http front-end is filling a buffer in the presence of read segmentation and the header was present, it will send a keepalive. Notice that this is not extended to checksum handling. However, it would be relatively easy to do this. However, we need to look at the best place for this to occur. It may be in the front end or it may be in the XRootD backend. In any case, could you cut a github ticket requesting that expect continue headers also apply to checksumming?
Had (previously) already written ‘concurrent’ checksumming code, but it was never deployed (and still not sure if it’s ideal).
Deletes https://stfc.atlassian.net/browse/XRD-52		Alex notes there are still some cases of long deletes (e.g. beyond the 20 s timeout). Also spotted a case where a macaroon was generated but there was no further evidence of a connection from the client, leading to a timeout on the client side … https://indico.cern.ch/event/1217518/contributions/5121757/attachments/2562916/4417797/pres_liaisons.pdf
https://stfc.atlassian.net/browse/XRD-53 https://stfc.atlassian.net/browse/XRD-50		No time yet to properly investigate, but should be considered urgent.
Vector reads		https://indico.cern.ch/event/1217518/contributions/5121757/attachments/2562916/4417797/pres_liaisons.pdf
Alex looked at:
Reduced max number of segments per readv
'Buffered' reads
Direct reads from ceph via librados
... (small snippet of code)
+  librados::AioCompletion* cmpl;
+  ceph::bufferlist* bl;
+  ReadOpData tup;
+
+  cmpl = librados::Rados::aio_create_completion();
+  if (0 == cmpl) {
+    logwrapper((char*)"Can not create completion for read (%lu, %lu)", offset, size);
+    return -1;
+  }
+
+  try {
+    bl = new ceph::bufferlist();
+  } catch (std::bad_alloc&) {
+    logwrapper((char*)"Can not allocate buffer for read (%lu, %lu)", offset, size);
+    cmpl->release();
+    return -1;
+  }
+
+  tup = std::make_tuple(cmpl, bl, out_buf);
+  operations.push_back(tup);
+
+  return context->aio_read(fname, cmpl, bl, size, offset);
+};
...
tokens testbed		Lost dev-gw2 to Alice GW testing of space tokens.
Planning		Todo: Discuss / plan main roadmap for 2023
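
Worked example for the "Slow stats" item: a minimal Python sketch that summarises the execution times recorded there. The rows are copied from the notes; the third column is unlabelled in the notes and is parsed but not interpreted. The function names `parse_rows` and `summarise` are illustrative only. For these 11 rows the mean comes out at roughly 11.8 s.

```python
import statistics

# Rows copied verbatim from the "Slow stats" notes:
# timestamp, execution time [s], and a third column that the notes do not label.
RAW = """\
1670238676.49,14.4777889252,0
1670238709.95,12.2421181202,0
1670238722.25,11.8010079861,0
1670238734.12,11.6060519218,0
1670238745.79,11.357049942,0
1670238757.22,11.6894021034,0
1670238768.98,11.5300111771,0
1670238780.59,11.1914060116,0
1670238791.85,10.7543468475,0
1670238802.69,11.7126610279,0
1670238814.48,11.1290979385,0
"""


def parse_rows(raw):
    """Turn the comma-separated text into (timestamp, seconds, third_column) tuples."""
    rows = []
    for line in raw.strip().splitlines():
        timestamp, seconds, third = line.split(",")
        rows.append((float(timestamp), float(seconds), int(third)))
    return rows


def summarise(rows):
    """Return count, mean, min and max of the execution-time column."""
    times = [seconds for _, seconds, _ in rows]
    return {
        "n": len(times),
        "mean": statistics.mean(times),
        "min": min(times),
        "max": max(times),
    }


summary = summarise(parse_rows(RAW))
print(summary)
```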
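
Worked arithmetic for the "Large checksums" item: a small sketch of why > 50 GB files hit the 5 min HTTP client timeout but would fit the nominal 1800 s FTS timeout. It assumes checksum time scales linearly with file size at the quoted 10 s/GB and ignores any fixed overhead; the function names are illustrative.

```python
# Figures quoted in the notes.
CHECKSUM_SECONDS_PER_GB = 10      # RAL checksum rate
HTTP_CLIENT_TIMEOUT_S = 5 * 60    # timeout applied by the HTTP client library
FTS_NOMINAL_TIMEOUT_S = 1800      # nominal FTS checksum timeout


def checksum_seconds(size_gb, rate=CHECKSUM_SECONDS_PER_GB):
    """Estimated checksum time, assuming purely linear scaling with size."""
    return size_gb * rate


def max_size_gb(timeout_s, rate=CHECKSUM_SECONDS_PER_GB):
    """Largest file (in GB) whose checksum completes within timeout_s."""
    return timeout_s / rate


def fits(size_gb, timeout_s):
    """True if the estimated checksum time is within the timeout."""
    return checksum_seconds(size_gb) <= timeout_s


for label, timeout in (("HTTP client", HTTP_CLIENT_TIMEOUT_S),
                       ("FTS nominal", FTS_NOMINAL_TIMEOUT_S)):
    print(label, "timeout:", timeout, "s; max file size:", max_size_gb(timeout), "GB;",
          "50 GB fits:", fits(50, timeout))
```

Under these assumptions a 50 GB file needs about 500 s, so the 300 s limit caps checksummable files at about 30 GB, while 1800 s allows up to about 180 GB.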
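
Illustration for the "Vector reads" item: one way to picture "reduced max number of segments per readv" is splitting a list of (offset, length) segments into batches no larger than a cap. This is only a hypothetical sketch; the notes do not say how the cap is applied, and `split_segments` is not part of XRootD, XrdCeph, or Alex's changes.

```python
def split_segments(segments, max_segments):
    """Split a list of (offset, length) read segments into consecutive
    batches of at most max_segments entries, preserving order."""
    if max_segments < 1:
        raise ValueError("max_segments must be at least 1")
    return [segments[i:i + max_segments]
            for i in range(0, len(segments), max_segments)]


# Hypothetical request: five 4 KiB segments spaced 1 MiB apart.
request = [(i * 1024 * 1024, 4096) for i in range(5)]
batches = split_segments(request, max_segments=2)
for batch in batches:
    print(batch)
```

Each batch could then be issued as its own vector read; whether that helps depends on the backend, which is what the tests in the item above were exploring.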