2022-12-08 Meeting Notes
Date
Dec 8, 2022
Participants
@James Walder
Glasgow: Sam
Lancs: Steven, Matt, Gerard
Goals
List of Epics
New tickets
Consider new functionality / items
Detailed discussion of important topics
Site report activity
Discussion topics
Item | Presenter | Notes |
|
---|---|---|---|
Combining xrootd and webdav gateway aliases |
| Change made on Mon Nov 28; gw4 and gw5 kept as single-use hosts.
Next step: slowly relax and optimise the FTS limits for ATLAS (back to nominal levels).
|
|
ALICE space token development updates |
| In progress |
|
Slow stats |
| ‘Slow checksums’: test of the functional cephsum client / server code (Alex’s/LHCb code; stat + checksum). 100 files => ~100 ms / file RTT (from lxplus ~20 ms).
Timestamp, Execution time [s]
1670238676.49,14.4777889252,0
1670238709.95,12.2421181202,0
1670238722.25,11.8010079861,0
1670238734.12,11.6060519218,0
1670238745.79,11.357049942,0
1670238757.22,11.6894021034,0
1670238768.98,11.5300111771,0
1670238780.59,11.1914060116,0
1670238791.85,10.7543468475,0
1670238802.69,11.7126610279,0
1670238814.48,11.1290979385,0
GitHub - snafus/cephsum-client: Cephsum client in the cephsum client-server model
GitHub - snafus/cephsum-server: Server component of the cephsum client-server implementation
Plan to continue tests with this implementation. |
|
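As a quick check of the "~100 ms / file" figure, the execution times listed above can be averaged and divided by the number of files per run. This is only a sketch: it assumes each row is one run over the same 100 files, which the notes do not state per row, and the helper names `batch_times_s` and `mean_ms_per_file` are made up for illustration.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Execution times in seconds, copied from the second column of the log above.
inline std::vector<double> batch_times_s() {
    return {14.4777889252, 12.2421181202, 11.8010079861, 11.6060519218,
            11.357049942,  11.6894021034, 11.5300111771, 11.1914060116,
            10.7543468475, 11.7126610279, 11.1290979385};
}

// Mean per-file latency in milliseconds.
// Assumption: every run covers `files_per_batch` files (100 in these notes).
inline double mean_ms_per_file(const std::vector<double>& times_s,
                               int files_per_batch) {
    assert(!times_s.empty() && files_per_batch > 0);
    const double total_s =
        std::accumulate(times_s.begin(), times_s.end(), 0.0);
    const double mean_batch_s = total_s / static_cast<double>(times_s.size());
    return 1000.0 * mean_batch_s / files_per_batch;
}
```

Calling `mean_ms_per_file(batch_times_s(), 100)` gives about 118 ms per file under that assumption, in line with the rough figure quoted in the notes.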
Large checksums |
| CMS is transferring a handful of > 50 GB files. FTS devs found a way to override the relevant timeout, and transfers are now succeeding. James had observed a difference in his own tests between checksum requests initiated from lxplus and from RAL, and would like to confirm whether the same or a different set of timeouts applies. Also discussed with XrootD devs, as the FTS devs brought up the (in)ability to use the 100-continue header.
Had previously written ‘concurrent’ checksumming code, but it was never put into use (and still not sure it is ideal).
|
|
Deletes https://stfc.atlassian.net/browse/XRD-52
|
| Alex notes there are still some cases of long deletes (e.g. beyond the 20 s timeout). Also spotted a case where a macaroon was generated but there was no further evidence of a connection from the client, leading to a timeout on the client side …
|
|
| No time yet to investigate properly, but this should be considered urgent.
|
| |
Vector reads |
| Alex looked at:
... (small snippet of code)
+     librados::AioCompletion* cmpl;
+     ceph::bufferlist* bl;
+     ReadOpData tup;
+
+     // Create a completion handle for this asynchronous read.
+     cmpl = librados::Rados::aio_create_completion();
+     if (0 == cmpl) {
+         logwrapper((char*)"Can not create completion for read (%lu, %lu)", offset, size);
+         return -1;
+     }
+
+     // Allocate the bufferlist that will receive the data.
+     try {
+         bl = new ceph::bufferlist();
+     } catch (std::bad_alloc&) {
+         logwrapper((char*)"Can not allocate buffer for read (%lu, %lu)", offset, size);
+         cmpl->release();
+         return -1;
+     }
+
+     // Record the completion, bufferlist and caller's output buffer for this read.
+     tup = std::make_tuple(cmpl, bl, out_buf);
+     operations.push_back(tup);
+
+     // Queue the asynchronous read and return its status.
+     return context->aio_read(fname, cmpl, bl, size, offset);
+ };
...
|
|
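The snippet above shows only the submit side of a vector read: each chunk gets a completion and a buffer, and is recorded in `operations`. A minimal, self-contained sketch of the overall submit-then-drain pattern is given below. It uses `std::async` as a stand-in for the librados asynchronous read, so it does not call librados at all, and the names `ReadChunk`, `backend_read` and `vector_read` are hypothetical and not taken from Alex's patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <future>
#include <string>
#include <utility>
#include <vector>

// One requested chunk of a vector read: where to read and where to put it.
struct ReadChunk {
    std::size_t offset;
    std::size_t size;
    char* out_buf;  // caller-owned, at least `size` bytes
};

// Hypothetical stand-in for a blocking backend read of one chunk.
// Returns the bytes actually available (may be shorter than `size`).
inline std::string backend_read(const std::string& object,
                                std::size_t offset, std::size_t size) {
    if (offset >= object.size()) return std::string();
    return object.substr(offset, size);
}

// Submit every chunk first, then drain the results in submission order.
// Returns the total number of bytes copied into the output buffers.
inline long vector_read(const std::string& object,
                        const std::vector<ReadChunk>& chunks) {
    std::vector<std::pair<std::future<std::string>, char*>> operations;
    operations.reserve(chunks.size());

    // Submit phase: start all reads before waiting on any of them.
    for (const ReadChunk& c : chunks) {
        assert(c.out_buf != nullptr);
        operations.emplace_back(
            std::async(std::launch::async,
                       [&object, c] { return backend_read(object, c.offset, c.size); }),
            c.out_buf);
    }

    // Drain phase: wait for each read and copy its data to the caller's buffer.
    long total = 0;
    for (auto& op : operations) {
        const std::string data = op.first.get();  // blocks until this read is done
        std::memcpy(op.second, data.data(), data.size());
        total += static_cast<long>(data.size());
    }
    return total;
}
```

The point of the split is that all reads are in flight before the first wait, so the total latency is closer to that of the slowest chunk than to the sum over chunks. How the real code waits on and releases its completions is not shown in the notes.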
tokens testbed |
| dev-gw2 is now in use for ALICE gateway testing of space tokens, so it is no longer available to the tokens testbed.
|
|
Planning |
| Todo: Discuss / plan main roadmap for 2023 |
|
GGUS:
Problem accessing some LHCb files at RAL
Site reports
Action items
JW to open an issue on the xrootd GitHub requesting Expect: 100-continue support for XrootD checksumming.