Bring workers back online one tranche at a time, confirming with the data team that xrootd is working as expected.
10-11th May: Let the updated workers run for a few days.
12th May: `wn-2017-dell (all 2017’s) - wn-2019-dell` will be set to drain.
15th May: Repeat above process for second half of workers.
16th May: Merge all required sandboxes into prod and manage the farm back into `prod_batch` in AQ.
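The tranche-at-a-time step in the schedule above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `enable` and `confirm` callbacks are hypothetical placeholders for whatever mechanism actually returns a worker to service and for the data team's confirmation, and the tranche size is arbitrary.

```python
def split_into_tranches(hosts, tranche_size):
    """Split a list of hosts into consecutive tranches of at most tranche_size."""
    if tranche_size < 1:
        raise ValueError("tranche_size must be at least 1")
    return [hosts[i:i + tranche_size] for i in range(0, len(hosts), tranche_size)]


def bring_online(hosts, tranche_size, enable, confirm):
    """Bring hosts online one tranche at a time.

    enable(host)     -- hypothetical callback that returns one host to service
    confirm(tranche) -- hypothetical callback; True once xrootd on that tranche
                        has been confirmed as working as expected

    Returns (confirmed_hosts, held_tranche). held_tranche is empty when every
    tranche was confirmed; otherwise it is the tranche that failed confirmation
    and the rollout stops there.
    """
    confirmed = []
    for tranche in split_into_tranches(hosts, tranche_size):
        for host in tranche:
            enable(host)
        if not confirm(tranche):
            return confirmed, tranche
        confirmed.extend(tranche)
    return confirmed, []
```

Stopping at the first unconfirmed tranche keeps the number of workers exposed to a faulty configuration bounded by the tranche size.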
Next steps for WN deployment
Possible options for short term WN status:
Current configuration: 5.3.3-2 (core) + 5.3.4-1 (xroot-ceph) for proxy and gateway
Move to 5.5.4-2 (core) + 5.5.4-3 (xroot-ceph-buffered):
Fixes the XCache “filename too long” issue (to be confirmed).
Provides buffering on the ‘gateway’ for passed-through reads.
Allows non-striper reads and readV requests (i.e. Alex's updates), also for passed-through read(V) requests.
(b) and (c) are both configurable within the xrootd-xxx.cfg configuration files.
Paged reads (and writes) would be enabled, probably only between XCache and gateway (TBC).
General fixes from 5.5.X series
5.5.4 is currently being tested on lcg2268 (2017 dell, ml), although not in exactly this configuration.
5.3.3-x (core) + 5.3.3-6 (xroot-ceph-buffered).
Needs an additional patch for the “filename too long” issue, resulting in different (core) xrootd rpms for proxy and ceph (or a more detailed patch).
We ‘understand’ 5.3.3 as a working and stable release
Most testing on WNs done under this configuration
(for the future) Make the proxy pass through all readV requests to the gateway
…
(not for initial consideration) The proxy can be configured either as a disk-caching proxy (XCache) or to ‘forward / pass through’ requests to the gateway, without the need to drain the farm.
EBUSY in readV requests
Observation during the Echo problem period: -EBUSY errors from ceph are caught in the BufferedIO Read calls (5 attempts, after which an -EIO error is returned). We should ensure that readV requests also catch -EBUSY errors correctly and do not pass them back to core xrootd. James Walder to create a Jira ticket.
Discussion on merging bufferedIO into master. Also to discuss pushing changes “upstream” (xrootd/xroot-ceph).
Create Jira for Checksumming updates for 3.7+ (especially for Rocky 9 releases).
Begin the testing process on the WN test node and aim to push to the farm in a timely manner.
Continue investigations into readV methods that will enable the XCache to be removed (and therefore allow writable WNs).
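The -EBUSY handling discussed above (retry up to five times, then return -EIO), applied uniformly to read and readV, might look something like the sketch below. It is written in Python purely for illustration and is not the xroot-ceph implementation: `retry_on_ebusy`, `buffered_read`, `buffered_readv`, and `make_flaky_backend` are hypothetical names, and the backend is assumed to return a byte count on success or a negative errno on failure.

```python
import errno
import time

MAX_ATTEMPTS = 5  # from the notes: 5 attempts, then -EIO


def retry_on_ebusy(op, *args, attempts=MAX_ATTEMPTS, delay_s=0.0):
    """Call op(*args), retrying while it returns -EBUSY.

    Any other return value (success or a different error) is passed through
    unchanged. If every attempt returns -EBUSY, -EIO is returned instead so
    that -EBUSY never reaches the caller.
    """
    for attempt in range(attempts):
        rc = op(*args)
        if rc != -errno.EBUSY:
            return rc
        if delay_s and attempt < attempts - 1:
            time.sleep(delay_s)  # optional pause between attempts (assumption)
    return -errno.EIO


def buffered_read(backend_read, offset, length):
    """Single read with -EBUSY retry."""
    return retry_on_ebusy(backend_read, offset, length)


def buffered_readv(backend_read, chunks):
    """Vector read: apply the same retry policy to each (offset, length) chunk.

    Returns the total number of bytes read, or the first negative error code.
    """
    total = 0
    for offset, length in chunks:
        rc = retry_on_ebusy(backend_read, offset, length)
        if rc < 0:
            return rc
        total += rc
    return total


def make_flaky_backend(busy_count):
    """Hypothetical test backend: returns -EBUSY busy_count times, then succeeds."""
    remaining = [busy_count]

    def read(offset, length):
        if remaining[0] > 0:
            remaining[0] -= 1
            return -errno.EBUSY
        return length

    return read
```

Routing both paths through one helper means the read and readV behaviour cannot drift apart, which is the concern raised in the observation.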
Decisions
Gateway configuration following Option 1 is preferred: “Move to 5.5.4-2 (core) + 5.5.4-3 (xroot-ceph-buffered)”