Bring online a tranche at a time, confirming with the data team that xrootd is working as expected.
10-11th May: Let the updated workers run for a few days.
12th May: `wn-2017-dell (all 2017’s) - wn-2019-dell` will be set to drain.
15th May: Repeat above process for second half of workers.
16th May: Merge all required sandboxes into prod and manage the farm back into `prod_batch` in AQ.
Next steps for WN deployment
Possible options for short term WN status:
Current configuration: 5.3.3-2 (core) + 5.3.4-1 (xroot-ceph) for proxy and gateway
Move to 5.5.4-2 (core) + 5.5.4-3 (xroot-ceph-buffered):
Fixes the XCache “filename too long” issue (to be confirmed).
Provides buffering on the ‘gateway’ for passed-through reads
Allows non-striper reads and readV requests (i.e. Alex’s updates), also for passed-through read(V) requests
(b) and (c) are both configurable within the xrootd-xxx.cfg configuration files
Paged reads / (writes) would be enabled, probably only between XCache and gateway (TBC)
General fixes from 5.5.X series
5.5.4 is currently being tested on lcg2268 (2017 dell, ml), though not in exactly this configuration.
5.3.3-x (core) + 5.3.3-6 (xroot-ceph-buffered).
Needs an additional patch for the “filename too long” issue,
resulting in different (core) xrootd rpms for proxy and ceph (or a more detailed patch).
We ‘understand’ 5.3.3 as a working and stable release
Most testing on WNs done under this configuration
(Not for initial consideration) The proxy can be configured as a disk-caching proxy (XCache) or to ‘forward / passthrough’ requests to the gateway, without the need for draining the farm.
EBUSY in readV requests
Observed during the Echo problem period: -EBUSY errors from ceph, which are caught in the BufferedIO Read calls (5 attempts, then an -EIO error is returned). We should ensure that readV requests also catch -EBUSY errors correctly, and do not pass them back to core xrootd. James Walder to create a Jira ticket.
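For reference when writing the ticket, the intended behaviour can be sketched roughly as below. This is an illustrative Python model only, not the xroot-ceph code: `retry_on_ebusy`, `readv` and `read_chunk` are hypothetical names, and the notes do not say whether any delay is applied between attempts, so none is modelled here. The only details taken from the notes are the 5 attempts and the final -EIO.

```python
import errno

MAX_ATTEMPTS = 5  # from the notes: 5 attempts, then -EIO


def retry_on_ebusy(op, max_attempts=MAX_ATTEMPTS):
    """Call op() until it returns something other than -EBUSY.

    op is a hypothetical callable returning bytes read (>= 0)
    or a negative errno value.
    """
    for _ in range(max_attempts):
        rc = op()
        if rc != -errno.EBUSY:
            return rc  # success, or a different error: pass through unchanged
    return -errno.EIO  # still busy after all attempts


def readv(read_chunk, chunks):
    """Vector read: apply the same retry to every (offset, length) chunk,
    so that -EBUSY is never passed back to the caller."""
    total = 0
    for offset, length in chunks:
        rc = retry_on_ebusy(lambda: read_chunk(offset, length))
        if rc < 0:
            return rc
        total += rc
    return total
```

The point of routing both the single read and the vector read through one helper is that the readV path cannot drift from the Read path: the caller sees either a byte count, -EIO after repeated -EBUSY, or some other error unchanged.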
Discussion on merging bufferedIO into master. Also to discuss pushing changes to “upstream” (xrootd/xroot-ceph).