...

Item

Presenter

Notes

5.4.3 releases in CentOS 7

  • Problems observed: (pgRead / pgWrites)

Thomas, Jyothish (STFC,RAL,SC)

5.5.0-rc1 testing on the dev VM appears to be working OK.
Feedback to the XRootD devs?

Sam is also testing a build of 5.5.0; it finds python2 first?
- Some updates to CMake are required to make it find the relevant python3.
- Sam will create an issue once the changes are finalised …

Discovered /wiki/spaces/CD/pages/30998699 while setting up unit tests:
errors in xrdceph related to the cluster object do not cause a clean restart, which locks the service into an invisible crash. The status still reports OK, but all requests fail and leave the socket in CLOSE_WAIT.

CentOS 8: outstanding items?

Worker nodes with EL8 are being prepared for deployment.
Some items, such as cephsum, still need to be placed into EL8 RPMs.

space info reporting:

Jira: XRD-21

Functionality implemented in xrdceph; current usage info comes from Ceph.
Two additional configurable parameters: ceph.quotapath and ceph.poolnames.
Quota info comes from a local JSON file read from quotapath (similar in format to s3.echo.stfc.ac.uk/srr/storagesummary.json).
The default is /etc/xrootd/storagesummary.json.
Pool names are defined as a comma-separated string with a trailing comma, read from ceph.poolnames.
Tested as working in the dev setup; PR and RPM generated. Plan to roll this out as v5.3.8 of xrootdceph.
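As a rough illustration of the two parameters described above, the following Python sketch parses a pool-name string with a trailing comma and reads quotas from a local JSON file. It is not the xrdceph implementation. The JSON layout (a "storageservice" object holding a "storageshares" list with "name" and "totalsize" fields) is an assumption about the storagesummary.json format, and the function names are hypothetical.

```python
import json


def parse_pool_names(raw: str) -> list[str]:
    """Split a comma-separated pool list, ignoring the trailing comma."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def load_quotas(path: str) -> dict[str, int]:
    """Return {share name: quota in bytes} from a storage summary file.

    Assumed layout (not confirmed by these notes):
    {"storageservice": {"storageshares": [{"name": ..., "totalsize": ...}]}}
    """
    with open(path) as fh:
        summary = json.load(fh)
    shares = summary["storageservice"]["storageshares"]
    return {share["name"]: int(share["totalsize"]) for share in shares}
```

For example, parse_pool_names("atlas,cms,") gives ["atlas", "cms"]; the pool names here are placeholders.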

  • Perhaps less relevant for Glasgow.

Jira: XRD-26

5.5.0-rc1 is still not compliant with the WLCG testbed:
https://github.com/xrootd/xrootd/issues/1752

https://github.com/xrootd/xrootd/pull/1753/commits/0e5ef3b7bbf39c5fa4cb594617e2830c0ee72f07

Deletions update

Ian looking at 1-8 GiB deletions against gwX.
Single file writes and deletes

No deletion timeouts observed.
Previous tests (with more parallel deletions) do show longer deletion times

Jira: XRD-27

James

Occasional failed deletes for ATLAS due to 0-byte files with partial striper metadata
(likely triggered by the recent consistency check).

Future of Deletions?

Options:

  • Horizontal scaling, i.e. spread deletes across all available gateways

  • Add (when ready) CMSD to ensure load balancing (at what extra cost?)

  • Offload to external tooling to manage the deletes asynchronously

    • Ideas below

    • (JW: currently 'playing' with a test implementation).
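The horizontal-scaling option above could look roughly like the following Python sketch, which spreads a batch of delete requests round-robin across the available gateways. This is only an illustration of the idea, not the draft spec or JW's test implementation; the function name and the gateway names in the example are hypothetical.

```python
from itertools import cycle


def assign_deletes(paths, gateways):
    """Distribute paths to delete across gateways in round-robin order."""
    plan = {gw: [] for gw in gateways}
    for path, gw in zip(paths, cycle(gateways)):
        plan[gw].append(path)
    return plan
```

With three paths and two gateways, the first gateway receives the first and third path and the second gateway receives the second. An asynchronous external tool would additionally need queuing and retry handling, which this sketch leaves out.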

Attached file: DraftDeletionSpec.pdf

What are we going to do with Vector Reads now?

All

Removal of locking did not appear to help significantly.

Rob C’s script is currently the only metric for LHCb to compare / verify against.

Options:

  • Write a proper striper-vector-read method

  • Implement one of Andreas’ options

  • Fall back on buffering / range coalescing in the short term, until a correct fix is available.

    • Collect data on what really is needed
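To make the range-coalescing fallback concrete, here is a minimal Python sketch that merges the (offset, length) chunks of a vector read into fewer, larger reads. It assumes chunks may be merged when they overlap or are separated by at most max_gap bytes; the function name and the max_gap parameter are hypothetical and do not come from XRootD or xrdceph.

```python
def coalesce_ranges(chunks, max_gap=0):
    """Merge (offset, length) chunks that overlap or lie within max_gap bytes."""
    merged = []  # list of [start, end) pairs, kept sorted by start
    for offset, length in sorted(chunks):
        end = offset + length
        if merged and offset <= merged[-1][1] + max_gap:
            # Close enough to the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([offset, end])
    return [(start, end - start) for start, end in merged]
```

A larger max_gap means fewer backend reads at the cost of fetching bytes that were not requested, so the right value would depend on the data collected about what clients actually request.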

...

This is XRootD 5.4.3 on Rocky 8.6.

✅ Action items

  •  Reminder from Sam to update GGUS tickets
  •  James to review space reporting PR

...