
🗓 Date

👥 Participants

🥅 Goals

  • List of Epics

  • New tickets

  • Consider new functionality / items

  • Detailed discussion of important topics

  • Site report activity

🗣 Discussion topics

https://stfc.atlassian.net/jira/software/c/projects/XRD/boards/26/roadmap

Item: 5.4.3 releases in CentOS 7

  • Problems observed: pgRead / pgWrite

Presenter: Thomas, Jyothish (STFC,RAL,SC)

Notes:

5.5.0-rc1 testing on a dev VM appears to be working OK.
Should feedback go to the xrootd devs?

Sam is also testing builds of 5.5.0; the build finds python2 first.
- some updates to the CMake configuration are required to make it find the relevant python3
- Sam will create an issue once the changes are finalised …

Discovered /wiki/spaces/CD/pages/30998699 while setting up unit tests:
errors in xrdceph related to the cluster object do not cause a clean restart, locking the service into an invisible crash. The status still reports OK, but all requests sent fail and the socket goes into CLOSE_WAIT.
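Since the status can report OK while every request fails, a liveness check should exercise a real request rather than the process state. A minimal probe sketch follows; the HTTP endpoint, port, and timeout are assumptions for illustration, not details from the notes:

```python
#!/usr/bin/env python3
"""Probe sketch: catch the 'invisible crash' by issuing a real request.

Assumptions (not from the notes): the gateway serves HTTP on PROBE_URL,
and a timed-out or refused request indicates the wedged state.
"""
import sys
import urllib.error
import urllib.request

PROBE_URL = "http://localhost:1094/"  # hypothetical HTTP door endpoint
TIMEOUT_S = 10

def probe(url: str = PROBE_URL) -> bool:
    """Return True if the gateway answered at all, False if it looks wedged."""
    try:
        with urllib.request.urlopen(url, timeout=TIMEOUT_S) as resp:
            return resp.status < 500
    except urllib.error.HTTPError:
        # Any HTTP error code still proves the daemon is responding.
        return True
    except OSError:
        # Timeout / connection failure: process may be 'up' but unresponsive.
        return False

if __name__ == "__main__":
    sys.exit(0 if probe() else 1)
```

A probe like this could feed the existing monitoring, or trigger the clean restart that the crash currently prevents.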

Item: CentOS 8 outstanding items?

Notes:
Worker nodes with EL8 are being prepared for deployment.
Some items, such as cephsum, still need to be packaged into EL8 RPMs.

Item: space info reporting (XRD-21)

Notes:

Functionality implemented in xrdceph: current usage info comes from Ceph, with two additional configurable parameters, ceph.quotapath and ceph.poolnames.
Quota info comes from a local JSON file read from the quotapath (similar in format to s3.echo.stfc.ac.uk/srr/storagesummary.json). The default is /etc/xrootd/storagesummary.json.
Pool names are defined as a comma-separated string with a trailing comma, read from ceph.poolnames (see the sketch after this item).
Tested as working in the dev setup; PR & RPM generated. The plan is to roll this out as v5.3.8 of xrootdceph.

  • Perhaps less relevant for Glasgow.
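As a point of reference, a sketch of the two inputs described above: the quota JSON at the default quotapath and the trailing-comma pool list. This is illustrative Python, not the xrdceph implementation, and the pool names shown are assumptions:

```python
import json

# Values as described in the notes; the pool names are illustrative only.
POOLNAMES = "atlas,lhcb,"  # ceph.poolnames: comma-separated, trailing comma
QUOTAPATH = "/etc/xrootd/storagesummary.json"  # ceph.quotapath default

def parse_poolnames(raw: str) -> list[str]:
    """Split the ceph.poolnames string; the trailing comma yields an
    empty final element, which is filtered out here."""
    return [name for name in raw.split(",") if name]

def load_quotas(path: str = QUOTAPATH) -> dict:
    """Read the local quota JSON (SRR-storagesummary-like format)."""
    with open(path) as fh:
        return json.load(fh)

print(parse_poolnames(POOLNAMES))  # ['atlas', 'lhcb']
```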

Item: XRD-26

Notes:

5.5.0-rc1 is still not compliant with the WLCG testbed:
https://github.com/xrootd/xrootd/issues/1752

https://github.com/xrootd/xrootd/pull/1753/commits/0e5ef3b7bbf39c5fa4cb594617e2830c0ee72f07

Item: Deletions update

Notes:
Ian is looking at 1-8 GiB deletions against gwX: single-file writes and deletes.

No deletion timeouts were observed.
Previous tests (with more parallel deletions) did show longer deletion times.

Item: XRD-27

Presenter: James

Notes:
Occasional failed deletes for ATLAS due to 0-byte files with partial striper metadata (likely triggered by the recent consistency check).

Item: Future of Deletions?

Options:

  • Horizontal scaling, i.e. spread deletes across all available gateways

  • Add CMSD (when ready) to ensure load balancing (at what extra cost?)

  • Offload to external tooling to manage the deletes asynchronously

    • Ideas below (a sketch follows this list)

    • (JW: currently 'playing' with a test implementation.)
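To make the external-tooling option concrete, here is one hypothetical shape for it (a sketch only, not JW's test implementation): the client-facing path merely enqueues the delete and returns, while a background worker drains the queue against the slow backend.

```python
import queue
import threading

# Hypothetical async-delete offload: enqueue in the request path,
# perform the slow backend deletion in a background worker.
delete_queue: "queue.Queue[str]" = queue.Queue()

def enqueue_delete(path: str) -> None:
    """Client-facing: O(1) and never blocks on the storage backend."""
    delete_queue.put(path)

def backend_delete(path: str) -> None:
    """Stand-in for the real (slow) rados/striper deletion call."""
    print(f"deleting {path}")

def delete_worker() -> None:
    while True:
        path = delete_queue.get()
        try:
            backend_delete(path)
        finally:
            delete_queue.task_done()

threading.Thread(target=delete_worker, daemon=True).start()
enqueue_delete("/atlas/some/file")  # returns immediately
delete_queue.join()                 # demo only: wait for the worker
```

A real tool would need a persistent queue so pending deletes survive a gateway restart, plus retry and failure reporting.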

Item: What are we going to do with Vector Reads now?

Presenter: All

Notes:

Removal of locking did not appear to help significantly.

Rob C’s script is currently the only metric for LHCb to compare / verify against.

Options:

  • Write a proper striper-vector-read method

  • Implement one of Andreas’ options

  • Fall back on buffering / range coalescing in the short term, until a correct fix is available (see the sketch after this list).

    • Collect data on what really is needed
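For the buffering / range-coalescing fallback, a minimal sketch of the idea: sort the requested ranges and merge any whose gap is under a threshold, so the backend sees fewer, larger reads. The 64 KiB threshold is an illustrative assumption, not a measured value:

```python
GAP_THRESHOLD = 64 * 1024  # assumed merge threshold, tune from real data

def coalesce(ranges: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge (offset, length) pairs whose gaps are below GAP_THRESHOLD."""
    merged: list[list[int]] = []
    for off, length in sorted(ranges):
        if merged and off - (merged[-1][0] + merged[-1][1]) <= GAP_THRESHOLD:
            # Extend the previous chunk to cover this range and the gap.
            merged[-1][1] = max(merged[-1][1], off + length - merged[-1][0])
        else:
            merged.append([off, length])
    return [(off, length) for off, length in merged]

# Three small reads collapse into one backend read:
print(coalesce([(0, 4096), (8192, 4096), (70000, 1000)]))  # [(0, 71000)]
```

The merged reads then have to be scattered back into the caller's original buffers, which is where the extra buffering cost comes in; collecting data on real request patterns (the last bullet above) would tell us the right threshold.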

Site reports

Glasgow:
With some testing, it appears that the namelib must be present on the proxy (via pss.namelib) for the smoke-tests to work.

Lancaster

Redirector adventures are going quite well - except when I tried to put our new Rocky 8 server into our test xroot cluster. It crashes on the HTTP smoke test…

It is actually a little different from the old libmacaroons problems: the crash occurs when it gets to the second batch of tests (the TPC tests). The stack trace in the syslog also doesn’t mention libmacaroons at all.

This is xrootd 5.4.3 on Rocky 8.6.

✅ Action items

  • Reminder from Sam to update GGUS tickets
  • James to review space reporting PR

⤴ Decisions
