
🗓 Date

👥 Participants

Apologies:

🥅 Goals

  • List of Epics

  • New tickets

  • Consider new functionality / items

  • Detailed discussion of important topics

  • Site report activity

🗣 Discussion topics

Current status of Echo Gateways / WNs testing

Recent sandboxes for review / deployment:

XRootD Releases: 5.6.2-2 is out; under test.

Prefetch studies (Alex): temporarily to be rolled back, given the ongoing work on the batch farm WNs.

Deletion studies through RDR (Ian): ATLAS concern over deletion rate for DC24.

JW: the expected average ATLAS deletion rate from RAL storage during DC24 will be ~40-60k files per hour. Considering the history of Echo deletion performance issues [1], please make sure that everything works well up to these rates.

Can we cope with this rate (assuming additional gateways) without fundamental changes? Is a re-architecture of how deletions are performed needed (either for the data challenge, or towards HL-LHC)? Both total throughput and per-file deletion times need to be considered.

The nominal rate for ATLAS therefore corresponds to ~20 Hz (60k files per hour / 3600 s ≈ 17 deletions per second).

Production deletion times from recent logs, counting only the time spent within Ceph (excluding the XRootD and client RTT), in seconds:

count  167980
mean   2.951
std    5.468
min    0.015
25%    0.282
50%    0.570
75%    3.486
max    271.880
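As a rough sanity check on these numbers, a minimal sketch (using the target rate and per-file times quoted above; the gateway count is a hypothetical assumption, not a measurement) applies Little's law to estimate how many deletions need to be in flight to sustain the target rate:

# Sketch: rough capacity check for DC24 deletion rates (not a measurement).
# The target rate and per-file times come from the figures above; the
# gateway count is a hypothetical assumption for illustration only.

target_rate_hz = 60_000 / 3600      # upper end of 40-60k files/hour, ~16.7 Hz
mean_delete_s  = 2.951              # mean in-Ceph deletion time from the logs
p75_delete_s   = 3.486              # 75th-percentile deletion time
gateways       = 4                  # hypothetical number of deletion gateways

# Little's law: concurrency needed = arrival rate * service time.
need_mean = target_rate_hz * mean_delete_s
need_p75  = target_rate_hz * p75_delete_s

print(f"target rate           : {target_rate_hz:.1f} deletions/s (~20 Hz nominal)")
print(f"in-flight @ mean time : {need_mean:.0f} concurrent deletions")
print(f"in-flight @ p75 time  : {need_p75:.0f} concurrent deletions")
print(f"per gateway (of {gateways})    : {need_mean/gateways:.0f}-{need_p75/gateways:.0f} parallel deletions")

On those assumptions the aggregate concurrency needed is of order 50-60 deletions in flight, so the open question is less the average rate and more the long tail (max ~272 s) and any per-gateway or per-stream limits on parallel deletions.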

CMSD rollout

XRD-41

Future Architecture of Data access on WNs

VOs asked to provide input on their requirements / use cases

Gateways: observations

Worker-node write traffic has temporarily been redirected to gateways on the new network. Results look promising: initial testing of three worker-node generations against one gateway gave ~40k uploads over 1.5 days with only one failure, caused by an expired certificate proxy. This change will be reverted once external IPv6 is available on the new network, but a future separation of job and FTS traffic seems sensible.

CMSD outstanding items

Icinga / nags callout test changes: live and available.

Improved load balancing / server failover triggering -

Better 'rolling server restart script'.

Documentation: setup / configuration / operations / troubleshooting / testing.

Review of sandbox and deployment to prod:
- Initial review spotted a requirement to split the feature so that a non-CMSD version is available.

  • New feature (a copy of the existing prod version) has been made, but needs testing after adding ‘named variable substitution’ to the xrootd config script

  • cms feature: add the ‘named variable substitution’ and finalise the review.

Tokens testing

NTR

AAA Gateways

Sandbox ready for review:

http://aquilon.gridpp.rl.ac.uk/sandboxes/diff.php?sandbox=jw-xrootd-aaa-5.5.4-3

SKA Gateway box

/wiki/spaces/UK/pages/215941180

Now working using the SKA pool on Ceph dev.

Initial iperf3 tests (see table and plots below).

  • Actions

    • Ensure Xrootd01 is tuned correctly, according to the NVIDIA / Mellanox instructions

    • Repeat the iperf3 tests (a scripted sketch follows this list)

  • Xrootd tests against:

    • dev-echo

    • cephfs (Deneb dev)

    • cephfs (openstack; permissions/routing issues)?

    • local disk / mem

  • Frontend routing is also being worked on
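A minimal sketch of how the repeated iperf3 runs could be scripted (the hostnames are placeholders rather than the real endpoints; it assumes iperf3 servers are already listening on the targets and uses iperf3's standard JSON output):

import json
import subprocess

# Placeholder endpoints; substitute the real test hosts.
TARGETS = ["gateway-a.example", "gateway-b.example"]
STREAMS = 8      # parallel TCP streams
DURATION = 30    # seconds per run

for host in TARGETS:
    # -J emits JSON, -P sets parallel streams, -t sets the test duration.
    result = subprocess.run(
        ["iperf3", "-c", host, "-P", str(STREAMS), "-t", str(DURATION), "-J"],
        capture_output=True, text=True, check=True,
    )
    report = json.loads(result.stdout)
    gbps = report["end"]["sum_received"]["bits_per_second"] / 1e9
    print(f"{host}: {gbps:.1f} Gbit/s received over {STREAMS} streams")

Running the same script before and after the NVIDIA / Mellanox tuning on Xrootd01 would make the before/after comparison straightforward to record.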

Extra gateways deployment

… awaiting networking updates; 4 being repurposed for internal (mostly) writes …

A correlation has been seen between 'spikes' on the new internal gateways and additional jobs being run by particular VOs.

ALICE WN gateways

(Birmingham is using EOS; Oxford has no storage)

Relationship to OSD issues?

Best practice document for Ceph configuration?

e.g. autoscaling features?

on GGUS:

Site reports

Lancaster - .

Glasgow

✅ Action items

⤴ Decisions
