
Item

Presenter

Notes

XRootD Releases

5.6.2-2 is out
- Plan for deployment

  • Is passing on pre-prod

  • Upgrade of CMake version: exhibiting ‘stack smashing’ errors (or is this due to different compiler versions?)

  • Are there features for XrdCeph that still need to be included?

Aim for prod testing next week (one week of testing, then deploy if OK).

Sam notes:
If you're building XRootD 5.6.2 from source, note that the 5.6.2 tag in the git repo does not include the bugfix for authdb parsing (it landed two commits later), as I just discovered to my cost.

Lancaster works with the off-the-shelf 5.6.2-2.

3 is out

Checksum fixes

Prefetch studies and WN changes

Alex

(Temporarily rolled back while the work on the batch farm WNs is ongoing.)

  • The LD_PRELOAD for lockless reads was removed (it was compiled against an ‘older’ version of Ceph, and is superseded by the recent readV work).

  • WN containers need updated Ceph RPMs (for 14.2.22).

  • Alexander Rogovskiy to present the ‘final’ status of the prefetch testing at the next meeting.

Attachment: namepres_xrootd.pdf

  • James Walder to add work items to the Jira queue.

  • Failures (timeouts) in the latest prefetch tests might be due to the Ceph version.

Deletion studies through RDR

Ian

CMSD rollout

Jira: XRD-41

New diagram required?

svc01, 02, 17 and 18 stay as internal WN gateways for now.
The other svc hosts (03, 05, 11, 13-16) are to be added to the CMSD production cluster.

svc19 is designated as the ALICE gateway.
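As groundwork for a possible new diagram, the host allocation above can be kept as data and checked so that no host ends up in two roles. The following is a minimal Python sketch. Only the host numbers come from these notes. The role labels and the `expand`/`overlaps` helpers are hypothetical.

```python
"""Sketch: svc host roles as recorded in these notes (role labels are hypothetical)."""


def expand(spec: str) -> list[str]:
    """Expand a spec such as "03,05,11,13-16" into zero-padded svc hostnames."""
    hosts = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(x) for x in part.split("-"))
            hosts.extend(f"svc{n:02d}" for n in range(first, last + 1))
        else:
            hosts.append(f"svc{int(part):02d}")
    return hosts


# Hypothetical role names; host lists taken from the CMSD rollout notes.
ROLES = {
    "internal-wn-gateway": expand("01,02,17,18"),
    "cmsd-production": expand("03,05,11,13-16"),
    "alice-gateway": expand("19"),
}


def overlaps(roles: dict[str, list[str]]) -> set[str]:
    """Return every host that is assigned to more than one role."""
    seen: set[str] = set()
    dupes: set[str] = set()
    for hosts in roles.values():
        for host in hosts:
            # Second sighting of a host goes to dupes, first sighting to seen.
            (dupes if host in seen else seen).add(host)
    return dupes


if __name__ == "__main__":
    for role, hosts in ROLES.items():
        print(f"{role}: {', '.join(hosts)}")
    print("overlaps:", ", ".join(sorted(overlaps(ROLES))) or "none")
```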

Gateways on new network plan

IPv6 sorted; firewall rule changes in progress.

LHCONE issue sorted

Fermilab can’t be reached through IPv6, but tracepath gets as far as the LHCOPN router at CERN.

Still to consider: the port for the TPC instance.

Gateways: observations

WN gateways showed a spike in memory usage: the two gateways with swap enabled filled a few hundred GB of swap, while the other two crashed in the poller.


CMSD outstanding items

Icinga / Nagios callout test changes: live and available.

  • The ping test for the floating IPs and gateway hosts needs some more refinement.
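One possible refinement is to evaluate a floating IP together with the gateway hosts behind it. That way, a dead host behind a still-answering floating IP raises a warning instead of passing silently. The following is a minimal sketch that uses the standard Nagios/Icinga plugin exit codes. The `evaluate` function, its inputs and the policy itself are assumptions and are not taken from the existing checks. The policy is: critical if the floating IP or every host is down, warning if only some hosts are down.

```python
# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(floating_ip_up: bool, hosts_up: dict[str, bool]) -> tuple[int, str]:
    """Combine ping results for one floating IP and its gateway hosts.

    Hypothetical policy, for illustration only.
    """
    if not hosts_up:
        return UNKNOWN, "UNKNOWN - no gateway hosts configured"
    down = sorted(host for host, up in hosts_up.items() if not up)
    if not floating_ip_up:
        return CRITICAL, "CRITICAL - floating IP not responding"
    if len(down) == len(hosts_up):
        return CRITICAL, "CRITICAL - all gateway hosts down: " + ", ".join(down)
    if down:
        return WARNING, "WARNING - gateway hosts down: " + ", ".join(down)
    return OK, f"OK - floating IP and {len(hosts_up)} gateway hosts responding"


if __name__ == "__main__":
    # Example input with hypothetical host names; a real check would gather
    # these from actual pings and end with sys.exit(code).
    code, message = evaluate(True, {"gw-a": True, "gw-b": False})
    print(code, message)
```

The plugin contract is just "first line of output plus exit code", so keeping the decision in a pure function makes the policy easy to test apart from the ping itself.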

Improved load balancing / server failover triggering: a better 'rolling server restart' script.

Documentation: setup / configuration / operations / troubleshooting / testing

Review of sandbox and deployment to prod:
- Awaiting time from Tom for the load balancer test

Sandbox has been reviewed; awaiting Thomas Byrne for final confirmation.

CMSD outstanding items

sandbox deployed

Tokens testing

To liaise with the TTT Taskforce (aka Matt Doidge).

no update

AAA Gateways

Sandbox ready for review:

http://aquilon.gridpp.rl.ac.uk/sandboxes/diff.php?sandbox=jw-xrootd-aaa-5.5.4-3
Needs a bigger discussion regarding tokens and deployment on production hosts.

SKA Gateway box

/wiki/spaces/UK/pages/215941180

Now working, using the SKA pool on Ceph dev.

Initial iperf3 tests (see table and plots below):

  • Actions

    • Ensure Xrootd01 is tuned correctly, according to the NVIDIA / Mellanox instructions

    • Repeat the iperf tests

  • Xrootd tests against:

    • dev-echo

    • cephfs (Deneb dev)

    • cephfs (openstack; permissions/routing issues)?

    • local disk / mem

  • Frontend routing is also being worked on
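Repeating the iperf tests is easier to compare if each run is saved as JSON (`iperf3 -c <server> -P 4 -J > runN.json`, where `-J` selects JSON output and `-P` sets parallel streams) and summarised by a script. The following is a minimal sketch. It assumes TCP runs, whose JSON has `end.sum_sent` and `end.sum_received` with a `bits_per_second` field; UDP runs report `end.sum` instead. The helper names are hypothetical.

```python
import json
import statistics


def throughput_gbps(iperf3_json: str) -> tuple[float, float]:
    """Return (sent, received) throughput in Gbit/s from one `iperf3 -J` TCP run."""
    end = json.loads(iperf3_json)["end"]
    sent = end["sum_sent"]["bits_per_second"] / 1e9
    received = end["sum_received"]["bits_per_second"] / 1e9
    return sent, received


def summarise(runs: list[str]) -> dict[str, float]:
    """Summarise received throughput over repeated runs (e.g. before/after tuning)."""
    rx = [throughput_gbps(run)[1] for run in runs]
    return {
        "runs": len(rx),
        "mean_gbps": statistics.mean(rx),
        "min_gbps": min(rx),
        "max_gbps": max(rx),
    }
```

Using the receiver-side figure avoids counting data that was sent but not delivered. Keeping min and max alongside the mean shows whether a tuning change also reduced run-to-run spread.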


Containerised gateways (Kubernetes cluster)

Identified an issue on the worker-node gateways: Ceph Nautilus 14.2.15 libraries (left over from the previous libradosstriper lockless-read implementation) were being loaded, overriding the Ceph version installed in the container.

Working on the ingress setup: got a 'cannot allocate port' error when setting up the service (port forwarding). Google suggests an issue with the cluster; will try rebuilding from scratch to see if that fixes it.
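To confirm which Ceph libraries a running gateway actually loaded, you can inspect the process's `/proc/<pid>/maps`, which lists every mapped file, and its `LD_PRELOAD` setting. The following is a minimal, Linux-only sketch. The helper names are hypothetical, and the match is a simple substring on the library basename, so both `librados` and `libradosstriper` are reported.

```python
import os
from typing import Optional


def mapped_libs(maps_text: str, name: str = "librados") -> set[str]:
    """Return paths from /proc/<pid>/maps text whose basename contains `name`."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) == 6 and fields[5].startswith("/"):
            if name in os.path.basename(fields[5]):
                paths.add(fields[5])
    return paths


def ld_preload(environ_bytes: bytes) -> Optional[str]:
    """Return LD_PRELOAD from /proc/<pid>/environ contents (NUL-separated), if set."""
    for entry in environ_bytes.split(b"\0"):
        if entry.startswith(b"LD_PRELOAD="):
            return entry.split(b"=", 1)[1].decode()
    return None


def inspect(pid: int) -> None:
    """Print the librados copies mapped by `pid` and its LD_PRELOAD (needs read access)."""
    with open(f"/proc/{pid}/maps") as f:
        libs = mapped_libs(f.read())
    with open(f"/proc/{pid}/environ", "rb") as f:
        preload = ld_preload(f.read())
    print("librados mapped from:", ", ".join(sorted(libs)) or "(none)")
    print("LD_PRELOAD:", preload or "(not set)")
```

For example, `inspect(<pid of an xrootd process>)` run inside the container prints the full path of each loaded copy. A path outside the container's own Ceph install would point to the kind of override described above.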

On GGUS:

Site reports

Lancaster - moved to 5.6.2-2, all OK

    ...