
🗓 Date

👥 Participants

Apologies:

James Walder

🥅 Goals

  • List of Epics

  • New tickets

  • Consider new functionality / items

  • Detailed discussion of important topics

  • Site report activity

🗣 Discussion topics

Current status of Echo Gateways / WNs testing

Recent sandboxes for review / deployment:

Item | Presenter | Notes

XRootD Releases

5.6.2-2 is out
- Plan for deployment

  • Passing on pre-prod

  • The CMake version upgrade is exhibiting ‘stack smashing’ errors (or possibly a compiler-version mismatch)?

  • Are there features for XrdCeph that need to be included?

Aim for production testing next week (one week of testing, then deploy if OK).

Sam notes:
if you're building xrootd 5.6.2 from source, the tagged 5.6.2 in the git repo does not have the bugfix for authdb parsing, as I just discovered to my cost (that's 2 commits later...)

Lancaster works (off-the-shelf 5.6.2-2)

Checksum fixes

Prefetch studies and WN changes

Alex

(temporarily rolled back while work on the batch-farm WNs is ongoing)

  • the LD_PRELOAD for lockless reads was removed (it was compiled against an ‘older’ version of Ceph, and has been superseded by the recent readV work)

  • WN containers need updated Ceph rpms (for 14.2.22)

  • Alexander Rogovskiy to present ‘final’ status of testing in next meeting on the prefetch work

  • James Walder to add work items to the queue in Jira

  • failures on the latest prefetch tests (timeouts) might be due to the Ceph version

Deletion studies through RDR

Ian

CMSD rollout

XRD-41
New diagram required?

svc01, 02, 17, 18 stay as internal WN gateways for now.
The other svc hosts (03, 05, 11, 13-16) are to be added to the CMSD production cluster.

svc19 is designated for the ALICE gateway.

Gateways on new network plan

IPv6 sorted; firewall rule changes in progress

LHCONE issue sorted

Fermilab can’t be reached through IPv6, but tracepath gets to the LHCOPN CERN router

Need to consider the TPC instance port.

Gateways: observations

WN gateways showed a spike in memory use: the 2 gateways with swap enabled filled a few hundred GB of swap; the other 2 crashed at the poller

CMSD outstanding items

Icinga / Nagios callout test changes - live and available

  • ping test for the floating IPs and gateway hosts needs some more refinement
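A refined check could cover both the floating IPs and the gateway hosts in one pass. A minimal sketch in Python — host names here are hypothetical placeholders, since the real inventory is not in these notes:

```python
import subprocess

# Hypothetical inventory; the real floating-IP and gateway lists are site-specific.
FLOATING_IPS = ["xrootd-float-01.example.org", "xrootd-float-02.example.org"]
GATEWAY_HOSTS = ["svc03.example.org", "svc05.example.org"]

def ping_once(host, timeout_s=2):
    """Return True if a single ICMP echo to `host` succeeds within the timeout."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def unreachable(results):
    """Hosts whose ping failed, sorted so alert output is stable."""
    return sorted(host for host, ok in results.items() if not ok)

# Usage (would run on the monitoring host):
#   results = {h: ping_once(h) for h in FLOATING_IPS + GATEWAY_HOSTS}
#   for host in unreachable(results):
#       print(f"CRITICAL: {host} not answering ping")
```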

Improved load balancing / server-failover triggering

Better 'rolling server restart' script
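One shape such a script could take, sketched in Python under the assumption that the gateways run an xrootd systemd unit reachable over ssh — the unit name and host list are placeholders, not the actual production setup:

```python
import subprocess
import time

GATEWAYS = ["svc03", "svc05", "svc11"]  # placeholder inventory

def batches(hosts, batch_size=1):
    """Split hosts into restart batches so most of the cluster stays serving."""
    return [hosts[i:i + batch_size] for i in range(0, len(hosts), batch_size)]

def healthy(host):
    """Placeholder health probe: is the (hypothetical) xrootd unit active?"""
    rc = subprocess.run(
        ["ssh", host, "systemctl", "is-active", "--quiet", "xrootd@clustered"],
    ).returncode
    return rc == 0

def rolling_restart(hosts, batch_size=1, settle_s=30):
    """Restart one batch at a time, stopping the roll-out on a failed health check."""
    for group in batches(hosts, batch_size):
        for host in group:
            subprocess.run(["ssh", host, "systemctl", "restart", "xrootd@clustered"])
        time.sleep(settle_s)  # let the restarted servers rejoin the CMSD cluster
        if not all(healthy(h) for h in group):
            raise RuntimeError(f"batch {group} failed health check; stopping roll-out")
```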

Documentation: setup / configuration / operations / troubleshooting / testing

Review of Sandbox and deployment to prod:
- Awaiting time from Tom for a load-balancer test

The sandbox has been reviewed; awaiting Thomas Byrne for final confirmation

Tokens testing

To liaise with the TTT task force (a.k.a. Matt Doidge)

no update

AAA Gateways

Sandbox ready for review:

http://aquilon.gridpp.rl.ac.uk/sandboxes/diff.php?sandbox=jw-xrootd-aaa-5.5.4-3
A bigger discussion is needed regarding tokens and deployment on production hosts.

SKA Gateway box

/wiki/spaces/UK/pages/215941180

Now working using the SKA pool on Ceph dev

Initial iperf3 tests: (see table and plots below).

  • Actions

    • Ensure xrootd01 is tuned correctly, according to the NVIDIA/Mellanox instructions

    • Repeat the iperf tests

  • Xrootd tests against:

    • dev-echo

    • cephfs (Deneb dev)

    • cephfs (OpenStack; permissions/routing issues?)

    • local disk / mem

  • Frontend routing is also being worked on
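Repeating the iperf tests is easier to script and tabulate with iperf3's JSON mode. A sketch — the server name is a placeholder, while the JSON fields are iperf3's standard TCP report layout:

```python
import json
import subprocess

def run_iperf3(server, seconds=10):
    """Run iperf3 against `server` and return its parsed JSON report."""
    proc = subprocess.run(
        ["iperf3", "-c", server, "-t", str(seconds), "-J"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)

def gbits_per_sec(report):
    """Receiver-side mean throughput in Gbit/s from an iperf3 TCP JSON report."""
    bps = report["end"]["sum_received"]["bits_per_second"]
    return bps / 1e9

# Usage (placeholder host):
#   report = run_iperf3("xrootd01.example.org", seconds=30)
#   print(f"{gbits_per_sec(report):.1f} Gbit/s")
```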

Containerised gateways (Kubernetes cluster)

Identified an issue on worker-node gateways where Ceph Nautilus 14.2.15 libraries were being loaded (left over from a previous libradosstriper lockless-read implementation), overriding the Ceph version installed in the container
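A quick way to confirm which librados a process would actually pick up is to ask the resolved library for its version via the C API's rados_version(). A sketch, assuming a librados is present on the loader's search path:

```python
import ctypes
import ctypes.util

def loaded_librados_version():
    """Report the version of the librados the dynamic linker resolves.

    If this differs from the version the container ships (e.g. 14.2.15
    instead of the expected 14.2.22), a stray host library is shadowing it.
    """
    path = ctypes.util.find_library("rados")
    if path is None:
        raise RuntimeError("librados not found on this host")
    lib = ctypes.CDLL(path)
    major, minor, extra = ctypes.c_int(), ctypes.c_int(), ctypes.c_int()
    lib.rados_version(ctypes.byref(major), ctypes.byref(minor), ctypes.byref(extra))
    return f"{major.value}.{minor.value}.{extra.value}"

def parse_version(v):
    """'14.2.15' -> (14, 2, 15), so versions compare in the right order."""
    return tuple(int(part) for part in v.split("."))
```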

Working on the ingress setup; hit a 'cannot allocate port' error when setting up the service (port forwarding). Google suggests an issue with the cluster; will try rebuilding it from scratch to see if that fixes the issue

on GGUS:

Site reports

Lancaster - moved to 5.6.2-2, all OK

Glasgow - setting up gateways; a Ceph disk node is using a lot of swap (one OSD using large virtual memory). 5.6.2-2 testing TBD. Later versions of Nautilus are more aggressive with caching; the recommendation is to turn swap off

✅ Action items

⤴ Decisions
