🗓 Date

05 Oct

👥 Participants

...

Apologies:

James Walder

🥅 Goals

...

Recent sandboxes for review / deployments:

5.6.2-2 is out
- Plan for deployment

  • Passing on pre-prod

  • The upgrade of the CMake version is exhibiting possible ‘stack smashing’ (or a mismatch of compiler versions)?

  • Are there features for XrdCeph that need to be included?

Aim for production testing next week (aim for one week of testing, then deploy if OK).

Sam notes:
if you're building XRootD 5.6.2 from source, the tagged 5.6.2 in the git repo does not have the bugfix for authdb parsing, as I just discovered to my cost (that fix is 2 commits later...)

Lancs works (off-the-shelf 5.6.2-2)

(temporarily rolled back, given the ongoing work on the batch-farm WNs)

  • the LD_PRELOAD for lockless reads was removed (it was compiled against an older version of Ceph, and has been superseded by the recent readV work)

  • WN containers need updated Ceph RPMs (for 14.2.22)

  • Alexander Rogovskiy to present the ‘final’ status of the prefetch testing at the next meeting

Item

Presenter

Notes

XRootD Releases

Checksums fixes

Prefetch studies and WN changes

Alex

Attached: pres_xrootd.pdf

  • James Walder to add work items to the queue in Jira

  • Failures on the latest prefetch (timeout) might be due to the Ceph version

    XRootD gateway architecture review (what should XRootD access to Echo look like in a year’s time?)

    /wiki/spaces/GRIDPP/pages/255262851

    Ideas on xrootd batch farm architecture

    Current State
    ECHO aliases

    [diagram: current ECHO aliases]

    Key questions:
    Should we segregate S3 and WN traffic from FTS?
    Giving each service its own redirector endpoint is good for maintaining high availability and redundancy; additional hardware capacity can then be added to each service more easily if needed.
    Shared servers: cmsd allows multiple clusters (redirector managers) to include the same server (a gateway, in our case).
    Multiple clusters, with or without shared servers? (No shared servers if possible.)
    Having multiple clusters is good (each service should have its own redirector managers for HA). Shared servers will not be needed in a containerised setup, and ease of management is preferred over a slight resource optimisation (with shared servers, underutilised servers could take traffic from other, more heavily loaded services, making better use of existing capacity).

    What to aim for:

    • Every service instance needs to be as resilient as possible.
      Each DNS endpoint should have keepalived for redundancy and redirectors for high availability.

    • Manageability.
      Adding or removing gateways from a service should be simple, and the overall setup should not be too complex to understand or manage.

    • Aim for a simpler config.
      Keep the configuration understandable.

    • Flexibility per gateway / use case to meet burst demand.
      We should be able to add gateways to service endpoints as quickly and smoothly as possible, to handle burst demand (e.g. ALICE) and to deploy additional capacity quickly.

    Containerising everything (shared containers across all hardware) is the preferred end state.
    This has the prerequisite that every service is behind an expandable high-availability setup (xrootd cmsd managers)
    [and an orchestrated setup to spin up more gateways when load increases].

    Some system resource overhead should be reserved to keep the gateways running smoothly.

    WN gateways:
    These should be kept going forward, as they give us an additional gateway’s worth of capacity for every worker node.
    They currently only redirect traffic for reads over root (job operations using the xrootd.echo endpoint).
    This is because of XCache, which is read-only.
    XCache is good at what it does and reduces the number of IOPS hitting Ceph from reads. When the XCaches were removed during the vector-read deployment, the resulting IOPS slowed the Echo storage cluster down enough for it to fail.

    • XCache can be removed if XrdCeph buffers provide similar functionality (this would allow R/W over the local gateway)
      XrdCeph buffers do not work for out-of-order reads or separate read requests (as is the case with the ALICE gateways)

    • Some sort of xrootd manager-tree setup might work for the WN gateway containers
      this could be similar to CMS AAA, with a hierarchy for access, but the first point of contact should be highly available

    • A single gateway failing on a worker node should not cause all of that node’s jobs to fail. Currently there is no failover built in for WN reads, so if the gateway is down, all jobs on that WN will fail

    • A healthcheck equivalent to the functional test for the WN gateway would ensure the gateway is killed and restarted, and would let Condor know if the gateway is still down. This would stop new jobs being sent to a WN with a broken gateway, although the jobs already running on it would still run (a rough healthcheck sketch follows after this list)

    • The solution should strongly prefer a WN’s own gateway. Ideally there should be some fallback mechanism where a transfer attempts to use its own gateway first and fails over to a neighbouring WN’s gateway if it is unavailable (a rough fallback sketch also follows after this list)

    • cmsd is not smart enough to handle a mix of read-only and read/write servers in the same cluster (this was attempted by Sam at Glasgow during early 5.x)

    • Strong preference for having the same endpoint for reads and writes (removing XCache). This makes the configuration simpler and allows it to be managed by a cmsd redirector without issues

    • A: evaluate whether XCache can be removed with XrdCeph buffers enabled (measure IOPS on a single WN)

    • A: design a better solution for the gateways on the WNs

    • A: create redirector managers for ALICE and S3

    • A: develop the cmsd redirector capability to redirect preferentially onto a node’s own gateway, and to include the XCaches in the redirector in a mixed-gateway setup
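
    A minimal sketch of what such a WN gateway healthcheck could look like, assuming a Linux WN with the gateway listening on the default xrootd port 1094; the systemd unit name is a placeholder, and the key=value output line is only a stand-in for whatever mechanism (e.g. an HTCondor startd cron hook) the farm actually uses to publish gateway state:

```python
#!/usr/bin/env python3
"""Illustrative sketch of a WN gateway healthcheck (names and paths are assumptions).

Probes the local xrootd gateway port; if the probe fails, restarts a
placeholder systemd unit and reports the resulting state in a form that a
batch-system hook could consume. A TCP connect is only a liveness probe,
not the full functional test (which would do an actual read).
"""
import socket
import subprocess
import sys
import time

GATEWAY_HOST = "localhost"
GATEWAY_PORT = 1094          # default xrootd port
UNIT = "xrootd@gateway"      # placeholder unit name; site-specific in practice


def gateway_alive(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if the gateway accepts a TCP connection within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def main() -> int:
    if gateway_alive(GATEWAY_HOST, GATEWAY_PORT):
        print("WN_GATEWAY_HEALTHY = True")   # attribute-style line a startd cron hook could publish
        return 0
    # Gateway looks dead: restart the (placeholder) unit, then re-probe once.
    subprocess.run(["systemctl", "restart", UNIT], check=False)
    time.sleep(10)
    healthy = gateway_alive(GATEWAY_HOST, GATEWAY_PORT)
    print(f"WN_GATEWAY_HEALTHY = {healthy}")
    return 0 if healthy else 1


if __name__ == "__main__":
    sys.exit(main())
```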
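
    And a minimal sketch of the “prefer the WN’s own gateway, fall back to a neighbour or the central alias” idea; all endpoint names are placeholders (not the real Echo aliases), and a real implementation would sit in the data-access layer rather than a standalone script:

```python
#!/usr/bin/env python3
"""Sketch: try the worker node's own gateway first, then fall back to other
endpoints. All hostnames below are placeholders, not real Echo aliases."""
import socket
import subprocess

# Preference order: own gateway, a neighbouring WN's gateway, central redirector.
CANDIDATES = [
    "localhost:1094",
    "wn-neighbour.example:1094",
    "xrootd.echo.example:1094",
]


def endpoint_up(endpoint: str, timeout: float = 3.0) -> bool:
    """Cheap TCP liveness probe of host:port (not a full functional test)."""
    host, port = endpoint.rsplit(":", 1)
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        return False


def fetch(path: str, dest: str) -> None:
    """Copy `path` via the first candidate gateway that is up and succeeds."""
    for endpoint in CANDIDATES:
        if not endpoint_up(endpoint):
            continue
        url = f"root://{endpoint}/{path}"
        if subprocess.run(["xrdcp", "-f", url, dest]).returncode == 0:
            return
    raise RuntimeError(f"all candidate gateways failed for {path}")


if __name__ == "__main__":
    fetch("/example/path/file.root", "/tmp/file.root")  # hypothetical file
```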

    XRootD Releases

    5.6.3-1 is out

    Glasgow and Lancs have been using it (EL7 and Rocky 8) (no cmfst post CentOS 7)

    Checksums fixes

    Planned for deployment

    Checksum server service for the external gateways

    Prefetch studies and WN changes

    Alex

    Planned for the week of the 20th: resume partial deployment over the farm

    Deletion studies through RDR

    Ian

    CMSD rollout

    Jira: XRD-41
    New diagram required?

    svc01, 02, 17 and 18 stay as internal WN gateways for now.
    The other svc hosts (03, 05, 11, 13-16) are to be added to the CMSD production cluster.

    svc19 is designated for the ALICE gateway

    Gateways on new network plan

    IPv6 sorted; firewall rules change in progress

    LHCONE issue sorted

    Fermilab can’t be reached through IPv6, but tracepath gets to the LHCOPN CERN router

    To consider: the TPC instance port


    Gateways: observations

    WN gateways showed a spike in memory usage: the two gateways with swap enabled filled a few hundred GB of swap, and the other two crashed at the poller


    CMSD outstanding items

    Icinga / Nagios callout test changes - live and available

    • The ping test for the floating IPs and gateway hosts needs some more refinement

    Improved load balancing / server failover triggering

    Better 'rolling server restart' script

    Documentation: setup / configuration / operations / troubleshooting / testing

    Review of sandbox and deployment to prod:
    - Awaiting time from Tom for the load-balancer test

    Sandbox has been reviewed; awaiting Thomas Byrne for final confirmation

    cmsd sandbox has been deployed

    Gateways: observations

    The cluster was up and active, but one storage node was very slow in throughput, slowing the whole cluster down enough for gateways to fail functional tests

    CMSD outstanding items

    sandbox deployed

    Tokens testing

    To liaise with the TTT (Token Trust Traceability Taskforce, aka Matt Doidge): no update

    Report by the end of this month

    CMS GGUS ticket for enabling token auth:
    planned deployment in the week of the 20th
    CC for next week

    AAA Gateways

    Sandbox ready for review:

    http://aquilon.gridpp.rl.ac.uk/sandboxes/diff.php?sandbox=jw-xrootd-aaa-5.5.4-3
    Needs a bigger discussion regarding tokens, and deployment on production hosts.

    To be reviewed and deployed this week

    SKA Gateway box

    /wiki/spaces/UK/pages/215941180

    Now working, using the SKA pool on dev Ceph

    Initial iperf3 tests (see table and plots below).

  • Actions

    • Ensure Xrootd01 is tuned correctly, according to the NVIDIA / Mellanox instructions

    • Repeat the iperf3 tests (see the sketch after this list)

  • Xrootd tests against:

    • dev-echo

    • cephfs (Deneb dev)

    • cephfs (openstack; permissions/routing issues)?

    • local disk / mem

  • Frontend routing is also being worked on

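    To make the repeated iperf3 runs easy to compare and tabulate, a small sketch that drives iperf3 in JSON mode and extracts the received throughput; the target hostname is a placeholder, and an iperf3 server (iperf3 -s) is assumed to be running on it:

```python
#!/usr/bin/env python3
"""Sketch for repeatable iperf3 runs against the SKA gateway box.

The target hostname is a placeholder; an iperf3 server must be listening there.
"""
import json
import subprocess

TARGETS = ["xrootd01.example.ac.uk"]  # placeholder hostname


def run_iperf3(host: str, streams: int = 4, seconds: int = 30) -> float:
    """Run an iperf3 client test and return received throughput in Gbit/s."""
    out = subprocess.run(
        ["iperf3", "-c", host, "-P", str(streams), "-t", str(seconds), "-J"],
        capture_output=True, text=True, check=True,
    )
    report = json.loads(out.stdout)
    return report["end"]["sum_received"]["bits_per_second"] / 1e9


if __name__ == "__main__":
    for target in TARGETS:
        print(f"{target}: {run_iperf3(target):.2f} Gbit/s received")
```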

    ongoing network cleanup to access Deneb

    containerised gateways (kubernetes cluster)

    Identified an issue on worker-node gateways where Ceph Nautilus 14.2.15 libraries were being loaded (from a previous libradosstriper lockless-read implementation), overriding the Ceph version installed in the container

    Working on the ingress setup; hit a 'cannot allocate port' error when setting up the service (port forwarding). Searching online suggests an issue with the cluster; will try rebuilding it from scratch to see if that fixes the issue. It is working, but a few bugs still need ironing out before scaling up

    on GGUS:

    Site reports

    Lancaster - not much more to report; updating to the latest XRootD broke SciTokens, as the scitokens package also needed updating (done manually)

    Glasgow - 5.6.2-2, all OK.

    Glasgow - Gateways being set up; a Ceph disk node is using a lot of swap (one OSD using a large amount of virtual memory); 5.6.2-2 testing TBD. Later versions of Nautilus are more aggressive with cache, so the recommendation is to turn swap off. 5.6.3 on Rocky 8; still needs the redirector on the internal gateways, and is also updating the XRootD and XrdCeph versions. The internal gateways were using up all their memory (64 GB RAM plus swap); the plan is to add more RAM.

    Swap is at 0 at the moment; need to switch to swap off on reboot. Newer versions of Ceph are more determined to use resources.

    ✅ Action items

    ⤴ Decisions

    ...