
Recent sandboxes for review / deployments:

Config changes (on all gw servers and managers):

Decrease ping and usage reporting intervals: cms.ping 10 log 1 usage 2
Increase space placement recalculation rate: cms.space recalc 30 min 1g

Also fix the xrdload script to report the 5-minute load average (rather than the 15-minute value).
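
A minimal sketch of how these directives might appear in the cmsd configuration (values taken from the notes above; the parameter readings in the comments are an assumed interpretation, not verified against the deployed config):

    # Heartbeat every 10 s; log every ping; request usage data every 2 logged pings (assumed reading)
    cms.ping 10 log 1 usage 2
    # Recalculate free space every 30 s; require at least 1 GB free for file placement (assumed reading)
    cms.space recalc 30 min 1g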

Item / Presenter / Notes

Near and mid-term planning

Time to set out the aims for the next 3, 6, and 12 months.

  • Discussion on XRootD re-architecture and development of plans

  • XRootD 6.X (and building for XrdCeph)

  • XrdCeph developments?

  • Development tasks (question):

  • Per-file transfer speed

  • Deletions timings

  • Checksum speed

  • Load balancing and gateway failover

  • Containerisation and orchestration

    Deployment plan and changes anticipated before Christmas

    Thomas, Jyothish (STFC,RAL,SC)

    bugfix for calculating striper objects in direct reads

    https://github.com/stfc/xrootd-ceph/pull/50

    Any further comments on the PR, or is it ready to merge?

    Gateway: observations and changes

    Thomas, Jyothish (STFC,RAL,SC)

    Change the CMSD configuration to increase the frequency of load reporting / calculation
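
    A minimal sketch of the cmsd directives involved in load reporting and scheduling (the interval, script path, and weights here are illustrative assumptions, not the values deployed at RAL):

        # Run an external load-reporting program (e.g. the xrdload script mentioned above) every 30 s
        cms.perf int 30 pgm /usr/local/bin/xrdload
        # Example weighting of the reported load components used for server selection
        cms.sched cpu 50 io 50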

    Improving the load balancing

    Thomas Byrne; Thomas, Jyothish (STFC,RAL,SC)

    Attached file: CMSD_loadbalancing.pptx

    Shoveller: Moving from testing and dev. to production and operational support

    Katy Ellis

    A VM for the Collector may already exist.

    Requires a monitoring config update on the XRootD servers to be monitored (to point at the Collector).

    Could also be used by RALPP?

    Katy Ellis to confirm that AAA with the shoveller can also send monitoring to Tom’s ftstream monitoring…

    Documentation:
    - https://twiki.cern.ch/twiki/bin/view/LCG/MonitoringTaskForce#Shoveler
    - Katy Ellis to capture the documentation and configuration, to work with Thomas, Jyothish (STFC,RAL,SC)

    XRootD gateway architecture review (what should XRootD access to Echo look like in a year’s time?)

    /wiki/spaces/GRIDPP/pages/255262851

    Ideas on xrootd batch farm architecture

    Current State
    ECHO aliases

    Key questions:
    • Should we segregate S3 and WN traffic from FTS?
      Giving each service its own redirector endpoint is good for maintaining high availability and redundancy; additional hardware capacity can then be added to each service more easily when needed.
    • Shared servers: cmsd allows multiple clusters (redirector managers) to include the same server (a gateway in our case).
    • Multiple clusters, with or without shared servers? (No shared servers if possible.)
      Having multiple clusters is good (each service should have its own redirector managers for HA). Shared servers will not be needed in a containerised setup; ease of management is preferred over slight resource optimisation (with shared servers, underutilised servers could take traffic from other services under higher load, making more use of existing capacity). See the configuration sketch below.
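
    A sketch of the per-service redirector model discussed above (hostnames and port are hypothetical placeholders, not the actual RAL endpoints): a gateway dedicated to one service would subscribe to that service's redundant pair of redirector managers in its cmsd configuration.

        # Hypothetical S3-only gateway: subscribe to both redirector managers of the S3 cluster
        all.role server
        all.manager s3-rdr01.example.ac.uk 1213
        all.manager s3-rdr02.example.ac.uk 1213

    A shared server would, per the note above, be included in more than one such cluster; the preference stated here is to avoid that in a containerised setup.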

    What to aim for:

    • Every service instance needs to be as resilient as possible.
      Each DNS endpoint should have keepalived for redundancy and redirectors for high availability.

    • Manageability.
      Adding/removing gateways from a service should be simple, and the overall setup should not be too complex to understand or manage.

    • Aim for simpler config.
      Keep the config understandable.

    • Flexibility on gateways per use case to meet burst demand.
      We should be able to add gateways to service endpoints as quickly and smoothly as possible, to facilitate burst demands (e.g. ALICE) and to deploy additional capacity quickly.

    Containerising everything (shared containers across all hardware) is the preferred end state.
    This has the prerequisite that every service is behind an expandable high-availability setup (XRootD cmsd managers)
    [and an orchestrated setup to spin up more gateways when load increases].

    Some system resource overhead should be reserved to keep the gateways running smoothly.

    WN gateways:
    These should be kept going forward, as they give us an additional gateway’s worth of capacity for every worker node.
    They currently only redirect traffic for reads over root (job operations using the xrootd.echo endpoint).
    This is because of XCache, which is read only.
    XCache is good at what it does and reduces the number of IOPS hitting Ceph from reads. During the vector-read deployment the caches were removed, and the resulting IOPS slowed the Echo storage cluster down enough for it to fail.

    • XCache can be removed if XrdCeph buffers provide similar functionality (allowing R/W over the local gateway).
      XrdCeph buffers do not work on out-of-order reads or separate read requests (as is the case with the ALICE gateways).

    • Some sort of XRootD manager tree setup might work for WN gateway containers.
      This could be similar to CMS AAA, with a hierarchy for access, but the first point of contact should be highly available.

    • A single gateway failing on a worker node should not cause all of its jobs to fail. Currently there is no failover built in for WN reads, so if the gateway is down all jobs on that WN will fail.

    • A functional-test-style healthcheck for the WN gateway would ensure the gateway is killed and restarted, and would let Condor know if the gateway is still down. This would stop new jobs being sent to a WN with a broken gateway, but the jobs already running on it would continue.

    • The solution should strongly prefer a WN’s own gateway. Ideally there should be some fallback mechanism where a transfer attempts to use its own gateway first and fails over to a neighbouring WN’s gateway if it is unavailable.

    • cmsd is not smart enough to deal with read-only and read/write servers as part of the same cluster (this was attempted by Sam at Glasgow during early 5.x).

    • Strong preference for having the same endpoint for reads and writes (removing XCache). This makes the configuration simpler and allows it to be managed by a cmsd redirector without issues.

    • A: evaluate whether XCache can be removed with XrdCeph buffers enabled (measure IOPS on a single WN)

    • A: design a better solution for the gws on the WNs

    • A: create redirector managers for ALICE and S3

    • A: develop cmsd redirector capability to preferentially redirect onto a WN's own gateway, and have XCaches included in the redirector in a mixed gateway setup

    XRootD Release and deployment schedule

    5.6.3-1 is out

    Checksums fixes

    On hold again, pending load balancer work; i.e. if we can improve the load balancing, do we improve the latency associated with checksums?


    Checksums fixes

    Alexander Rogovskiy

    Deployment appears to improve the metadata retrieval time at the checksum app layer, but not from the user client layer…



    Prefetch studies and WN changes

    Alexander Rogovskiy

    New tests with xrootd.async segsize 8m timeout 300 and pfc.prefetch 0. With the timeout increase, the “prefetch off” configuration looks better than the current one.
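
    For clarity, the tested settings as they might appear in the gateway/XCache configuration (directives copied from the description above; the surrounding config, and which file they live in, are assumptions):

        # Async I/O with 8 MB segments and a 300 s timeout, as tested
        xrootd.async segsize 8m timeout 300
        # Disable prefetching in the proxy file cache (XCache)
        pfc.prefetch 0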

    Deletion studies through RDR

    Ian Johnson

    Requirements from the ATLAS VO (Alessandra): 11,266 deletions/hour of 3 GiB files. We are seeing mean deletion times for 3 GiB files of 0.5-4 seconds, though there are some large outliers. (Taken from 10 batch deletions of 3 GiB files, 500 files in each batch.)

    Current times to delete a batch of 500 3 GiB files average around 30 s (with some large outliers, however). Extrapolating from the average suggests a bulk deletion rate of 60,000 files per hour is achievable at RAL, using the CERN deletion-timing program. It would be instructive to test whether the deletion mechanism that ATLAS will use during DC24 (FTS?) can achieve acceptable deletion rates.
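
    (Worked extrapolation from the figures above, assuming the rate scales linearly: 500 files in roughly 30 s is about 16.7 deletions/s, i.e. around 60,000 deletions/hour, comfortably above the 11,266 deletions/hour ATLAS requirement.)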

    An example of the range of variation in deletion times (plots from 07:30 this morning and from 11:40):

    Tokens testing

    Thomas, Jyothish (STFC,RAL,SC) Katy Ellis

    Jira: XRD-63 (System JIRA)

    SKA Gateway box

    James Walder

    /wiki/spaces/UK/pages/215941180

    Deneb-dev routing still needed (on the Switch / router side).

    Some tests with Ceph-dev and changes to the rados striper.

    The difference between upload and download rates may be because uploads read from local disk while downloads go to /dev/null (to repeat with tmpfs).

    WN Xcache issue

    A futex lock is hard-locking the XCache proxy on WNs (possibly an occurrence of https://github.com/xrootd/xrootd/issues/1979 ).

    ...

    on GGUS:

    Site reports

    Lancaster - Generally plagued by XRootD being unreliable under stress; throwing more gateways at it. Planning purchase of more gateways - trying to decide between single or dual CPU, and would appreciate views on that.

    Glasgow -

    ✅ Action items

    ⤴ Decisions

    ...