
...

Apologies:

cc. Alastair Dewhurst

...

Item

Presenter

Notes

Deployment plan and changes anticipated before Christmas

Thomas, Jyothish (STFC,RAL,SC)

Change freeze ~ today.

New gateways awaiting updated AAAA (IPv6) DNS records

the bug fix (below)
manager ping

Bug fix for calculating striper objects in direct reads

https://github.com/stfc/xrootd-ceph/pull/50

Any further comments on the PR, or is it ready to merge?
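The PR's code is not reproduced here. As background for reviewers, below is a minimal sketch of how a direct read's byte range maps to striper objects. It assumes the standard Ceph striping layout (stripe_unit, stripe_count, object_size) and the libradosstriper object-naming convention "<name>.<16 hex digits>"; the function names and example layout values are hypothetical and not taken from xrootd-ceph.

```python
def object_index(offset, stripe_unit, stripe_count, object_size):
    """Object number holding byte `offset`, using the Ceph-style striping layout."""
    stripes_per_object = object_size // stripe_unit
    block_no = offset // stripe_unit          # which stripe unit of the file
    stripe_no = block_no // stripe_count      # which stripe (row) across the object set
    stripe_pos = block_no % stripe_count      # which object within the object set
    object_set = stripe_no // stripes_per_object
    return object_set * stripe_count + stripe_pos


def objects_for_read(offset, length, stripe_unit, stripe_count, object_size):
    """Sorted object numbers touched by the byte range [offset, offset + length)."""
    if length <= 0:
        return []
    first_block = offset // stripe_unit
    last_block = (offset + length - 1) // stripe_unit  # inclusive: the last byte read
    return sorted({
        object_index(b * stripe_unit, stripe_unit, stripe_count, object_size)
        for b in range(first_block, last_block + 1)
    })


def object_name(base, index):
    """Assumed libradosstriper naming: base name plus a 16-digit hex object index."""
    return f"{base}.{index:016x}"


if __name__ == "__main__":
    MiB = 1 << 20
    # Hypothetical layout: one stripe, 64 MiB stripe unit and object size.
    objs = objects_for_read(64 * MiB - 1, 2, 64 * MiB, 1, 64 * MiB)
    print([object_name("myfile", i) for i in objs])
```

With stripe_count = 1 and stripe_unit = object_size this reduces to the objects offset // object_size through (offset + length - 1) // object_size. Computing the last object from the last byte read, rather than from offset + length, is one classic place where such calculations go off by one.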

Katy Ellis to confirm a successful file dump read.

File dump read failed for ‘unrelated’ reasons (auth failure)

Gateway: observations and changes

Thomas, Jyothish (STFC,RAL,SC)

Change the CMSD configuration to increase the frequency of load reporting / calculation


Tom to add slides

Checksums fixes

Alexander Rogovskiy

Deployment appears to improve the metadata retrieval time at the checksum app layer, but not at the user client layer …

Slides: slides.pdf



It is indeed possible to remove the stale-checksum check in a dedicated checksum library. This is now being tested on ceph-gw8; very preliminary results look good.


Prefetch studies and WN changes

Alexander Rogovskiy

New tests with xrootd.async segsize 8m timeout 300 and pfc.prefetch 0. With the increased timeout, the “prefetch off” configuration looks better than the current one.

Deletion studies through RDR

Ian Johnson

Requirements from the ATLAS VO (Alessandra): 11266 deletions/h of 3 GiB files. We are seeing mean deletion times for 3 GiB files of 0.5 to 4 seconds, although there are some large outliers. (Taken from 10 batch deletions of 500 3 GiB files each.)

Deleting a batch of 500 3 GiB files currently takes around 30 s on average (again with some large outliers). Extrapolating from the average suggests that a bulk deletion rate of 60,000 files per hour is achievable within RAL using the CERN deletion timing program. It would be instructive to test whether the deletion mechanism that ATLAS will use during DC24 (FTS?) can achieve acceptable deletion rates.
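The extrapolation above is simple arithmetic; written out with the figures from these notes (helper names hypothetical), it also shows the margin over the ATLAS requirement. It assumes batches run back-to-back at the 30 s average and ignores the outliers, so it is an optimistic estimate rather than a prediction.

```python
REQUIRED_PER_HOUR = 11266   # ATLAS requirement, deletions of 3 GiB files per hour
BATCH_FILES = 500           # files per test batch
BATCH_SECONDS = 30          # observed average time per batch (outliers ignored)


def rate_per_hour(files, seconds):
    """Convert 'files deleted in a given number of seconds' to files per hour."""
    return files / seconds * 3600


if __name__ == "__main__":
    achieved = rate_per_hour(BATCH_FILES, BATCH_SECONDS)
    print(f"required: {REQUIRED_PER_HOUR / 3600:.2f} files/s")
    print(f"achieved: {achieved / 3600:.2f} files/s ({achieved:,.0f} files/h)")
    print(f"headroom: {achieved / REQUIRED_PER_HOUR:.1f}x the requirement")
```

On these numbers the requirement is about 3.1 deletions/s against roughly 16.7/s measured, about 5.3 times the requirement.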

An example of how the range of deletion times varies (plots from 07:30 this morning and from 11:40):

Now testing deletion times with different concurrency levels (20 and 40). First results show that using 40 gives a lower overall deletion time, and hence a higher deletion rate: deleting 500 3 GiB files takes 14 s with 40 threads, against 18 s with 20 threads. However, the per-file mean and median times are slightly larger with 40 threads (1.1 s for both median and mean, against 0.6 s and 0.7 s for 20 threads). The 20- and 40-thread measurements were taken one after the other (within five minutes), so I don’t think ECHO would have been performing much differently over that time. I will run multiple tests at varying levels of concurrency to get more reliable statistics, depending on what I find from the following:
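As a rough consistency check on these figures (our own reading, not part of the measurements), Little's law says that with N deletions kept in flight and mean per-file latency R, throughput is about N / R, so a batch of 500 files should take about 500 × R / N seconds. Taking the 20-thread mean as 0.7 s (reading the figures as median 0.6 s, mean 0.7 s), this predicts 17.5 s for 20 threads and 13.75 s for 40 threads, close to the measured 18 s and 14 s. The function names are hypothetical.

```python
def predicted_batch_seconds(n_files, concurrency, mean_latency_s):
    """Little's law estimate: with `concurrency` deletions always in flight,
    throughput is concurrency / mean_latency, so the batch takes
    n_files * mean_latency / concurrency seconds."""
    return n_files * mean_latency_s / concurrency


def per_hour(n_files, seconds):
    """Deletion rate in files per hour for n_files deleted in `seconds`."""
    return n_files / seconds * 3600


if __name__ == "__main__":
    # (threads, mean per-file latency in s, measured batch time in s) from the notes
    for threads, mean_s, measured_s in [(20, 0.7, 18), (40, 1.1, 14)]:
        pred = predicted_batch_seconds(500, threads, mean_s)
        print(f"{threads} threads: predicted {pred:.2f} s, measured {measured_s} s, "
              f"~{per_hour(500, measured_s):,.0f} files/h")
```

If the agreement holds in the repeated tests, the higher per-file latency at 40 threads is simply the cost of more concurrency rather than a sign that the bulk rate got worse.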

As the DC24 deletions involve Rucio invoking FTS, I’ve started running bulk deletions with RAL FTS to observe how many deletion jobs run at any one time. I’m discussing with Alessandra and Mihai what level of concurrency we might expect from CERN FTS during DC24.
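To turn FTS job records into "how many deletion jobs ran at once", a sweep over start/end times is enough. A minimal sketch, assuming each deletion's start and end timestamps can be extracted from the FTS records (that extraction, and the input format, are hypothetical and not shown):

```python
from typing import Iterable, Tuple


def peak_concurrency(intervals: Iterable[Tuple[float, float]]) -> int:
    """Maximum number of jobs running at the same instant, from (start, end) times.

    Sweep line: +1 at each start, -1 at each end. At equal timestamps the -1
    sorts first, so a job finishing exactly when another starts is not counted
    as overlapping with it.
    """
    events = []
    for start, end in intervals:
        if end < start:
            raise ValueError(f"job ends before it starts: {(start, end)}")
        events.append((start, 1))
        events.append((end, -1))
    events.sort()  # tuples sort by time, then delta (-1 before +1)
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


if __name__ == "__main__":
    # Hypothetical (start, end) epoch seconds for four deletion jobs.
    jobs = [(0.0, 2.5), (0.4, 1.1), (1.0, 3.0), (2.5, 4.0)]
    print("peak concurrent deletions:", peak_concurrency(jobs))
```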

Tokens testing

Thomas, Jyothish (STFC,RAL,SC) Katy Ellis

Jira: XRD-63

We can currently have either the SAM tests passing OR the correct set of permissions, not both.

scitokens.trace probably doesn’t alter the results now.
https://github.com/xrootd/xrootd/pull/2151/files

SKA Gateway box

James Walder

/wiki/spaces/UK/pages/215941180

Deneb-dev is now connected to the xrootd01 box.
A few ad-hoc tests tried; will start running systematic tests.

WN Xcache issue

A futex lock was hard-locking the XCache proxy on WNs (possibly an occurrence of https://github.com/xrootd/xrootd/issues/1979 )

Fixed!

on GGUS:

Site reports

Lancaster - Planning the purchase of more gateways; trying to decide between single- and dual-CPU hosts and would appreciate views on that. We have leaned towards dual-CPU hosts for our new gateways. As discussed, a potential move towards “production” users doing r/w over CephFS.

Recently changed our CMSD settings:

Unchanged settings:

cms.sched maxload 30 io 30 mem 20 cpu 20 pag 0 runq 0 space 0 fuzz 5

Old settings:

cms.ping 60 log 10 usage 10

cms.perf int 60 pgm /usr/share/xrootd/utils/cms_monPerf 60

New settings (2023-12-13T13:30):

cms.ping 15 log 20 usage 4

cms.perf int 15 pgm /usr/share/xrootd/utils/cms_monPerf 15
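The effective timings of the old and new settings can be tabulated as below. This assumes, per our reading of the xrootd cms configuration reference, that the log and usage values in cms.ping are counts of ping intervals (so the real period is ping × count) and that cms.perf int is in seconds; the helper class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CmsTiming:
    ping_s: int       # cms.ping interval, seconds
    log_every: int    # cms.ping "log" value, assumed to count ping intervals
    usage_every: int  # cms.ping "usage" value, assumed to count ping intervals
    perf_s: int       # cms.perf "int" value, seconds

    def periods(self):
        """Effective period, in seconds, of each reporting activity."""
        return {
            "ping_s": self.ping_s,
            "log_s": self.ping_s * self.log_every,
            "usage_s": self.ping_s * self.usage_every,
            "perf_s": self.perf_s,
        }


OLD = CmsTiming(ping_s=60, log_every=10, usage_every=10, perf_s=60)
NEW = CmsTiming(ping_s=15, log_every=20, usage_every=4, perf_s=15)

if __name__ == "__main__":
    for name, cfg in (("old", OLD), ("new", NEW)):
        print(name, cfg.periods())
```

Under that reading, pings and load (perf) reports go from every 60 s to every 15 s, usage (free-space) updates from every 600 s to every 60 s, and logging from every 600 s to every 300 s.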


Glasgow -

✅ Action items

...