2023-11-30 Meeting Notes
Date
Nov 30, 2023
Participants
@Thomas, Jyothish (STFC,RAL,SC)
@Thomas Byrne
@Alexander Rogovskiy
@James Walder
@Katy Ellis
@Thomas Birkett
Lancs: @Matt Doidge, Gerard, Steven
Glasgow: Sam
Apologies:
Goals
List of Epics
New tickets
Consider new functionality / items
Detailed discussion of important topics
Site report activity
Discussion topics
Current status of Echo Gateways / WNs testing
Recent sandboxes for review / deployments:
| Item | Presenter | Notes |
|---|---|---|
| Near and mid-term planning | | Time to build up the aims for the next 3, 6, 12 months |
| Bugfix for calculating striper objects in direct reads | | |
| Gateway: observations and changes | @Thomas, Jyothish (STFC,RAL,SC) | Change the CMSD configuration to increase the frequency of load reporting / calculation |
| Improving the load balancing | @Thomas Byrne @Thomas, Jyothish (STFC,RAL,SC) | Config changes (on all gateway servers and managers): decrease the ping and usage reporting intervals: cms.ping 10 log 1 usage 2. Also fix the xrdload script to report the 5-minute load average (cf. 15-minute). |
| Shoveller: Moving from testing and dev. to production and operational support | @Katy Ellis | VM (may already exist) for the Collector. Requires a (monitor) config update on the XRootD servers to be monitored (to point at the Collector). Could also be used by RALPP? @Katy Ellis to confirm that AAA with Shoveller can also send monitoring to Tom’s ftstream monitoring… Documentation: |
| XRootD gateway architecture review (What should the XRootD access to Echo look like in a year’s time?) | | https://stfc.atlassian.net/wiki/spaces/GRIDPP/pages/255262851 Ideas on xrootd batch farm architecture. Current State. Key questions: What to aim for: Containerizing everything (shared containers across all hardware) is the preferred end state. Some system resource overhead should be reserved to keep the gateways running smoothly. WN gateways: |
| XRootD Release and deployment schedule | | 5.6.3-1 is out |
| Checksums fixes | | On hold again, pending the load balancer work; i.e. if we can improve the load balancing, does the latency associated with checksums also improve? |
| Prefetch studies and WN changes | @Alexander Rogovskiy | New tests with |
| Deletion studies through RDR | @Ian Johnson | Requirements from ATLAS VO (Alessandra): 11266 deletions/h of 3 GiB files. We are seeing mean deletion times for 3 GiB files of 0.5 - 4 seconds, although there are some large outliers (taken from 10 batch deletions of 3 GiB files, 500 files in each batch). Current times to delete a batch of 500 3 GiB files average around 30 s (again with some large outliers). Extrapolating from the average suggests that a bulk deletion rate of 60,000 files per hour is achievable within RAL, using the CERN deletion timing program. It would be instructive to test whether the deletion mechanism that ATLAS will use during DC24 (FTS?) is able to achieve acceptable deletion rates. An example of the variation in range of deletion times (plots from 07:30 this morning and 11:40): |
| Tokens testing | @Thomas, Jyothish (STFC,RAL,SC) | |
| SKA Gateway box | @James Walder | https://stfc.atlassian.net/wiki/spaces/UK/pages/215941180 Deneb-dev routing still needed (on the switch / router side). Some tests with Ceph-dev and changing of the rados-striper. The difference between upload and download may be due to uploads coming from local disk and downloads going to /dev/null (to repeat with tmpfs). |
| WN Xcache issue | | futex lock hard-locking the XCache proxy on WNs (possibly an occurrence of "Deadlock in XCache's XrdCl instance", Issue #1979, xrootd/xrootd) |
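Note on the xrdload item under "Improving the load balancing": a minimal Python sketch of picking the 5-minute rather than the 15-minute load average from a /proc/loadavg-style line. This is not the actual xrdload script; the function name `load_average` and the sample values are illustrative assumptions only.

```python
# Hypothetical helper, NOT the real xrdload script.
# A /proc/loadavg line begins with the 1-, 5- and 15-minute load averages.
WINDOW_INDEX = {1: 0, 5: 1, 15: 2}


def load_average(loadavg_line: str, window_minutes: int = 5) -> float:
    """Return the load average for the requested window (1, 5 or 15 minutes)."""
    fields = loadavg_line.split()
    return float(fields[WINDOW_INDEX[window_minutes]])


sample = "3.10 2.40 1.75 2/512 12345"  # made-up values for illustration
five_min = load_average(sample, 5)
fifteen_min = load_average(sample, 15)
print(five_min, fifteen_min)
```

The shorter window reacts faster to a burst of transfers, which is the motivation for pairing it with the shorter reporting intervals.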
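Note on the deletion studies item: the extrapolation to 60,000 files per hour can be checked with the figures quoted in the notes. A minimal Python sketch, assuming batches run back to back at the average time and ignoring the large outliers; the variable and function names are illustrative.

```python
# Figures quoted in the notes; names are illustrative.
REQUIRED_PER_HOUR = 11266   # ATLAS requirement, deletions per hour
BATCH_SIZE = 500            # files per test batch
BATCH_SECONDS = 30.0        # approximate average time to delete one batch


def files_per_hour(batch_size: int, batch_seconds: float) -> float:
    """Extrapolate an hourly rate, assuming batches run back to back."""
    return batch_size * 3600 / batch_seconds


achieved = files_per_hour(BATCH_SIZE, BATCH_SECONDS)
headroom = achieved / REQUIRED_PER_HOUR
print(f"{achieved:.0f} files/h, {headroom:.1f}x the requirement")
```

On these numbers the extrapolated rate is roughly five times the ATLAS requirement, but this only holds for the CERN deletion timing program within RAL, not necessarily for the mechanism ATLAS will use during DC24.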
on GGUS:
Site reports
Lancaster - Generally plagued by xrootd being unreliable under stress, throwing more gateways at it.
Glasgow -
Action items