2024-07-18 Meeting Notes
Date
Jul 18, 2024
Participants
@James Walder
@Alexander Rogovskiy
@Ian Johnson
@Mariam Demir
@Alastair Dewhurst
Lancs: @Matt Doidge, Steven
Glasgow:
Apologies:
Gerard
@Thomas, Jyothish (STFC,RAL,SC)
CC:
Goals
List of Epics
New tickets
Consider new functionality / items
Detailed discussion of important topics
Site report activity
Discussion topics
Current status of Echo Gateways / WNs testing
Recent sandboxes for review / deployments:
Item | Presenter | Notes |
---|---|---|
Operational Issues | @Thomas, Jyothish (STFC,RAL,SC)
‘Memory allocation errors on WN proxy containers’ | External Gateways:
bash-5.1$ grep 'root://' errors.csv | grep -o 'GError(.[^"'"'"']*' | sort | uniq -c | sort -g -r
1028 GError('Error on XrdCl::CopyProcess::Run(): [ERROR] Server responded with an error: [3008] cannot allocate memory (source)\n
265 GError('Failed to stat file (No such device)
86 GError('Error on XrdCl::CopyProcess::Run(): [ERROR] Operation expired: (source)
78 GError('Error on XrdCl::CopyProcess::Run(): [FATAL] Redirect limit has been reached: (source)
30 GError('Error on XrdCl::CopyProcess::Run(): [ERROR] Server responded with an error: [3034] aio file read timed out (source)\n
23 GError('Error on XrdCl::CopyProcess::Run(): [ERROR] Server responded with an error: [3010] org.dcache.uuid is no longer valid. (source)\n
20 GError('Error on XrdCl::CopyProcess::Run(): [ERROR] Local error: no space left on device: (destination) |
|
CMSD Load balancing | @Thomas Byrne @Thomas, Jyothish (STFC,RAL,SC) |
| |
Gateway Auth failures | @Thomas, Jyothish (STFC,RAL,SC) |
|
|
XrootD Workshop plan | @Alastair Dewhurst | @Alastair Dewhurst to set up a regular LOC meeting to progress the agenda, registrations, etc.
|
|
Shoveler | @Katy Ellis |
Shoveler installation and monitoring. Why is this on a cloud VM? |
|
Future developments ideas planning work | @Ian Johnson @Thomas, Jyothish (STFC,RAL,SC) |
| |
Deletion studies through RDR | @Ian Johnson
|
|
|
Deletions | Testing a script to run multiple RADOS operations to time concurrent deletions from multiple gateways. |
| |
Planning for ALICE CMSD redirection | @Thomas, Jyothish (STFC,RAL,SC) |
|
|
Checksums fixes | @Alexander Rogovskiy @Thomas, Jyothish (STFC,RAL,SC) |
|
|
Prefetch studies and WN changes | @Alexander Rogovskiy | Prefetch 0 is now rolled out across the farm. |
|
Tokens Status | @Thomas, Jyothish (STFC,RAL,SC) @Katy Ellis |
|
|
SKA Gateway box | @James Walder | https://stfc.atlassian.net/wiki/spaces/UK/pages/215941180 AQ configuration for bonded VLANs seems OK. Netbox config for {02,03} seems OK. There are issues with PXE booting and OS installation due to routing between AQ and hosts. JC and JA to look into this. |
|
Xrootd testing framework | @Mariam Demir |
|
|
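The error tally in the Operational Issues row above could also be reproduced outside the shell, for example to feed the counts into monitoring. The following is a minimal Python sketch, not the script used in the meeting. It assumes errors.csv holds one transfer record per line with the GError text embedded as in the output above. The function name `tally_gerrors` and its `protocol` parameter are hypothetical.

```python
import re
from collections import Counter

# Same pattern as the grep -o expression in the notes: "GError(", one
# character (the opening quote), then everything up to the next single
# or double quote.
GERROR_RE = re.compile(r"""GError\(.[^"']*""")


def tally_gerrors(lines, protocol="root://"):
    """Count distinct GError messages on lines that mention `protocol`.

    `lines` is any iterable of strings (for example an open file).
    Returns a list of (message, count) pairs, most frequent first.
    """
    counts = Counter()
    for line in lines:
        # Equivalent of the first grep: keep only lines for this protocol.
        if protocol not in line:
            continue
        # Equivalent of grep -o: record every match found on the line.
        for message in GERROR_RE.findall(line):
            counts[message] += 1
    # Equivalent of `sort | uniq -c | sort -g -r`, apart from tie ordering.
    return counts.most_common()
```

A possible use would be `with open("errors.csv") as fh: print(tally_gerrors(fh))`. Passing a different `protocol` string (an assumption about how other transfer types appear in the file) would restrict the tally to those lines instead.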
on GGUS:
Site reports
Lancaster: Since dropping a bunch of data everything is a lot happier, although “scrubbing weirdness” is still an issue. Gerard’s on holiday so that’s the most precise we’ll get this week.
Glasgow:
Action items