2024-07-18 Meeting Notes

 Date

Jul 18, 2024

 Participants

 

  • @James Walder

  • @Alexander Rogovskiy

  • @Ian Johnson

  • @Mariam Demir

  • @Alastair Dewhurst

  • Lancs: @Matt Doidge Steven

  • Glasgow:

Apologies:

  • Gerard

  • @Thomas, Jyothish (STFC,RAL,SC)

CC:

 

 

 Goals

  • List of Epics

  • New tickets

  • Consider new functionality / items

  • Detailed discussion of important topics

  • Site report activity

 

 Discussion topics

Current status of Echo Gateways / WNs testing

Recent sandboxes for review / deployments:

 

Item

Presenter

Notes

 

 

Operational Issues
Gateways and WNs:
- Current status and upcoming changes

@Thomas, Jyothish (STFC,RAL,SC)

 

‘Memory allocation errors on WN proxy containers’

External Gateways

image-20240718-115940.png
bash-5.1$ grep 'root://' errors.csv | grep -o 'GError(.[^"'"'"']*' | sort | uniq -c | sort -g -r
   1028 GError('Error on XrdCl::CopyProcess::Run(): [ERROR] Server responded with an error: [3008] cannot allocate memory (source)\n
    265 GError('Failed to stat file (No such device)
     86 GError('Error on XrdCl::CopyProcess::Run(): [ERROR] Operation expired: (source)
     78 GError('Error on XrdCl::CopyProcess::Run(): [FATAL] Redirect limit has been reached: (source)
     30 GError('Error on XrdCl::CopyProcess::Run(): [ERROR] Server responded with an error: [3034] aio file read timed out (source)\n
     23 GError('Error on XrdCl::CopyProcess::Run(): [ERROR] Server responded with an error: [3010] org.dcache.uuid is no longer valid. (source)\n
     20 GError('Error on XrdCl::CopyProcess::Run(): [ERROR] Local error: no space left on device: (destination)
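For repeat runs, the same tally could be done in Python. This is a minimal sketch that mirrors the grep pipeline above: keep lines mentioning root://, pull out each GError( message up to the next quote character, and count occurrences. The function name tally_gerrors is hypothetical, and nothing is assumed about the layout of errors.csv beyond what the grep patterns already imply.

```python
import re
from collections import Counter

# Same pattern as the grep -o call: "GError(", one character (the opening
# quote), then everything up to the next single or double quote.
GERROR_RE = re.compile(r"""GError\(.[^"']*""")


def tally_gerrors(lines):
    """Count GError messages on lines that mention root://.

    Returns a list of (message, count) pairs, most frequent first.
    """
    counts = Counter()
    for line in lines:
        if "root://" not in line:
            continue  # equivalent to the first grep
        counts.update(GERROR_RE.findall(line))
    return counts.most_common()
```

Usage would be along the lines of passing an open file handle for errors.csv to tally_gerrors and printing each pair.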

 

CMSD Load balancing

@Thomas Byrne @Thomas, Jyothish (STFC,RAL,SC)

PR:
https://github.com/stfc/xrootd/pull/14

 

Gateway Auth failures

@Thomas, Jyothish (STFC,RAL,SC)

 

 

XrootD Workshop plan

@Alastair Dewhurst

@Alastair Dewhurst to set up a regular LOC meeting to progress the agenda, registrations, etc.


 

 

Shoveler

@Katy Ellis

 

Why is this on a cloud VM?
Access into CERN is required.
To confirm that the test WN is able to send data to the Shoveler instance.

 

Future development ideas / planning work

@Ian Johnson @Thomas, Jyothish (STFC,RAL,SC)

 

Deletion studies through RDR

@Ian Johnson

 

 

 

Deletions

https://stfc.atlassian.net/browse/XRD-83

Testing a script to run multiple RADOS operations to time concurrent deletions from multiple gateways.
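A rough sketch of how such a timing harness might look is given here; it is not the script under test. It runs a caller-supplied delete function over a list of object names on a thread pool and records the wall-clock time plus per-operation times. The names time_concurrent_deletes and delete_fn are hypothetical. A real run might pass a wrapper around something like the python-rados Ioctx.remove_object call, which is an assumption about tooling and not stated in these notes. Threads on one host only approximate concurrent deletions from multiple gateways.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_concurrent_deletes(delete_fn, object_names, workers=8):
    """Call delete_fn(name) for every name using a pool of worker threads.

    delete_fn is a hypothetical callable supplied by the caller.
    Returns (wall_seconds, per_op_seconds), where per_op_seconds is in
    the same order as object_names.
    """
    def timed(name):
        start = time.perf_counter()
        delete_fn(name)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_op = list(pool.map(timed, object_names))
    wall = time.perf_counter() - wall_start
    return wall, per_op
```

Comparing the wall-clock time against the sum of the per-operation times for different worker counts gives a first indication of how well deletions overlap.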

 

Planning for ALICE CMSD redirection

@Thomas, Jyothish (STFC,RAL,SC)

 

 

Checksums fixes

@Alexander Rogovskiy @Thomas, Jyothish (STFC,RAL,SC)

 

 

Prefetch studies and WN changes

@Alexander Rogovskiy

Prefetch 0 now rolled out over the farm.

 

Tokens Status

@Thomas, Jyothish (STFC,RAL,SC) @Katy Ellis

 

 

SKA Gateway box

@James Walder

AQ configuration for bonded VLANs seems OK. Netbox config for {02,03} seems OK.

Issues with PXE booting and OS installation due to routing between AQ and hosts. JC and JA to look into this.

 

Xrootd testing framework

@Mariam Demir

 

 

 

On GGUS:

Site reports

Lancaster: Since dropping a bunch of data everything is a lot happier, although “scrubbing weirdness” is still an issue. Gerard’s on holiday so that’s the most precise we’ll get this week.

Glasgow:

 Action items

 

  •  

  •  

 

 Decisions