2024-08-01 Meeting Notes

 Date

Aug 1, 2024

 Participants

 

  • @James Walder

  • @Thomas, Jyothish (STFC,RAL,SC)

  • @Alexander Rogovskiy

  • @Ian Johnson

  • @Mariam Demir

  • @Robert Appleyard

  • Lancs: Gerard, Steven

  • Glasgow: Sam

Apologies:

  •  

  •  

CC:

 

 

 Goals

  • List of Epics

  • New tickets

  • Consider new functionality / items

  • Detailed discussion of important topics

  • Site report activity

 

 Discussion topics

Current status of Echo Gateways / WNs testing

Recent sandboxes for review / deployments:

 

Item

Presenter

Notes

 

Item

Presenter

Notes

 

Operational Issues
Gateways and WNs:
- Current status and upcoming changes

@Thomas, Jyothish (STFC,RAL,SC)

 

‘Memory allocation errors on WN proxy containers’

svc18 causing issues for CMS SAM tests (davs connections refused).

The memory allocation errors were caused by two issues: non-pgrw vector read sizes were too large for direct reads, and the gateway containers were running out of memory.

The WN container restart order can be improved.

 

Compilation issues with XrdCeph on Rocky 8 with XRootD 5.6+

 

Problems compiling XrdCeph alongside core XRootD (a warning message terminates the compilation).

With a standalone XrdCeph build, crashes are seen.

@James Walder and @Thomas, Jyothish (STFC,RAL,SC) to review the code and debug.


CMSD Load balancing

@Thomas Byrne @Thomas, Jyothish (STFC,RAL,SC)

PR:
https://github.com/stfc/xrootd/pull/14
Code reviewed; item to be marked as done

 

Gateway Auth failures

@Thomas, Jyothish (STFC,RAL,SC)

Auto-restart on this failure mode is enabled. Occasional cases are still being observed.

 

XrootD Workshop plan

@Alastair Dewhurst

 

 

Shoveler

@Katy Ellis

 

CC. for WNs to report to Shoveler; not yet implemented.

 

Future development ideas and planning work

@Ian Johnson @Thomas, Jyothish (STFC,RAL,SC)

 

Deletion studies through RDR

@Ian Johnson

 

 

 

Deletions

https://stfc.atlassian.net/browse/XRD-83

Preliminary figures for deleting 3000 small files from 10 gateways simultaneously, using 10 workers and 100 workers:

[Figure: preliminary deletion timings for 10 and 100 workers (image-20240801-122652.png)]

Results seem encouraging, but only small files have been tested so far. The 100-worker results need to be checked, as some target files had already been deleted. Next step: record timings for deleting larger files in the GB range.

 

Planning for ALICE CMSD redirection

@Thomas, Jyothish (STFC,RAL,SC)

Restarted activities on this; looking at how a server ‘logs in’ to the CMSD manager.

 

Checksums fixes

@Alexander Rogovskiy @Thomas, Jyothish (STFC,RAL,SC)

 

 

Prefetch studies and WN changes

@Alexander Rogovskiy

Prefetch activities complete.

 

Tokens Status

@Thomas, Jyothish (STFC,RAL,SC) @Katy Ellis

 

 

SKA Gateway box

@James Walder

AQ configuration for bonded VLANs seems OK. Netbox config for {02,03} seems OK.

Issues with PXE booting and OS installation due to routing between AQ and the hosts. JC and JA to look into this.

 

Xrootd testing framework

@Mariam Demir

 

 

 
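Illustration for the ‘Memory allocation errors on WN proxy containers’ item: part of the problem was attributed to non-pgrw vector read sizes being too large for direct reads. The Python sketch below shows one generic way to bound the size of a vector read, by regrouping its (offset, length) segments into batches under a byte limit. The function name `split_vector_read` and the `max_batch_bytes` limit are hypothetical; this is not the XrdCeph or XRootD code, and these notes do not record which fix was applied.

```python
from typing import List, Tuple

# One (offset, length) segment of a client vector read (readv).
Segment = Tuple[int, int]


def split_vector_read(segments: List[Segment], max_batch_bytes: int) -> List[List[Segment]]:
    """Group readv segments into batches whose total length is <= max_batch_bytes.

    A single segment larger than the limit is itself cut into contiguous pieces.
    max_batch_bytes is a hypothetical tuning knob, not a real XRootD setting.
    """
    if max_batch_bytes <= 0:
        raise ValueError("max_batch_bytes must be positive")
    batches: List[List[Segment]] = []
    current: List[Segment] = []
    current_bytes = 0
    for offset, length in segments:
        while length > 0:
            room = max_batch_bytes - current_bytes
            if room == 0:
                # Current batch is full: close it and start a new one.
                batches.append(current)
                current, current_bytes = [], 0
                room = max_batch_bytes
            piece = min(length, room)
            current.append((offset, piece))
            current_bytes += piece
            offset += piece
            length -= piece
    if current:
        batches.append(current)
    return batches
```

For example, `split_vector_read([(0, 10), (100, 5)], 8)` returns `[[(0, 8)], [(8, 2), (100, 5)]]`. Bounding each batch caps the buffer needed per request, at the cost of more round trips for large vector reads.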

 
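For the deletion study (XRD-83), a minimal timing harness could look like the Python sketch below. It assumes a caller-supplied `delete_fn` that removes one target and returns `False` if the deletion failed (for example because the file was already gone). `round_robin_targets`, `time_parallel_deletes` and the gateway URL layout are hypothetical; they are not the scripts used to produce the preliminary figures.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def round_robin_targets(files: List[str], gateways: List[str]) -> List[str]:
    """Spread file names evenly over gateway endpoints (hypothetical URL layout)."""
    return [f"{gateways[i % len(gateways)]}/{name}" for i, name in enumerate(files)]


def time_parallel_deletes(
    targets: List[str], delete_fn: Callable[[str], bool], workers: int
) -> Tuple[float, int, int]:
    """Delete all targets using a pool of `workers` threads.

    delete_fn is a hypothetical caller-supplied function that removes one target
    and returns False on failure (e.g. the file was already deleted).
    Returns (elapsed_seconds, n_deleted, n_failed).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(delete_fn, targets))
    elapsed = time.perf_counter() - start
    deleted = sum(1 for ok in results if ok)
    return elapsed, deleted, len(results) - deleted
```

Counting failed deletions separately from successful ones makes a run against already-deleted targets visible in the numbers, which is the caveat noted for the 100-worker results; the target list would need to be recreated before each timed run.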

On GGUS:

Site reports

Lancaster: Working on/with Reef (@Thomas, Jyothish (STFC,RAL,SC) linked to https://github.com/ceph/ceph-build/pull/2272).

Glasgow: Man with Reef on el9

 Action items

 

  •  

  •  

 

 Decisions