2024-05-23 Meeting Notes

 Date

May 23, 2024

 Participants

  • @Thomas, Jyothish (STFC,RAL,SC)

  • @James Walder

  • @Alexander Rogovskiy

  • @Alastair Dewhurst

  • @Thomas Byrne

  • @Ian Johnson

  • Lancs: Matt, Gerard, Stephen

  • Glasgow: Sam

Apologies:

 

CC:

 

 

 Goals

  • List of Epics

  • New tickets

  • Consider new functionality / items

  • Detailed discussion of important topics

  • Site report activity

 

 Discussion topics

Current status of Echo Gateways / WNs testing

Recent sandboxes for review / deployments:

 

Item

Presenter

Notes

 

 

Operational Issues
Gateways and WNs:
- Current status and upcoming changes

@Thomas, Jyothish (STFC,RAL,SC)

Rocky 8:
- 1/3 ALICE gws
- 1/3 cms-aaa gws
- all standard gws
- gridftp gws next month

svc28-31 are having network issues (IPv4 blocked)
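
A quick way to confirm which address family is affected is to attempt a TCP connection over IPv4 and IPv6 separately. The following is a minimal sketch only: the hostname in the usage note is a placeholder (not one of the real gateways), and port 1094 is assumed because it is the default XRootD port.

```python
import socket


def can_connect(host: str, port: int, family: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds over the given address family."""
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        # The host has no address of this family (or does not resolve at all).
        return False
    for fam, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(fam, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:
            # Refused, timed out, or unreachable: try the next address, if any.
            continue
    return False


def probe(host: str, port: int = 1094) -> dict:
    """Report reachability per address family; 1094 is the default XRootD port."""
    return {
        "ipv4": can_connect(host, port, socket.AF_INET),
        "ipv6": can_connect(host, port, socket.AF_INET6),
    }
```

Usage: `probe("gateway.example.org")` (hypothetical hostname) returns a dict such as `{"ipv4": ..., "ipv6": ...}`; a host that is reachable over IPv6 but not IPv4 would be consistent with the IPv4 block noted above.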

 

CHEP Abstract ideas

@Thomas, Jyothish (STFC,RAL,SC)

(awaiting acceptance notifications)

 

XrootD Workshop plan

@Alastair Dewhurst

Registrations are open (pending prettification)

 

Rocky 8 and 9 migration planning

 

 

 

Shoveller

@Katy Ellis

Configured using environment variables on an EL9 machine (the config file couldn’t be read).

Did not receive packets from the gws during testing.

 

Future development ideas and planning work

@Ian Johnson @Thomas, Jyothish (STFC,RAL,SC)

 

Deletion studies through RDR

@Ian Johnson

 

 

 

Deletions

https://stfc.atlassian.net/browse/XRD-83

For RADOS-level examination of async deletions, I’ve modified the “clean up” code from the bin/rados.cc “ObjBencher” class to read the list of objects to delete from an external file. However, my new version of the rados binary has Ceph auth problems, e.g.:

[root@ceph-dev-mon1 ijj]# ./rados -p dteam ls -
2024-05-23 12:28:08.787 7f1c5813c700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]

I’m investigating this.
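
For comparison while the modified binary is being debugged, the same idea (read object names from an external file, then delete each one) can be sketched with the Python librados bindings. This is not the modified rados.cc code: it uses synchronous `remove_object` calls rather than async deletions, and the one-name-per-line file format, the function names, and the default `conffile` path are assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def read_object_names(list_path: str) -> List[str]:
    """Read object names from a text file, one per line (assumed format).

    Blank lines and lines starting with '#' are skipped.
    """
    names = []
    for line in Path(list_path).read_text().splitlines():
        name = line.strip()
        if name and not name.startswith("#"):
            names.append(name)
    return names


def delete_objects(names: Iterable[str],
                   remove: Callable[[str], None]) -> Tuple[int, List[str]]:
    """Call remove(name) for each name; return (number deleted, names that failed)."""
    deleted, failed = 0, []
    for name in names:
        try:
            remove(name)
            deleted += 1
        except Exception:
            # e.g. rados.ObjectNotFound; record the name and carry on.
            failed.append(name)
    return deleted, failed


def delete_from_pool(list_path: str, pool: str = "dteam",
                     conffile: str = "/etc/ceph/ceph.conf") -> Tuple[int, List[str]]:
    """Delete the listed objects from a pool using synchronous librados calls."""
    import rados  # Python librados bindings; imported here so the helpers work without Ceph

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            return delete_objects(read_object_names(list_path), ioctx.remove_object)
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```

Keeping the file parsing and the delete loop separate from the cluster connection means both can be exercised with a stub `remove` function on a machine without Ceph credentials.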

 

Planning for ALICE CMSD redirection

@Thomas, Jyothish (STFC,RAL,SC)

svc19 and xrootd-dev-gw3 are configured as redirector managed ALICE gws.
240523 13:18:21 1735793 XrdAccept: Unable to accept TCP connection from localhost; permission denied
The ALICE manager xrootd service can’t talk to the cmsd service on the same host.

 

Checksums fixes

@Alexander Rogovskiy @Thomas, Jyothish (STFC,RAL,SC)

WNs are currently redirecting checksums to external gws

 

Prefetch studies and WN changes

@Alexander Rogovskiy

 

 

Tokens Status

@Thomas, Jyothish (STFC,RAL,SC) @Katy Ellis

 

 

CMSD Load balancing

@Thomas Byrne @Thomas, Jyothish (STFC,RAL,SC)

PR:
https://github.com/stfc/xrootd/pull/8/files


 

 

SKA Gateway box

@James Walder

2 new servers racked up; awaiting netbox configuration.
AQ configuration to be discussed with James A.

 

 

 

 

 

Xrootd testing framework

@Mariam Demir

 

 

 

on GGUS:

Site reports

Lancaster: The upgrade to Reef hasn’t had any major fallout. Problems caused by changes to how metrics are exported have mostly been fixed by Gerard and Steven. Slow Ops are still being seen post-upgrade; an investigation by Gerard into whether deletions could be a trigger hasn’t found any correlation yet.

Bonus site report: We have a Zoom call with Durham this afternoon to try to get to the bottom of their weird errors.

Glasgow:

@Thomas, Jyothish (STFC,RAL,SC) and Sam to compare compiler options and flags to spot why Glasgow compilation is failing.

 

 Action items

 

  •  

  •  

 

 Decisions