2024-11-14 Meeting Notes

 Date

Nov 14, 2024

 Participants

 

  • @Thomas, Jyothish (STFC,RAL,SC)

  • @Alastair Dewhurst

  • @Thomas Byrne

  • @Katy Ellis

  • @Alexander Rogovskiy

  • Lancs: Gerard, Matt, Steven

  • Glasgow: Sam

Apologies:

  • James

  • @Ian Johnson

CC:

 

 

 Goals

  • List of Epics

  • New tickets

  • Consider new functionality / items

  • Detailed discussion of important topics

  • Site report activity

 

 Discussion topics

Current status of Echo Gateways / WNs testing

Recent sandboxes for review / deployments:

 

Item

Presenter

Notes

 

 

Operational Issues
Gateways and WNs:
- Current status and upcoming changes

(Gateway Auth failures)

@Thomas, Jyothish (STFC,RAL,SC)

 

Accidentally broke preprod workernodes: Docker had cached old layers for the new image.

 

 

cms-aaa naming convention

 

cms-aaa is the only remaining personality to use proxy/ceph as the xrootd service names


A separate naming convention (main/supporting) would be more appropriate

(not so urgent).

CC created, but due to be reviewed in December

 

 

XRootD Managers De-VMWareification

@Thomas, Jyothish (STFC,RAL,SC)

Option 2 preferred for efficiency, but Option 1 decided on

Option 1 would be simpler to implement for a temporary fix, as the move would be reversed

Antares TPC nodes to be moved to an Echo leaf switch; IPv4 real estate to be confirmed with James

 

Compilation and rollout status with XrdCeph and Rocky 8: 5.7.x

@Thomas, Jyothish (STFC,RAL,SC)

Upstream merging in progress. Branch now exists.

Documentation (particularly for the Buffered IO) is needed.

 

Shoveler

@Katy Ellis

Shoveler installation and monitoring

 

 

Deletion studies through RDR

@Ian Johnson

 

 

 

Deletions

https://stfc.atlassian.net/browse/XRD-83

 

 

Periodic hackathon?

@Thomas, Jyothish (STFC,RAL,SC)

XRootD Writable Workernode Gateway Hackathon (XWWGH)

Tues 12th Nov 1600
Hackathon writable workernode

 

Xrootd testing framework

 

XRootD Site Testing Framework

 

 

100 GbE Gateway testing:
SKA / Tier-1

@James Walder

https://stfc.atlassian.net/wiki/spaces/UK/pages/215941180

Following up on server installs:
awaiting hostname; servers are cabled but not installed

 

UKSRC Storage Architecture

 

For v0.1 Requirements:

POSIX-like access to be provided ‘next to’ the compute.
Via ‘some method’, files / directories are mounted (RO) for applications (e.g. JupyterHub) to read from.

POSIX area ‘should be’ an RSE, to enable the transfers and lifecycle management.
Bulk storage “may” exist, requiring a TPC into the ‘cache’ area.

Disadvantages:
- Unnecessary data movement perhaps (and via TPC)
- Mounts and permissions

Other ideas:

Manila-style shares / volumes for each DID / container requested?
- Rucio ‘download’ rather than TPC
- Lifecycle management on the Manila share layer?

 

Tokens Status

 

  • Operational

  • Technical

  • Accounting

 

 

 

 

 

on GGUS:

Site reports

Lancaster: Gerard updated to the latest Reef (18.2.4); it wasn’t a completely smooth process but we got there in the end. Also, as discussed in storage on Wednesday, turning off mclock made Ceph behave a lot better.

Manchester also noticed similar problems with mclock (3 objects/hour with mclock). Changing settings didn’t seem to have an effect.
Lots of scrubbing, but the scrub date is not being updated (scrubbing is a bit broken on Reef)

Reporting bugs flagged upstream

Glasgow: targeting Pacific for upgrade, waiting for Reef stability


RAL upgrade might go to Quincy

Mid-upgrade (Pacific mons + Nautilus OSDs): OSD maps are created very quickly, but can bloat the mon stores

 

 

 Action items

Work out how to replace the original functionality of fstream monitoring, now that OpenSearch has replaced the existing solutions.

 

 

 Decisions