2023-10-19 Meeting Notes
Date
Oct 19, 2023
Participants
@Thomas, Jyothish (STFC,RAL,SC)
Glasgow:
Lancaster:
Apologies:
@James Walder
Goals
List of Epics
New tickets
Consider new functionality / items
Detailed discussion of important topics
Site report activity
Discussion topics
Current status of Echo Gateways / WNs testing
Recent sandboxes for review / deployments:

Item | Presenter | Notes |
---|---|---|
XRootD Releases | | 5.6.2-2 is out. Aim for prod testing next week (1 week of testing, then deploy if OK). Sam notes: Lancs works (off-the-shelf 5.6.2-2). |
Checksums fixes | | |
Prefetch studies and WN changes | Alex | Temporarily rolled back, with the ongoing work in batch farm WNs. |
Deletion studies through RDR | Ian | |
CMSD rollout | | https://stfc.atlassian.net/browse/XRD-41 svc01, 02, 17, 18 stay as internal WN gateways for now. svc19 is designated for the Alice gateway. |
Gateways on new network plan | | IPv6 sorted; firewall rules change in progress. LHCONE issue sorted. Fermilab can't be reached through v6, but tracepath gets to the LHCOPN CERN router. To consider: the TPC instance port. |
Gateways: observations | | WN gateways showed a spike in memory: the 2 gateways with swap enabled filled a few 100 GB of swap; the other 2 crashed at the poller. |
CMSD outstanding items | | Icinga / Nagios callout test changes - live and available. Improved load balancing / server failover triggering - better 'rolling server restart script'. Documentation: setup / configuration / operations / troubleshooting / testing. Review of sandbox and deployment to prod: sandbox has been reviewed, awaiting @Thomas Byrne for final confirmation. |
Tokens testing | | To liaise with the TTT Taskforce (aka @Matt Doidge). No update. |
AAA Gateways | | Sandbox ready for review: http://aquilon.gridpp.rl.ac.uk/sandboxes/diff.php?sandbox=jw-xrootd-aaa-5.5.4-3 |
SKA Gateway box | | https://stfc.atlassian.net/wiki/spaces/UK/pages/215941180 Now working using the ska pool on ceph dev. Initial iperf3 tests: (see table and plots below). |
Containerised gateways (Kubernetes cluster) | | Identified an issue on worker node gateways where Ceph Nautilus 14.2.15 libraries were loaded (from a previous libradosstriper lockless read implementation), overriding the container-installed Ceph version. Working on ingress setup: had a 'cannot allocate port' error when setting up the service (port forwarding). Google suggests an issue with the cluster; will try rebuilding from scratch to see if that fixes it. |

on GGUS:
Site reports
Lancaster - moved to 5.6.2-2, all ok
Glasgow - gateways being set up; Ceph disk node using a lot of swap (one OSD using large virtual memory); 5.6.2-2 testing TBD. Later versions of Nautilus are more aggressive in caching; recommendation is to turn swap off.
Action items