Rucio at RAL
Introduction
Rucio at RAL was originally deployed to support a single VO. That VO later deployed its own Rucio instance. Rather than tear down the RAL instance, work began on extending it to support multiple VOs (Multi-VO). Multi-VO Rucio was envisioned as a way to offer Rucio as a service to smaller VOs that may not have the manpower to deploy and then maintain their own Rucio instance.
This development work was supported by Digital Asset funding, which enabled much of the core Multi-VO functionality to be developed in collaboration with the Rucio developers. The SWIFT-HEP project has also funded Rucio development with the High-Luminosity LHC era and beyond in mind. Under this project, functionality and performance are assessed, improved and implemented where appropriate so that Rucio can support the larger data volumes and newer technologies being adopted by sites.
EGI-ACE has also adopted Multi-VO Rucio as a data management service it would like to offer its customers. It has therefore funded the development and operation of Multi-VO Rucio as a service, with the aim of integrating it with EGI's token authentication system, EGI Check-in. This project has required much more effort than originally planned, including an entire rework of the RAL Rucio instance to bring it up to date and in line with other Rucio deployments.
This has led to the current deployment of Rucio (as of 16th Feb). The Production instance runs on Cloud VMs, which host the server and the WebUI. These are stuck at Rucio version 1.23 because they rely on Python 2, which is deprecated and not used by later Rucio versions. Upgrading from 1.23 to later versions proved not to be simple, which led to the investigation of Docker and subsequently Kubernetes for deploying Rucio. The daemons are now deployed on a Kubernetes cluster, and pre-production testing is under way with the server and auth server deployed on the same server. Once testing is complete, this K8S deployment will become the production deployment and the old Cloud VM deployment of Rucio will be decommissioned. This decommissioning will allow an upgrade to newer Rucio versions.
Current Purpose
Provide a Multi-VO Rucio instance for GridPP / EGI users.
Integrate with EGI Check-in for user authentication
Development work for SWIFT-HEP
Development work for Multi-VO Rucio
A testbed for new users/VOs to try Rucio
Training ground where RAL staff can learn to use Rucio without disrupting VO operations
Used to test FTS3 instances deployed at RAL
Future Needs
We will need a stable test instance that will allow scale testing of storage endpoints.
Easy deployment of test instances and/or daemons with an alternative database to test new features
SWIFT-HEP work
Integration of Rucio and DIRAC
QoS development and testing
Integration with full token workflows through IRIS-IAM or EGI Check-in
Service Requirements
We need to run functional tests as well as periodic stress tests of storage endpoints.
We need reliable underlying infrastructure
We need integration with monitoring
We need at least the base layer to be managed by Aquilon
We need to take regular database backups.
We need to be able to develop code (e.g. QoS)
We need to be able to create additional daemons and servers as required.
We need Multi-VO capabilities to be able to test in other VO spaces?
We need IPv6 if that matches other Rucio instances. ATLAS and SKA do not currently use IPv6: SKA because its instance is also on the RAL Cloud, which does not support IPv6, and ATLAS because it has decided not to use IPv6 for now, although Rucio itself is IPv6-ready.
We need FTS on LHCONE?
Rucio from PIP to K8S Cluster project
Scope
Resources
Timeline