Project Description

The Vera C. Rubin Observatory is due to start taking data in July 2024 for its Legacy Survey of
Space and Time (LSST). This data will be exported from Chile to a data processing facility at
SLAC (USA) and then distributed to France (IN2P3) and the UK (RAL and Lancaster) as part of
the Data Reprocessing Pipeline (DRP), which will process the data for further analysis by
astronomers. LSST is set to produce around 10 TB a day (around 2 million files per day), of
which 25% will be exported to the UK for processing. In the UK we have already deployed
2.5 PB of disk storage and are preparing to increase this by another 0.5 PB per year for the
next 10 years.
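
The figures above can be turned into a rough sizing calculation. The sketch below is only illustrative: it assumes decimal units, assumes the 25% UK share applies equally to file counts and to data volume, and uses hypothetical function names that do not come from any LSST tool.

```python
# Rough sizing sketch based on the figures quoted in this description.
# Assumptions (not from the source): the 25% UK share applies to file
# counts as well as volume, and disk growth is a flat 0.5 PB per year.


def uk_daily_share(total_tb_per_day=10.0, files_per_day=2_000_000, uk_fraction=0.25):
    """Return (TB per day, files per day) exported to the UK."""
    return total_tb_per_day * uk_fraction, int(files_per_day * uk_fraction)


def uk_disk_capacity_pb(years, initial_pb=2.5, growth_pb_per_year=0.5):
    """Return the planned UK disk capacity in PB after the given number of years."""
    return initial_pb + growth_pb_per_year * years


if __name__ == "__main__":
    tb_per_day, files = uk_daily_share()
    print(f"UK share: {tb_per_day} TB/day, {files} files/day")
    print(f"Planned disk after 10 years: {uk_disk_capacity_pb(10)} PB")
```

With the default values this gives a UK share of 2.5 TB and 500,000 files per day, and a planned disk capacity of 7.5 PB after 10 years.
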
The three sites process data and register the information in a local metadata store, known as
the Data Butler, which also coordinates the job flows at the site. It was built to be deployed,
along with other components, on a Kubernetes cluster.
You will be working to support LSST as they build their DRP. In particular, this will involve
deploying a Kubernetes cluster to support and deploy LSST software and services as needed.
What is learned from the deployment of this cluster will not only support LSST but also feed
directly into the future deployment of a Kubernetes cluster to support the UK science network.
You will work with Timothy Noble (who has deployed a small Kubernetes cluster to run the data
management service Rucio) and Thomas Birkett (who has an interest in deploying services and
software in a containerised and orchestrated manner) to design, deploy and test a Kubernetes
cluster for LSST usage.
As well as developing and deploying a new cluster, you will also support the operation of
Tier-1 services, the Kubernetes cluster on which Rucio is deployed, the Rucio data management
software itself, and the supporting software for that service.

Graduate

Ify Agu

Supporting Staff

LM Timothy Noble

Thomas Birkett

Project Plan

Month 1

Use various learning resources to get up to speed with Linux, Docker and Kubernetes.

Using those skills, deploy a single-node Kubernetes cluster (using containers as the nodes), and run some simple software on the cluster.
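
As an illustration of what "some simple software" could look like, the sketch below builds a minimal Kubernetes Deployment manifest and prints it as JSON. It is an assumption-laden example, not part of the plan: the helper name make_deployment, the application name hello-web and the nginx:stable image are all hypothetical choices, and the test cluster itself could be created with any tool that runs nodes as containers (kind is one such tool).

```python
import json


def make_deployment(name, image, replicas=1, port=80):
    """Build a minimal apps/v1 Deployment manifest as a plain dict.

    The name, image and port are illustrative placeholders only.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical example: a single web server pod.
    print(json.dumps(make_deployment("hello-web", "nginx:stable"), indent=2))
```

The printed JSON could be saved to a file and applied to the test cluster with kubectl apply -f, which accepts JSON as well as YAML manifests.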

Month 2-3

Using what was learned from the first month and the first cluster, start planning and deploying a Kubernetes cluster across multiple VMs.

While carrying out the deployment process, plan, document and research Kubernetes use cases, and the benefits and pitfalls of using such a service.

Plan the various software needed on a production-ready Kubernetes cluster and research options to fill those needs.

Deploy a piece of software on the cluster that is useful for RAL, either as a test of its functionality (e.g. FTS) or as an investigation into how well a service can be moved over to the cluster, looking at the needs, the workflow required, the pain points, and any additional support that service owners may need to convert their service.

Collect everything you have found that you want to change into a single document for V2.

Month 3-4

Research alternative cluster deployment methods, and their benefits and pitfalls. Research alternative software solutions for the requirements of the cluster to ensure it uses the best currently understood solution.

Refine the cluster by deploying V2, incorporating the desired changes documented in previous months and the findings from researching alternative deployment methods.

Investigate the use of databases on the cluster to support services: what are the advantages and disadvantages of this method compared with a bare-metal deployment?

Month 5-6

Write recommendations for current and future K8s clusters (examples would be deployment method, software, resource requirements, security considerations, load balancing, and networking (LB, ports, ingress, NodePort, BGP)) to guide the department's deployments, with consideration for efficiency of deployment, security, access, and software deployment.

Using all you have learned, write down recommendations, advice, and dos and don'ts for deploying and using a Kubernetes cluster.

Write documentation for each recommendation to ensure it is easy to follow in the future.

Test any documentation, or get others to test it, to ensure it is clear.

Probation / APR objectives

Deployment and testing of a Kubernetes cluster deployed on a single VM

Ify is to deploy and test a Kubernetes cluster on a single VM to test the technology and give a good grounding in understanding the requirements for its deployment and use.

Ify will then use that cluster to deploy a simple piece of software, in order to demonstrate understanding of software deployment, troubleshoot any issues encountered, and gain experience.

Deployment and testing of a Kubernetes cluster deployed on VMs

Ify will look at the key factors to consider when deploying on multiple cloud VMs / computers, including security, secrets, and CI/CD.

Ify will then deploy a small K8s cluster on several VMs and investigate the workflows of software deployment in order to deploy software that supports RAL in some way (Rucio, FTS3, etc.).

Ify will then transition from cloud VMs to VMware to gain experience in moving clusters across computing infrastructures, and to assess the advantages and disadvantages of having clusters on different and varied hardware and ‘locations’.

Write policy/guidelines for deploying and using Kubernetes within SCD

Ify will write recommendations for current and future K8s clusters (examples would be deployment method, software, resource requirements, security considerations, load balancing, and networking (LB, ports, ingress, NodePort, BGP)) to guide the department's deployments, with consideration for efficiency of deployment, security, access, and software deployment.

Kubernetes Cluster deployment to support LSST