Rubin Data Manager
Project Summary:
Dates
March 2023 - September 2023
Participant
Matthew Sims
Objectives
Learn how to use Rucio as a user and a VO admin (weeks 1-2)
Get SLAC accounts (start week 1)
Learn how the other liaisons track work, files, etc. (weeks 1-4)
Understand Rucio monitoring (week 2)
Set up / improve Rubin Rucio monitoring (after obtaining SLAC accounts and gaining an understanding of Rucio)
Data movement from Cambridge to RAL Echo
Movement of DEC data from Cambridge to Echo for long-term storage
Register the data with Data Butler in Edinburgh
Move data that is used often to Edinburgh
Ensure user access to data
Set up and coordinate a large-scale data movement to the UK
Track files
Troubleshoot issues
Work Done
Original Project Proposal
Group | Data Services |
Project name | Data Management for the Vera C. Rubin Observatory. |
Project LM | Timothy Noble |
Project code/Task | We will have new funding of 1.5FTE from Rubin that begins in April 2023. These budget codes are still to be created. Initially the graduate can book to STAK00024 02.03. |
Other resources | Rubin Developers and project scientists from UK and worldwide. Rucio Developers at CERN and Fermilab. Those working on the GridPP Tier-1. |
Project Summary | The Vera C. Rubin Observatory (formerly the Large Synoptic Survey Telescope (LSST)) is due to start taking data in July 2024. This data will need to be exported from Chile to a data processing facility at SLAC and then distributed to France and the UK as part of the Data Reprocessing Pipeline (DRP), which will process the data for further analysis by astronomers.
The observatory is currently deploying its data management infrastructure. In the UK we have already deployed 9 PB of disk storage and are preparing additional tape storage at RAL, with the aim of processing 25% of the data produced by the observatory. It is important that data flows and analysis job workflows are established, tested, and verified before data taking begins.
This project is to work with the UK, US and French teams to coordinate, test and verify the data distribution workflows required for a successful DRP. The data management system to be used in this project is Rucio, an open-source system written in Python that is used by the ATLAS and CMS experiments at CERN.
You will learn about and operate Rucio to support the data management of the project. This includes integrating storage endpoints, troubleshooting issues that arise during data movement, and contributing to the overall operation of Rucio and the curation of Rubin data.
You will also liaise with the people around the UK who support this project, discussing the infrastructure they provide to ensure effective integration with Rucio.
|
Project Outputs | Project report / presentation / paper detailing the testing done and confirming that we will be able to meet the requirements for the Rubin DRP (or detailing what further work is required). Scripts / Code for managing Rubin workflows. Documentation on how the agreed workflows should be run. |
Skills and Expertise graduate will gain | · Knowledge of the use and management of Rucio, as well as the software that integrates with Rucio (e.g. FTS) · Operational experience in managing time-critical experiment workflows · Knowledge of Rubin workflows and data management processes · Experience working as part of a large international collaboration
|
Exit plan when Graduate moves to different project | There is funding for the Rubin data manager to be a permanent role. If no graduate would like to take the role permanently, we intend to recruit, and would hope that a new person could be in place before this project finishes to allow a handover. Documentation, plans and details of any unfinished development are to be passed on to Tim Noble (LM) and other Rubin staff to aid their work. |