Workflows meeting with Diamond

Date

Oct 22, 2024

Participants

@Alex Kemp
@Antony Wilson
@Kevin Phipps
Benedikt Daurer (Diamond)
Gary O'Donnell (Diamond)

Goals

To figure out how we can support Diamond

Discussion topics

Their workflow system is being built for Diamond 2, but is in a pre-prod stage now
A user will be able to select from a workflow template or upload their own (user-defined workflows)
It is built on Argo Workflows and Argo CI, running on Kubernetes (k8s)
Their workflow definition files are stored in PostgreSQL
They would like to store only raw data in the archive
Access to raw data via DGW could provide hooks into the workflows that were used
Archiving is pushed from the Diamond side, so we have little control over how data gets into the archive
They mentioned how difficult it is for researchers to get their data back once it's in the archive; the data essentially becomes lost
THEY REALLY, REALLY WANT AN API TO BE ABLE TO GET DATA PROGRAMMATICALLY
- and put it somewhere on S3
- we all agreed that the API would need to talk in ICAT IDs but they have no way of getting these
- not sure how auth would be done: JWT, OIDC, federated login?
- also unclear how long data would be kept on S3
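As a rough illustration of the API discussed above, the sketch below builds the kind of request body such a service might accept: a set of ICAT datafile IDs, a target S3 location, and a retention period. Nothing here was agreed in the meeting. The function name, field names, bucket, and default retention are all hypothetical, and auth is deliberately left out because it is still an open question.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class StageRequest:
    """Hypothetical request to copy archived datafiles to S3."""

    icat_datafile_ids: List[int]  # the API would talk in ICAT IDs
    s3_bucket: str                # assumed target bucket name
    s3_prefix: str                # assumed key prefix inside the bucket
    retention_days: int           # how long the copy stays on S3 (undecided)


def build_stage_request(
    icat_datafile_ids: Iterable[int],
    s3_bucket: str,
    s3_prefix: str,
    retention_days: int = 30,  # placeholder default, not an agreed value
) -> str:
    """Validate the inputs and return the request as a JSON string."""
    ids = sorted({int(i) for i in icat_datafile_ids})  # de-duplicate and order
    if not ids:
        raise ValueError("at least one ICAT datafile ID is required")
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    request = StageRequest(ids, s3_bucket, s3_prefix.strip("/"), retention_days)
    return json.dumps(asdict(request), sort_keys=True)


if __name__ == "__main__":
    print(build_stage_request([42, 7, 42], "example-bucket", "/visit-1/", 14))
```

The sketch also makes the open problem visible: the caller has to supply ICAT IDs, which Diamond currently has no way of obtaining.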
TBD: how are they using Echo/S3 at the moment?
TBD: they mentioned IRIS, and being about a year away from being able to restage from S3 to IRIS. Would that give us a year to get something onto S3?
It was mentioned that data can be accessed in three ways:
- HTTP(zip)
- “Restore to Diamond cluster” (?), but files are restored with incorrect permissions
- Globus
A user-defined workflow will have an embargo on it (similar to the data) before it is made open
Workflow PIDs: DOIs were suggested, but these would be unsuitable when they’re looking at running 1000s of workflows a minute.
- Maybe DOI for data collections would work?
- we’d need to progressively register workflows against data files as the workflows run
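One way to picture the "DOI per data collection, workflows registered progressively" idea is a small in-memory registry like the one below. It is only a sketch under assumptions: the class and method names are hypothetical, the DOI is a placeholder, and a real system would persist this in ICAT or a database rather than in a dict.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class WorkflowProvenance:
    """Hypothetical record of which workflow runs touched which datafiles.

    A single DOI identifies the whole data collection; individual
    workflow runs get no DOI of their own and are only registered here.
    """

    def __init__(self, collection_doi: str) -> None:
        self.collection_doi = collection_doi
        self._runs_by_file: Dict[int, List[str]] = defaultdict(list)

    def register(self, workflow_run_id: str, datafile_ids: Iterable[int]) -> None:
        """Record a run against each datafile it used, ignoring repeats."""
        for datafile_id in datafile_ids:
            runs = self._runs_by_file[datafile_id]
            if workflow_run_id not in runs:
                runs.append(workflow_run_id)

    def runs_for(self, datafile_id: int) -> List[str]:
        """Return the runs registered against a datafile, in arrival order."""
        return list(self._runs_by_file.get(datafile_id, []))


if __name__ == "__main__":
    log = WorkflowProvenance("10.0000/example-collection")  # placeholder DOI
    log.register("run-1", [101, 102])
    log.register("run-2", [102])
    print(log.runs_for(102))
```

Registration is cheap and local, so it could keep up with a high workflow rate, while DOI minting happens only once per collection.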

Action items

@Alex Kemp to follow up by email; an internal meeting is needed first
