Workflows meeting with Diamond
pre-prod Date
Oct 22, 2024
Participants
@Alex Kemp
@Antony Wilson
@Kevin Phipps
Benedikt Daurer (Diamond)
Gary O'Donnell (Diamond)
Goals
To figure out how we can support Diamond
Discussion topics
Their workflow system is being built for Diamond 2, but is in a pre-prod stage now
A user will be able to select from a workflow template or upload their own (user-defined workflows)
It is built in Argo Workflows and Argo CI on top of k8s
Their workflow definition files are stored in PostgreSQL
They would like to store only raw data in the archive
Access to raw data via DGW could provide hooks into workflows used
Archiving is pushed from the Diamond side, so we have little control over how data gets into the archive
They mentioned how difficult it is for researchers to retrieve their data once it's in the archive; it essentially becomes lost
THEY REALLY, REALLY WANT AN API TO BE ABLE TO GET DATA PROGRAMMATICALLY
and put it somewhere on S3
we all agreed that the API would need to talk in ICAT IDs but they have no way of getting these
not sure how auth would be done: JWT, OIDC, federated login?
also unclear how long data would be kept on S3
TBD: how are they using Echo/S3 at the moment?
TBD: they mentioned IRIS, something about being a year away from being able to restage from S3 to IRIS? That would give us a year to get something onto S3?
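To make the API request above concrete, here is a minimal sketch of what a "stage these ICAT IDs to S3" request might look like. Nothing here was agreed in the meeting: `RestoreRequest`, `plan_restore`, the bucket name, the `restored/` key prefix and the 30-day retention are all hypothetical placeholders, and auth is left out because it is still undecided.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class RestoreRequest:
    """Hypothetical request body: which ICAT datafiles to stage, and where."""

    icat_datafile_ids: list[int]
    bucket: str                # assumed destination bucket on S3
    retention_days: int = 30   # assumption only; retention on S3 is still TBD


def plan_restore(request: RestoreRequest, now: datetime) -> dict:
    """Map each ICAT ID to the S3 location it would be staged under."""
    expires = now + timedelta(days=request.retention_days)
    return {
        "expires_at": expires.isoformat(),
        "objects": {
            icat_id: f"s3://{request.bucket}/restored/{icat_id}"
            for icat_id in request.icat_datafile_ids
        },
    }


if __name__ == "__main__":
    req = RestoreRequest(icat_datafile_ids=[101, 102], bucket="example-staging")
    print(plan_restore(req, datetime.now(timezone.utc)))
```

The sketch only shows the shape of the exchange (ICAT IDs in, S3 locations and an expiry out). It does not address the open problem that Diamond currently has no way of obtaining ICAT IDs.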
It was mentioned that data can be accessed in three ways:
HTTP(zip)
“restore to Diamond cluster”(?) but the files are restored with incorrect permissions
Globus
A user-defined workflow will have an embargo on it (similar to the data) before it is made open
Workflow PIDs: DOIs were suggested, but these would be unsuitable when they're looking at running 1000s of workflows a minute.
Maybe DOI for data collections would work?
we’d need to progressively register workflows against data files as they happen
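As a rough illustration of the last two points, the sketch below keeps a single DOI for a data collection and records workflow runs against individual data files as they happen, rather than minting a DOI per run. `WorkflowRegistry`, the run IDs and the DOI string are hypothetical, and a real version would persist to ICAT or a database rather than to memory.

```python
from collections import defaultdict


class WorkflowRegistry:
    """Hypothetical in-memory record of which workflow runs touched which datafiles."""

    def __init__(self, collection_doi: str) -> None:
        # One DOI for the whole data collection, not one per workflow run.
        self.collection_doi = collection_doi
        self._runs_by_datafile: dict[int, list[str]] = defaultdict(list)

    def register(self, workflow_run_id: str, icat_datafile_ids: list[int]) -> None:
        """Record a run against every datafile it used, in arrival order."""
        for datafile_id in icat_datafile_ids:
            self._runs_by_datafile[datafile_id].append(workflow_run_id)

    def runs_for(self, datafile_id: int) -> list[str]:
        """Return a copy of the run IDs registered against one datafile."""
        return list(self._runs_by_datafile.get(datafile_id, []))


if __name__ == "__main__":
    registry = WorkflowRegistry(collection_doi="10.0000/placeholder")
    registry.register("run-1", [1, 2])
    registry.register("run-2", [2])
    print(registry.runs_for(2))
```

Registration here is just an append per data file, so it stays cheap at 1000s of runs a minute, while the DOI is only minted once per collection.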
Action items
@Alex Kemp to follow up by email; we need to have an internal meeting first