Workflows meeting with Diamond

 Date

Oct 22, 2024

 Participants

  • @Alex Kemp

  • @Antony Wilson

  • @Kevin Phipps

  • Benedikt Daurer (Diamond)

  • Gary O'Donnell (Diamond)

 Goals

  • To figure out how we can support Diamond

 Discussion topics

  • Their workflow system is being built for Diamond 2 and is currently at a pre-production stage

  • A user will be able to select from a workflow template or upload their own (user-defined workflows)

  • It is built on Argo Workflows and Argo CI, running on top of Kubernetes (k8s)

  • Their workflow definition files are stored in PostgreSQL

  • They would like to store only raw data in the archive

  • Access to raw data via DGW could provide hooks into the workflows used

  • Archiving is pushed from the Diamond side, so we have little control over how data gets into the archive

  • They mentioned how difficult it is for researchers to retrieve their data once it is in the archive; in practice the data becomes lost to them

  • THEY REALLY, REALLY WANT AN API TO BE ABLE TO GET DATA PROGRAMMATICALLY

    • and put it somewhere on S3

    • we all agreed that the API would need to talk in ICAT IDs but they have no way of getting these

    • not sure how auth would be done: JWT, OIDC, federated login?

    • also unclear how long data would be kept on S3

  • TBD: how are they using Echo/S3 at the moment?

  • TBD: they mentioned IRIS, something about being a year away from being able to restage from S3 to IRIS? That would give us a year to get something onto S3?

  • It was mentioned that data can be accessed in three ways:

    • HTTP (zip)

    • “restore to Diamond cluster”(?), but the files are restored with incorrect permissions

    • Globus

  • A user-defined workflow will have an embargo on it (similar to the data) before it is made open

  • Workflow PIDs: DOIs were suggested, but these would be unsuitable given that they’re looking at running 1000s of workflows a minute.

    • Maybe a DOI per data collection would work?

    • we’d need to progressively register workflows against data files as they happen
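
To make the workflow PID point concrete, here is a rough sketch (not an agreed design) of why one identifier per workflow run scales badly, and of what "progressively registering workflows against data files" under a single collection-level identifier might look like. Everything in it is an assumption for illustration: the class and function names, the ID strings, and the figure of 1000 runs a minute taken as the low end of "1000s of workflows a minute".

```python
from collections import defaultdict
from dataclasses import dataclass, field


def identifiers_per_day(runs_per_minute: int) -> int:
    """Identifiers needed per day if every workflow run got its own PID."""
    return runs_per_minute * 60 * 24


@dataclass
class CollectionRecord:
    """Hypothetical record: one PID covering a whole data collection."""

    collection_pid: str
    # Maps a data file ID to the workflow run IDs registered against it so far.
    runs_by_file: dict = field(default_factory=lambda: defaultdict(list))

    def register_run(self, run_id: str, file_ids: list) -> None:
        """Record a workflow run against each data file it used."""
        for file_id in file_ids:
            self.runs_by_file[file_id].append(run_id)


# Assumed rate: 1000 runs a minute (low end of "1000s a minute").
print(identifiers_per_day(1000))  # 1440000

# Placeholder IDs only; real ones would presumably be ICAT IDs.
record = CollectionRecord(collection_pid="collection-pid-placeholder")
record.register_run("run-1", ["file-a", "file-b"])
record.register_run("run-2", ["file-b"])
print(record.runs_by_file["file-b"])  # ['run-1', 'run-2']
```

At the assumed rate that is over a million identifiers a day, whereas the collection-level record only ever needs one PID and grows its run list as workflows happen.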

 Action items

  • @Alex Kemp to follow up by email; we need to have an internal meeting first

 Decisions