Architecture
UI design
Mock-ups of the desired functionality can be found here. Overall, these designs incorporate two key concepts: that a dataset, containing datafiles, can be uploaded to a visit, and that datafiles can be uploaded to a dataset.
Proposed Architecture
The image below shows how the proposed system will be split into various microservices, accessible through their respective APIs.
The S3 storage can be seen as a cache until the data is moved to tape. Moving the data to tape will require a separate piece of software to be developed, detailed here.
ICAT Entities
To track any additional uploads, new ICAT entities will need to be created. Which entity is created depends on whether the user wishes to add a dataset to a visit or a datafile to a dataset. The existing schema is fit for purpose and isn’t expected to change. If a user wishes to append data to their visit, a new dataset will need to be created. Likewise, if a user wishes to add a datafile to a dataset, a new datafile will need to be created. Access to ICAT is achieved through its RESTful interface.
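As an illustration of the two cases, the sketch below builds the JSON that might be sent to ICAT when creating a dataset under a visit or a datafile under a dataset. It is a minimal sketch under stated assumptions: the function names are hypothetical, the attribute names (name, investigation, type, dataset, location, fileSize) are assumed to match the deployed ICAT schema, and the form fields sessionId and entities are assumed to be what the ICAT RESTful entityManager resource expects. These should be checked against the target ICAT version.

```python
import json


def new_dataset_entity(name, investigation_id, dataset_type_id):
    """Describe a new Dataset attached to an existing visit (Investigation).

    Attribute names are assumptions about the ICAT schema in use.
    """
    return {
        "Dataset": {
            "name": name,
            "investigation": {"id": investigation_id},
            "type": {"id": dataset_type_id},
        }
    }


def new_datafile_entity(name, dataset_id, location, file_size):
    """Describe a new Datafile attached to an existing Dataset."""
    return {
        "Datafile": {
            "name": name,
            "dataset": {"id": dataset_id},
            "location": location,
            "fileSize": file_size,
        }
    }


def entity_manager_form(session_id, entity):
    """Form fields for a create request (assumed shape of the ICAT REST call)."""
    return {"sessionId": session_id, "entities": json.dumps(entity)}


# Hypothetical values, for illustration only.
form = entity_manager_form(
    "example-session-id",
    new_dataset_entity("supplementary-data", investigation_id=42, dataset_type_id=1),
)
```

Keeping the payload construction separate from the HTTP call means the same helpers could be reused whichever HTTP client the upload back end adopts.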
Authentication & Authorisation
The initial step involves the user accessing DataGateway, which ensures that users can log in and only access the content they are authorised to view. However, since the proposed system adopts a microservices architecture, it is crucial to implement the necessary security measures to prevent users from bypassing the user interface and directly interacting with the API. The upload back-end component is the service responsible for direct interactions with the respective databases. It verifies user authentication by retrieving the user's session based on their session ID. The ICAT API can be used to fetch the user's session details, confirming their logged-in status and the remaining session duration.
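The session check described above could look roughly like the following. This is a sketch rather than the actual implementation: it assumes the ICAT RESTful interface exposes session details at a path of the form icat/session/<sessionId> and returns JSON containing userName and remainingMinutes. The function names, the base URL argument and the one-minute threshold are all hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def fetch_session(icat_url, session_id, timeout=10):
    """Ask ICAT for the session details; return None if ICAT rejects the ID.

    The URL layout is an assumption about the ICAT REST interface.
    Connection failures (urllib.error.URLError) are left to propagate.
    """
    quoted = urllib.parse.quote(session_id, safe="")
    url = f"{icat_url.rstrip('/')}/icat/session/{quoted}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError:
        return None


def session_is_valid(session_info, min_remaining_minutes=0.0):
    """True if the session names a user and has enough time left."""
    has_user = bool(session_info.get("userName"))
    remaining = float(session_info.get("remainingMinutes", 0))
    return has_user and remaining > min_remaining_minutes


def authorise(icat_url, session_id, min_remaining_minutes=1.0, fetch=fetch_session):
    """Return the user name for a live session, or None to reject the request."""
    info = fetch(icat_url, session_id)
    if info is None or not session_is_valid(info, min_remaining_minutes):
        return None
    return info["userName"]
```

The fetch parameter is there so the decision logic can be exercised without a running ICAT server. In the upload back end, a None result would translate into rejecting the request before any storage or database interaction takes place.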
Choice of Frameworks
A meeting was held with a developer from Zenodo to discuss how they facilitate uploads. Whilst he admitted that their current solution is complex, he expressed a future plan to move to a protocol called tus.io.
tus.io is an open-source, cross-platform protocol designed for reliable and efficient file uploads over HTTP. It supports resumable file uploads, allowing large files to be uploaded in smaller chunks. The protocol provides features like pause/resume functionality, support for parallel uploads, and compatibility with various platforms and programming languages. The tus protocol covers both the client and server sides of an upload and is built on top of the core HTTP/1 and HTTP/2 protocols.
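To make the resumable, chunked behaviour concrete, the sketch below plans the requests a client would issue to resume an upload. It uses the header names from the tus 1.0.0 core protocol (Tus-Resumable, Upload-Length, Upload-Offset, and the application/offset+octet-stream content type). The function names and the chunk size are illustrative assumptions, and no network calls are made. In practice an existing tus client library would normally handle this.

```python
TUS_VERSION = "1.0.0"


def creation_headers(total_size):
    """Headers for the initial POST that creates the upload resource."""
    return {"Tus-Resumable": TUS_VERSION, "Upload-Length": str(total_size)}


def plan_patches(total_size, server_offset, chunk_size):
    """List the PATCH requests needed to finish an upload.

    server_offset is the Upload-Offset the server reported in response to a
    HEAD request, i.e. how many bytes it has already stored. Each planned
    request is a (headers, start, length) tuple.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= server_offset <= total_size:
        raise ValueError("server_offset must lie within the file")

    plan = []
    offset = server_offset
    while offset < total_size:
        length = min(chunk_size, total_size - offset)
        headers = {
            "Tus-Resumable": TUS_VERSION,
            "Upload-Offset": str(offset),
            "Content-Type": "application/offset+octet-stream",
        }
        plan.append((headers, offset, length))
        offset += length
    return plan


# A 10-byte file of which the server already holds 4 bytes, sent in 4-byte
# chunks, needs two more PATCH requests: bytes 4-7, then bytes 8-9.
remaining = plan_patches(total_size=10, server_offset=4, chunk_size=4)
```

Because every PATCH states the offset it starts from, an interrupted upload can continue from the last byte the server acknowledged rather than starting again.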
Uppy is an open-source JavaScript library that simplifies the process of handling file uploads in web applications. It provides users with a customisable user interface for selecting, uploading, and managing files from various sources, such as local devices, cloud storage services, and remote URLs. Uppy facilitates drag-and-drop functionality (Figure 4), real-time progress indicators and support for chunked uploads (Uppy, 2023). This, combined with the tus.io protocol, could provide a robust solution for the resumable upload of large files.