XRootD: Code review for ReadV implementation in XrdCeph (March 2023)
Overview
Status | NOT STARTED |
---|---|
Submitter | @Alexander Rogovskiy |
Reviewers | @James Walder @Alexander Rogovskiy @Thomas, Jyothish (STFC,RAL,SC) @Ian Johnson @Emmanuel Bejide @Thomas Byrne @Alison Packer @Alastair Dewhurst Steven Simpson (Lancaster), sam |
Goals | Review the code for the ReadV changes and identify any further tests / updates that are required. Discuss the packaging, deployment and post-deployment testing; and included any discussion on risk factors |
Change Control | |
Jira tickets | |
|
|
On this page |
Requirements
Summary statement
Improve xrootd’s read and readv implementation for WN’s gateways to allow them effectively handle vector read operations
Problem statements
Vector read (aka readv) operations on ECHO storage are not very efficient. Oftentimes a single readv requests with cumulative size of several megabytes takes up to 1 minute or more. This causes a lot of problems, especially for LHCb VO.
Success metrics
Allow for readV access with failures at a level no worse than at other sites. Do not worsen Read performance.
Meeting Dates:
Summary of meeting times and main participants:
Meeting 1 - Tues 21 March 2023; Present:
Review Changes:
Repo | Change description | Comments | |
---|---|---|---|
1 |
|
| |
2 | The last two commits are my changes related to timeout increase | These need to go into a PR in appropriate place in STFC/xrootd repo Amended - will not apply these changes. |
Open issues
Questions that arise during the review should be largely addressed in the appropriate github PR review process. Summaries of main issues arising from the review should be noted below
Current implementation does not support a sparse object where one or more “Chunks” is full of zeros and hence not written by the (pseudo)-striper:
It is agreed that this implementation should support the existing operations.
For this to happen, the psuedo-striper will need the total file size
\decision\ it is agreed that the file size will be determined, once on open i.e. at the same time that the block size is determined. The likely place to do this is in the CephFileRef
A new PR should be created: (by the Dev team) to add and sort this new variable
It is also agreed to create a specification of the accepted read and readV behaviour in different uses cases: e.g. EOF/ sparse reads, across boundaries of sparse reads, etc. and to be demonstrated to be working via the result of the functional tests
Checksumming code should be appropriately updated too
20 April 2023
Implementation to fall back to striper read in case of sparse file implemented.
If file is corrupted can lead to corrupt data (0’s) being sent, as the striper would do
Decision made to keep this behaviour, but to log that this has occurred.
27 April 2023
Identified a case if XRootD v3 client running. This was unable to use the core XRootD server side fix to request an increased timeout
Will not apply this part of the fix, and rather use the ‘client’ side environmental variable to request the timeout value. This should be able to be applied at the docker container level of each running job (and also propagated into Singularity).
Test results
A summary of the test suites that have been run, and any issues that have arisen. New tests that are requested can be documented here. Can either be presented as a link to a test suite page or direct summary of each of the tests.
Scenario / test name | Results | What we know | What we need to learn |
---|---|---|---|
Pass | How to run the tests: |
| |
|
|
|
|
Deployment plan
Attempt to capture the steps necessary to deploy the changes into production. Noting any particular cases for side-effects to be observed and where dependencies in / from other packages / services might be relevant.
Date | Plan type | Target segment | Notes | Action items | |
---|---|---|---|---|---|
1 | TEST / FEATURE FLAG / OTHER |
| |||
2 |
|
|
|
|
|
Definitions:
ceph object / chunks (e.g. the 64MiB object)
Notes:
rop->read(0, 10 &bl1, &retval1); retval is alwalys 0. ?
Error checking for catching issues with not recieving correct bytes.
bulkAioRead: high level analogue of striper (i.e. to deal with reads at the file level).
(chunks may not always be 64MiB, e.g. 4MiB for dumps) → is tested?
lifetime is for a given readV operation.
→ Add additional comments for each method in the the .hh file
:Change of coordinates:
0 length read request ? for last object