...

The problem was first detected when the gateway functional tests began failing on Friday evening.
The xrootd logs showed unexpected failures such as:

  • Code Block
    CephIOAdapterRaw::read: Error in read: -16
    LoadCache Error: -16
  • Code Block
    Non expected offset: -1  8388608  41943040
    Error trying to write out of order: expeted at: 41943040 got offset8388608 of len 8388608
    XrdCephOssBufferedFile::Write: Write error  fd: 437 rc:-22 off:8388608 len:8388608
    230512 19:50:36 4111476 ofs_write: patls002.4684:11466@lcg2290 Unable to write atlas:datadisk/rucio/mc23_13p6TeV/17/17/EVNT.33427665._009684.pool.root.1; invalid argument
  • Code Block
    Error trying to write out of order: expeted at: 16777216 got offset41943040 of len 8388608
    XrdCephOssBufferedFile::Write: Write error  fd: 3494 rc:-22 off:41943040 len:8388608
    XrdCephOssBufferedFile::Close: flush Error fd: 3494 rc:-16
  • 'file already open for write' type errors
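
The numeric return codes in the excerpts above can be turned into symbolic names to make the logs quicker to read. The following is a minimal sketch, assuming the `rc:` and `Error ...:` values are negated POSIX errno numbers. That assumption is consistent with the "invalid argument" message logged alongside `rc:-22`, but it is not confirmed for every code path. The helper names `decode_rc` and `decode_line` and the regular expression are hypothetical and are not part of xrootd or the Ceph plugin.

```python
import errno
import os
import re

# Hypothetical pattern: picks up negative codes after "rc:" or after an
# "Error ...:" label, as seen in the log excerpts above.
RC_PATTERN = re.compile(r"(?:rc:|Error[^:]*:)\s*(-\d+)")


def decode_rc(rc):
    """Map a negated errno value (e.g. -22) to (symbolic name, message).

    Assumes the code is a negated POSIX errno; unknown values map to "UNKNOWN".
    """
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


def decode_line(line):
    """Return a list of (rc, name, message) tuples for codes found in a log line."""
    return [(int(m), *decode_rc(int(m))) for m in RC_PATTERN.findall(line)]


if __name__ == "__main__":
    samples = [
        "CephIOAdapterRaw::read: Error in read: -16",
        "XrdCephOssBufferedFile::Write: Write error  fd: 437 rc:-22 off:8388608 len:8388608",
        "XrdCephOssBufferedFile::Close: flush Error fd: 3494 rc:-16",
    ]
    for sample in samples:
        for rc, name, message in decode_line(sample):
            print(f"{rc}: {name} ({message})")
```

Under this reading, -22 is `EINVAL` (matching the out-of-order write rejection) and -16 is `EBUSY`, which would fit the 'file already open for write' errors, although that link is an inference rather than something the logs state.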

Restarting the gateways did not fix the issue. Memory spikes correlate with increased connections in the xrootd report monitoring, shown below.

...