...

The above nodes show this xrootd proxy issue.

Last 24 hrs, ATLAS:

...

VOs running on affected nodes:

  • LHCB (tlhcb006)

  • ATLAS (patls002)

  • NA62 (tna62a001)

  • Biomed (bio045)

  • enmr022

File read timeouts in the proxy logs:

Nov 16 04:20:34 lcg2631.gridpp.rl.ac.uk docker[1542921]: 231116 04:20:34 92982 XrootdAioTask: async read failed for tlhcb006.2794:170@htcjob4969334_0_slot1_246_pid3678980.ralworker; aio file read timed out /lhcb:buffer/lhcb/MC/2016/SIM/00204827/0008/00204827_00084722_1.sim
Nov 16 04:21:08 lcg2631.gridpp.rl.ac.uk docker[1542921]: 231116 04:21:08 92969 XrootdAioTask: async read failed for tlhcb006.301:77@htcjob5638675_0_slot1_14_pid2226160.ralworker; aio file read timed out /lhcb:buffer/lhcb/MC/2018/SIM/00204836/0009/00204836_00096682_1.sim
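
To see which accounts are hit by these timeouts, the proxy log can be tallied by client user. The following is a minimal sketch, not a tool used in this investigation: the regular expression is inferred only from the two sample lines above (client ID assumed to have the form user.pid:fd@host), and the function name `count_timeouts` is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the two sample lines above (an assumption, not an
# official xrootd log grammar): client ID looks like user.pid:fd@host.
TIMEOUT_RE = re.compile(
    r"XrootdAioTask: async read failed for "
    r"(?P<user>[^.\s]+)\.\d+:\d+@(?P<host>[^;\s]+); "
    r"aio file read timed out (?P<path>\S+)"
)


def count_timeouts(lines):
    """Count 'aio file read timed out' events per client user name."""
    per_user = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            per_user[match.group("user")] += 1
    return per_user


# The two log lines quoted above, used here as sample input.
SAMPLE = [
    "Nov 16 04:20:34 lcg2631.gridpp.rl.ac.uk docker[1542921]: 231116 04:20:34 92982 "
    "XrootdAioTask: async read failed for "
    "tlhcb006.2794:170@htcjob4969334_0_slot1_246_pid3678980.ralworker; "
    "aio file read timed out "
    "/lhcb:buffer/lhcb/MC/2016/SIM/00204827/0008/00204827_00084722_1.sim",
    "Nov 16 04:21:08 lcg2631.gridpp.rl.ac.uk docker[1542921]: 231116 04:21:08 92969 "
    "XrootdAioTask: async read failed for "
    "tlhcb006.301:77@htcjob5638675_0_slot1_14_pid2226160.ralworker; "
    "aio file read timed out "
    "/lhcb:buffer/lhcb/MC/2018/SIM/00204836/0009/00204836_00096682_1.sim",
]

if __name__ == "__main__":
    for user, count in count_timeouts(SAMPLE).most_common():
        print(user, count)
```

On a live proxy the same function could be fed the log output line by line instead of `SAMPLE`; lines that do not match the pattern are ignored.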

futex_wait

Nov 16 06:21:38 lcg2631.gridpp.rl.ac.uk docker[1542921]: 231116 06:21:38 96524 oss_Open_ufs: Unable to reloc FD /xcache/lhcb:buffer/lhcb/MC/2018/SIM/00204818/0008/00204818_00083792_1.sim.cinfo; invalid argument

Logs cycle between authentications; no read operations are being logged.

Files already in the cache can be downloaded.

Proxy cache discovery

Alex R noticed that files already cached on the xrootd-proxy are retrieved successfully, which suggests the issue lies in the proxy recalling data from the gateway.