
The farm is currently suffering from an xrootd proxy issue whose cause has yet to be identified. The symptoms are a high number of connections in the CLOSE_WAIT state and xrootd becoming unresponsive. It was previously found that more than ~400 CLOSE_WAIT connections cause xrootd to behave erratically.
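A minimal sketch of how the CLOSE_WAIT count could be checked against the ~400 level mentioned above. It assumes a Linux host and reads the kernel's /proc/net/tcp and /proc/net/tcp6 tables, where state code 08 means CLOSE_WAIT. The function names and the THRESHOLD constant are illustrative, not part of any existing farm tooling. Because the proxy runs under docker, the script may need to run inside the container's network namespace to see the proxy's sockets.

```python
CLOSE_WAIT = "08"  # TCP state code for CLOSE_WAIT in /proc/net/tcp
THRESHOLD = 400    # approximate level from the note above (assumption, not a hard limit)


def count_close_wait(proc_net_tcp_text):
    """Count sockets in CLOSE_WAIT in text formatted like /proc/net/tcp."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Columns: sl, local_address, rem_address, st, ...
        if len(fields) > 3 and fields[3] == CLOSE_WAIT:
            count += 1
    return count


def count_close_wait_on_host(paths=("/proc/net/tcp", "/proc/net/tcp6")):
    """Sum CLOSE_WAIT sockets over the IPv4 and IPv6 tables, skipping unreadable files."""
    total = 0
    for path in paths:
        try:
            with open(path) as handle:
                total += count_close_wait(handle.read())
        except OSError:
            continue  # table not present (e.g. non-Linux host or IPv6 disabled)
    return total


if __name__ == "__main__":
    n = count_close_wait_on_host()
    status = "above" if n > THRESHOLD else "at or below"
    print(f"CLOSE_WAIT connections: {n} ({status} threshold of {THRESHOLD})")
```

This counts every CLOSE_WAIT socket in the namespace, not only those owned by xrootd, so on a shared host the figure is an upper bound.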

As of 16/11/23:

  • lcg2631

  • lcg2635

  • lcg2638

  • lcg2617

The above nodes show this xrootd proxy issue.

Last 24 hrs (ATLAS):

VOs running on affected nodes:

  • LHCB (tlhcb006)

  • ATLAS (patls002)

  • NA62 (tna62a001)

  • Biomed (bio045)

  • enmr022

File read timeouts in the proxy logs:

Nov 16 04:20:34 lcg2631.gridpp.rl.ac.uk docker[1542921]: 231116 04:20:34 92982 XrootdAioTask: async read failed for tlhcb006.2794:170@htcjob4969334_0_slot1_246_pid3678980.ralworker; aio file read timed out /lhcb:buffer/lhcb/MC/2016/SIM/00204827/0008/00204827_00084722_1.sim
Nov 16 04:21:08 lcg2631.gridpp.rl.ac.uk docker[1542921]: 231116 04:21:08 92969 XrootdAioTask: async read failed for tlhcb006.301:77@htcjob5638675_0_slot1_14_pid2226160.ralworker; aio file read timed out /lhcb:buffer/lhcb/MC/2018/SIM/00204836/0009/00204836_00096682_1.sim
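To see which users are hit by these timeouts, the log lines can be tallied per user. The sketch below assumes the message layout seen in the two lines above (user.pid:fd@host followed by the file path); the regular expression and function name are illustrative and would need adjusting if other lines are formatted differently.

```python
import re
from collections import Counter

# Matches the "aio file read timed out" messages as they appear in the proxy log.
# Layout assumed from the sample lines: <user>.<pid>:<fd>@<host>; ... <path>
TIMEOUT_RE = re.compile(
    r"XrootdAioTask: async read failed for "
    r"(?P<user>[^.\s]+)\.\d+:\d+@(?P<host>[^;\s]+); "
    r"aio file read timed out (?P<path>\S+)"
)


def timeouts_by_user(lines):
    """Return a Counter mapping user name to number of aio read timeouts."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group("user")] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: pipe the proxy log (e.g. journal output) into the script on stdin.
    for user, n in timeouts_by_user(sys.stdin).most_common():
        print(f"{user}\t{n}")
```

Lines that do not match the pattern are ignored, so the full log can be fed in unfiltered.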

futex_wait

Nov 16 06:21:38 lcg2631.gridpp.rl.ac.uk docker[1542921]: 231116 06:21:38 96524 oss_Open_ufs: Unable to reloc FD /xcache/lhcb:buffer/lhcb/MC/2018/SIM/00204818/0008/00204818_00083792_1.sim.cinfo; invalid argument

The logs cycle between authentications; no read activity is being logged.

Files already in the cache can be downloaded.

Proxy cache discovery

Alex R noticed that files already cached by the xrootd-proxy are retrieved successfully, which indicates that the issue lies with the proxy recalling data from the gateway.
