  1. The exclusive WN gateways were assigned during a period when we had fewer and older gateways (11) handling both FTS and WN traffic. The redirectors can balance the load, but the overall load generated was too high and left all gateways heavily loaded.
    We now have more gateways, and the total number of gateways in use will not change: there will still be 17 gws handling both batch farm and FTS traffic.

  2. Currently, if any one of the WN-exclusive gws goes down, it takes a quarter of the farm with it. The setup has no redundancy or failover built in.

  3. This change gets us to a stable state while the gw architecture is being decided on.
    Any further major architecture change is unlikely to take place until January, and the current setup should not stay in place until then, as it relies heavily on manual intervention if anything goes wrong on the WN-exclusive gws.

  4. Have dedicated gateways for S3 traffic. Because S3 was being crowded out by XRootD, we have had to tell S3 users to point their processes at a single gateway that gets somewhat less traffic in order to keep things going. The Cloud Team has been asking for this.

  5. No issues should arise from mixing the WN and FTS loads. If any do occur, they will affect decisions made for the new architecture and should be known beforehand.

Risks

  1. The current XRootD cluster capacity is not enough to handle both FTS and WN load.

    1. mitigation: keep gw4-7 in the cluster, resulting in a total cluster capacity of 21 gws

    2. failover: roll back the blacklist

  2. The batch farm shows errors when taken out of the sandbox.

    1. mitigation: this can be done in phases, taking out one WN initially to ensure proper behaviour, then half of the farm, then all of it

    2. failover: put the workers back in the sandbox
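
The phased removal described in the second risk (one WN, then half of the farm, then all of it) could be planned with something like the sketch below. It only computes which workers belong to each phase. The function name, the worker hostnames and the farm size are hypothetical, and the actual step of taking a worker out of the sandbox is not shown because it depends on the local tooling.

```python
from typing import List


def rollout_phases(workers: List[str]) -> List[List[str]]:
    """Split workers into three phases: one node, up to half, then the rest.

    Each phase lists only the workers newly released in that phase.
    Phases that would be empty (very small farms) are skipped.
    """
    if not workers:
        return []
    # Cumulative targets: 1 worker, half of the farm, the whole farm.
    cut_points = [1, len(workers) // 2, len(workers)]
    phases: List[List[str]] = []
    done = 0
    for cut in cut_points:
        if cut > done:
            phases.append(workers[done:cut])
            done = cut
    return phases


if __name__ == "__main__":
    # Hypothetical hostnames, for illustration only.
    farm = [f"wn{i:03d}" for i in range(1, 9)]
    for number, batch in enumerate(rollout_phases(farm), start=1):
        print(f"phase {number}: release {len(batch)} worker(s): {', '.join(batch)}")
```

Checking for batch farm errors between phases, and putting the released workers back in the sandbox if any appear, remains a manual step in this sketch.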