Weighted Random Load Balancing
Introduction
This load balancing aims to provide a fast, scalable and order agnostic selection for XRootD Cluster Management System redirects.
The XRootD NodeTab (list of nodes in the cluster) is static and initialized to the maximum allowed size (STMax).
How does this work?
The basic principle of this algorithm is a random weighted selection. An easy way to picture it is as follows:
Imagine a spinning wheel divided in slices. The less loaded gateways will have a larger slice of the pie. Now throw a dart at the board. Whichever slice the dart landed on is going to be selected for the transfer.
If a node is unavailable or has issues, it has an effective slice size of 0, meaning it will never be selected.
A fuzz value provides a baseline for each slice size, providing some tuning adjustments for a more even distribution. e.g. a fuzz of 20 on the same values above will result in this wheel:
There’s still a preference for the 50% loaded node, but it’s a more even distribution.
Variables used
sp - selected node
np - current node
np->load - load reported by the node
TotWeight - the current sum of inverse load, adjusted with a fuzz factor for tuning
NodeWeight - array of total weights at the current node
Initialization
TotWeight is set to 0
NodeWeight for the current node is set to 0
sp is set to the first valid node
Looping on each node (including the first sp)
TotWeight is incremented by fuzz + (100 - np->load)
fuzz prevents inverse load being 0 for a gateway at 100 load. this provides even load balancing in cases where every node is at 100 load
higher fuzz values reduce the importance assigned to the load, helping to tune the algorithm.
NodeWeight for the current node is set to the current TotWeight.
After the loop has gone trough all available nodes
Generate a random number from 1 to the final TotWeight
Select the first node where the value in NodeWeight is greater than the random number
In the above example, the final TotWeight is 90. The random number generated is 45. since for the first node, the TotWeight was 20, it is skipped as 20<45. For the second node the TotWeight was 50 and since 50>45, this node is selected and returned.
Code
/******************************************************************************/
/* S e l b y L o a d R */
/******************************************************************************/
// Caller must have the STMutex locked. The returned node, if any, is unlocked.
XrdCmsNode *XrdCmsCluster::SelbyLoadR(SMask_t mask, XrdCmsSelector &selR)
{
static std::random_device rand_dev;
static std::default_random_engine generator(rand_dev());
XrdCmsNode *np = nullptr, *sp = nullptr;
bool reqSS = (selR.needSpace & XrdCmsNode::allowsSS) != 0;
// Scan for a node (preset possible, suspended, overloaded, full, and dead)
selR.Reset();
SelTcnt++;
int totWeight = 0;
for (int i = 0; i <= STHi; ++i) {
NodeWeight[i] = 0; // make node unselectable first
if (!((np = NodeTab[i]) && (np->NodeMask & mask)))
continue;
if (!(selR.needNet & np->hasNet)) { selR.xNoNet = true; continue; }
selR.nPick++;
if (np->isOffline) { selR.xOff = true; continue; }
if (np->isBad) { selR.xSusp = true; continue; }
if (np->myLoad > Config.MaxLoad) { selR.xOvld = true; continue; }
if (selR.needSpace) {
if (np->DiskFree < np->DiskMinF || (reqSS && np->isNoStage)) {
selR.xFull = true;
continue;
}
}
// If node passes filters, give it a weight
totWeight += Config.P_fuzz + (100 - np->myLoad);
NodeWeight[i] = totWeight;
}
std::uniform_int_distribution<int> distr(1, totWeight);
int selected = distr(generator);
for (int i = 0; i <= STHi; ++i) {
if (NodeWeight[i] < selected)
continue;
sp = NodeTab[i];
break;
}
return sp ? sp : calcDelay(selR);
}