KBEA-00049 - Understanding the Cluster Manager cluster sharing algorithm

Article ID: 360033190731

Summary

How does the ElectricAccelerator Cluster Manager cluster sharing algorithm work?

Solution

The Cluster Manager continuously manages agent allocation across the cluster for all running builds on all platforms.

The Cluster Manager's allocation algorithm collects inputs and decides how agents are assigned to builds.

The algorithm’s output tells each individual Electric Make (emake) instance which agents it is allowed to send jobs to.

The allocation algorithm is deterministic: the allocation result depends only on its inputs, so the same input values always yield the same allocation. This prevents agent oscillation, in which emakes repeatedly steal the same agents from one another.
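
To illustrate why determinism matters, here is a minimal Python sketch of an allocator whose result depends only on its inputs. It is not the Cluster Manager's code: the `Build` and `Agent` classes, the `allocate` function, and the round-robin split are hypothetical placeholders, and it assumes a larger number means a higher priority. The point is that both inputs are put into a canonical order (priority and build ID, hostname and agent ID) before any decision is made, so reporting the same builds and agents in a different order cannot change the allocation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Build:
    build_id: int
    priority: int  # assumption: a larger number means a higher priority


@dataclass(frozen=True)
class Agent:
    hostname: str
    agent_id: int


def allocate(builds: List[Build], agents: List[Agent]) -> Dict[int, List[Agent]]:
    """Deal agents to builds round-robin (a placeholder policy), after putting
    both inputs in a canonical order so the result depends only on which
    builds and agents exist, not on the order they were reported in."""
    ordered_builds = sorted(builds, key=lambda b: (-b.priority, b.build_id))
    ordered_agents = sorted(agents, key=lambda a: (a.hostname, a.agent_id))
    result: Dict[int, List[Agent]] = {b.build_id: [] for b in ordered_builds}
    if not ordered_builds:
        return result
    for i, agent in enumerate(ordered_agents):
        build = ordered_builds[i % len(ordered_builds)]
        result[build.build_id].append(agent)
    return result


builds = [Build(1, priority=0), Build(2, priority=5)]
agents = [Agent(h, n) for h in ("hostA", "hostB") for n in range(2)]
# Reporting the same builds and agents in reverse order yields the same allocation.
assert allocate(builds, agents) == allocate(builds[::-1], agents[::-1])
```

Without the canonical sort, two otherwise identical snapshots could hand the same agent to different builds on consecutive passes, which is exactly the oscillation described above.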

System inputs:

  1. a snapshot of the status of all current builds (emakes), including each build's number of requested agents, max agents, platform, requested resources, priority, and current agent allocation

  2. a snapshot of the status of all agents, including their hostnames and whether each agent is free or assigned

  3. current agent allocation algorithm settings:

    1. Host manager type:

      • none - the allocation result ignores any resources requested by the emakes

      • ea - the allocation result depends on the emakes' resource requests, which are resolved through an internal database table that maps resource names to hostnames

      • grid - agents are requested dynamically through grid integration with Platform LSF, Sun Grid Engine, and other grids

      • priority - agents are partitioned into groups by department or priority so that the cluster is utilized as fully as possible, while still respecting priority when multiple builds have different priorities or come from different departments

    2. Agent allocation policy:

      • shared - an agent host with multiple agents can be shared by multiple builds

      • exclusive - all agents on a host are assigned to a single build at a time, even if that build does not use all of them

    3. Preemption policy:

      • priority - an agent can be preempted from the current build and reassigned to a higher-priority build, even if this drops the current build below its min agents value

      • always - a locked agent can be reassigned to a higher-priority build, but only if the current build has more agents than its min agents value

      • never - a locked agent can never be reassigned

    4. Wide/deep agent allocation policy:

      • deep - (default) tries to give one build as many agents as possible on the same host, to take advantage of the diskcache feature

      • wide - tries to spread a build's agents across different hosts
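
The three preemption policies can be summarized as a single decision function. The following Python sketch is one reading of the rules above, not the product's implementation; `PreemptionPolicy`, `can_preempt`, and its parameters are hypothetical names, and it assumes a larger number means a higher priority.

```python
from enum import Enum


class PreemptionPolicy(Enum):
    PRIORITY = "priority"
    ALWAYS = "always"
    NEVER = "never"


def can_preempt(policy: PreemptionPolicy, holder_priority: int,
                holder_agents: int, holder_min_agents: int,
                requester_priority: int) -> bool:
    """Decide whether one agent may move from its current (holder) build
    to a requesting build. Assumes a larger number means higher priority."""
    if policy is PreemptionPolicy.NEVER:
        return False                      # locked agents are never reassigned
    if requester_priority <= holder_priority:
        return False                      # only a higher-priority build may preempt
    if policy is PreemptionPolicy.PRIORITY:
        return True                       # allowed even below the holder's min agents
    # ALWAYS: the holder must currently have more agents than its min agents value.
    return holder_agents > holder_min_agents


# A build holding 5 agents with min agents 4 can lose one to a higher-priority build...
print(can_preempt(PreemptionPolicy.ALWAYS, 1, 5, 4, 2))    # True
# ...but at exactly its minimum, only the priority policy allows it.
print(can_preempt(PreemptionPolicy.ALWAYS, 1, 4, 4, 2))    # False
print(can_preempt(PreemptionPolicy.PRIORITY, 1, 4, 4, 2))  # True
```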

Preferences and results:

  • at equal priority, the system uses fair sharing to split the cluster as evenly as possible among the builds

  • in exclusive mode, all agents on a given cluster host are assigned as a group, so only one build can run on a host at a time

  • the Cluster Manager ensures that it never preempts so many agents from a build that it would drop below its min agents value (unless the preemption policy is set to priority)

  • the system ensures that a build gets agents only from hosts with a matching OS (for example, Windows builds get only Windows agents)

  • the system ensures that a build that requests a resource is assigned only agents that provide that resource

  • the system prefers to assign agents in the following order:

    • agents that are not assigned at all

    • agents from the same host (deep agent allocation policy)

    • agents assigned to a build but not currently in use

    • agents assigned to a build and most recently given a job (that is, it prefers to preempt the job that has been running for the least amount of time)
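
The following Python sketch shows one way to express two of the preferences above: an equal-priority fair-share split capped by each build's max agents, and a ranking of candidate agents for a build that honors the OS and resource restrictions. It is an illustration under stated assumptions, not the Cluster Manager's algorithm. `AgentInfo`, `fair_share`, `preference_key`, and `rank_candidates` are hypothetical names. The ranking treats the preference list as a lexicographic sort key (unassigned first, then agents on a host the build already uses, then idle before busy, then the most recently started job first), which is only one plausible reading, and the `deep` flag simply inverts the host preference for the wide policy. Exclusive mode and the min agents preemption limit are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class AgentInfo:
    hostname: str
    agent_id: int
    os: str                       # e.g. "linux" or "windows"
    resources: FrozenSet[str]     # hypothetical: resource names this agent provides
    assigned_to: Optional[int]    # build ID holding the agent, or None if free
    job_start: Optional[float]    # start time of the running job, or None if idle


def fair_share(total_agents: int, max_agents: Dict[int, int]) -> Dict[int, int]:
    """Split agents as evenly as possible among equal-priority builds.

    No build receives more than its max agents; whatever a capped build
    cannot use is handed to the remaining builds. Leftover single agents go
    to the lowest build IDs first, so the result is deterministic.
    """
    share = {b: 0 for b in max_agents}
    remaining = total_agents
    active = sorted(b for b, cap in max_agents.items() if cap > 0)
    while remaining > 0 and active:
        per_build, extra = divmod(remaining, len(active))
        for i, b in enumerate(active):
            want = per_build + (1 if i < extra else 0)
            give = min(want, max_agents[b] - share[b])
            share[b] += give
            remaining -= give
        active = [b for b in active if share[b] < max_agents[b]]
    return share


def preference_key(agent: AgentInfo, build_hosts: Set[str], deep: bool = True):
    """Sort key for candidate agents; smaller tuples are preferred."""
    assigned = agent.assigned_to is not None
    # deep: prefer hosts the build already uses; wide: prefer other hosts.
    host_rank = 0 if (agent.hostname in build_hosts) == deep else 1
    busy = agent.job_start is not None
    # Among busy agents, the job that started most recently (has run the
    # least time) sorts first, matching the stated preemption preference.
    recency = -agent.job_start if busy else 0.0
    # hostname/agent_id break remaining ties so the order never depends on input order.
    return (assigned, host_rank, busy, recency, agent.hostname, agent.agent_id)


def rank_candidates(agents: List[AgentInfo], build_id: int, build_os: str,
                    build_resource: Optional[str], build_hosts: Set[str],
                    deep: bool = True) -> List[AgentInfo]:
    """Return agents this build may use, most preferred first."""
    candidates = [
        a for a in agents
        if a.assigned_to != build_id              # not already held by this build
        and a.os == build_os                      # matching host OS only
        and (build_resource is None or build_resource in a.resources)
    ]
    return sorted(candidates, key=lambda a: preference_key(a, build_hosts, deep))
```

For example, with 10 free agents and three equal-priority builds whose max agents are 2, 10, and 10, `fair_share(10, {1: 2, 2: 10, 3: 10})` returns `{1: 2, 2: 4, 3: 4}`: the capped build's unused share is redistributed to the other two.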

Applies to

  • Product versions: 4.x and later

  • OS versions: All