HA (active/active) considerations


Troubleshooting HA

CloudBees CI High Availability provides tools to troubleshoot possible problems on controller replicas running in HA mode. To enable them, navigate to Manage Jenkins > CloudBees CI High Availability.

  • To enable HA Developer Mode, select Status from the left navigation pane. When enabled, a controller running in developer mode provides additional information, like the replica used by the user, or the replica executing a build.

  • Select HA Script Console to run scripts across all the controller replicas, and gather information from all of them.

Setup wizard

When a controller running in HA mode starts for the first time, one of the controller replicas acquires a lock in the shared JENKINS_HOME. This replica is the only one available, and it holds the lock until a user completes the Setup wizard.

When the Setup wizard ends, the remaining replicas continue the startup process. During this process the remaining replicas, one by one and automatically, acquire the lock, start, and release the lock until all of them are available.

However, if the controller is created using a CasC bundle, the Setup wizard is not displayed and all the replicas automatically follow the same process described above without any human confirmation. One by one, they acquire the lock, start, and release the lock until all of them are up and running.
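The acquire, start, release sequence described above can be illustrated with a small simulation. This is only a sketch of the ordering guarantee, not how CloudBees CI implements it: the in-process `threading.Lock` stands in for the lock in the shared JENKINS_HOME, and the replica names are hypothetical.

```python
import threading


def start_replicas(replica_names):
    """Simulate replicas that start one at a time behind a shared lock."""
    home_lock = threading.Lock()  # stand-in for the lock in the shared JENKINS_HOME
    started = []                  # order in which replicas completed startup
    active = 0                    # replicas currently inside the startup step
    max_active = 0                # highest number of simultaneous startups seen

    def start(name):
        nonlocal active, max_active
        with home_lock:           # acquire the lock; released when the block exits
            active += 1
            max_active = max(max_active, active)
            started.append(name)  # the "start" step runs while the lock is held
            active -= 1

    threads = [threading.Thread(target=start, args=(n,)) for n in replica_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return started, max_active
```

Calling `start_replicas(["replica-0", "replica-1", "replica-2"])` returns every replica exactly once, in whatever order the lock was acquired, and a `max_active` of 1, because only the lock holder is ever in the startup step.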

Nodes and agents

CloudBees CI supports a range of agent connection modes in High Availability, but each agent must have only one executor. As an agent can connect to only a single replica at a time, agents with multiple executors cannot be properly shared and are not supported by CloudBees CI High Availability (HA).

A single executor requirement is not a problem for cloud agents as they normally define only one executor. They are provisioned, use this executor, and are deprovisioned later.

Permanent or static agents, using additional JVMs, can be configured as multiple-executor agents for HA.

In the permanent agent configuration screen:

  • Set the Number of executors to 1. Otherwise, CloudBees CI would display an administrative monitor with an Apply Migration button.

    Figure 1. Number of executors
  • In the Node Properties section, select the HA agent with multiple executor option and set the number of executors according to your needs.

    Figure 2. Number of executors for HA

The controller clones the original agent to create the executors. These read-only clones are managed by the controller itself, which adds or removes clones and reconfigures them in response to updates to the original permanent agent.

All the permanent agent clones have one executor and use different folders within the executors folder inside the original permanent agent root directory. Outbound agents automatically have the workDir settings for all the clones.

CloudBees CI displays the clones as individual one-executor agents in the Build executor status.

Figure 3. Agent clones providing multi-executor for HA and permanent agents
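The clone layout described above can be sketched as a small helper that lists one single-executor clone per requested executor, each with its own folder under `executors` in the agent root directory. The clone names and the numeric subfolder names are assumptions made for illustration. The documentation states only that each clone has one executor and uses a different folder within the executors folder.

```python
from pathlib import PurePosixPath


def clone_layout(agent_name, root_dir, executors):
    """Return (clone_name, executor_count, work_dir) for each hypothetical clone."""
    if executors < 1:
        raise ValueError("executors must be at least 1")
    base = PurePosixPath(root_dir) / "executors"
    return [
        # Naming scheme "<agent>-<n>" and folder "<n>" are assumptions of this sketch.
        (f"{agent_name}-{i}", 1, str(base / str(i)))
        for i in range(1, executors + 1)
    ]
```

For example, `clone_layout("build-agent", "/home/jenkins/agent", 3)` yields three entries, each with one executor and a distinct directory such as `/home/jenkins/agent/executors/1`.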

Permanent outbound agents in HA

For outbound agents (usually SSH agents), in the configuration screen, select CloudBees High Availability in the Availability (retention strategy) field. The agent then stays offline until a replica needs it, at which point CloudBees CI brings the permanent outbound agent online temporarily.

Figure 4. Availability for outbound agents

Shared libraries

To use Groovy libraries, CloudBees recommends that you set them to a new “clone” mode and configure Git to use a shallow clone.

Updating library checkouts in a common directory (such as $JENKINS_HOME/workspace/) or using the caching system can cause problems when several replicas access the libraries concurrently. An administrative monitor guides you through making the recommended changes.

Non-Pipeline projects

Problems can arise in High Availability mode with non-Pipeline project types, such as Freestyle, Matrix, or Maven. When these project types are run, other replicas can load completed build metadata, but cannot take over the build successfully. As a result, if a replica terminates, any builds running on it are immediately aborted.

Horizontal scalability

The benefit of having multiple replicas should always be balanced against the associated cost according to your business case. Scaling horizontally with many replicas will have diminishing returns as the number of replicas increases.

Plugin installation and HA

Plugins can be managed and installed from the Manage Jenkins > Plugins screen. When using HA with multiple replicas, dynamic loading of plugins (plugin installation without restarting CloudBees CI) is not supported. Therefore, you must restart each replica of the controller to install or upgrade plugins.

Figure 5. Dynamic loading of plugins not supported

In CloudBees CI on modern cloud platforms, when a managed controller runs in HA mode and you select Restart Jenkins when installation is complete and no jobs are running, a rolling restart is performed. When it completes, the new plugin versions are available in all replicas.

In CloudBees CI on traditional platforms, when a controller runs in HA mode with multiple replicas, you must restart all controller replicas either manually or using your own automation.

When the controller is running in HA mode with only one replica, the behavior is the same as for a non-HA controller.
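The restart rules in this section can be summarized as a small decision function. This is a reading aid rather than any CloudBees CI API: the function name and the `"modern"` and `"traditional"` platform labels are hypothetical.

```python
def plugin_restart_action(platform, replicas):
    """Return how new plugin versions become active, following the rules in this section."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if replicas == 1:
        # A single replica behaves like a non-HA controller.
        return "same as a non-HA controller"
    if platform == "modern":
        # Managed controller: the restart option on the Plugins screen triggers it.
        return "rolling restart"
    if platform == "traditional":
        return "restart all replicas manually or with your own automation"
    raise ValueError(f"unknown platform label: {platform!r}")
```

In every multi-replica case a restart is required, because dynamic loading of plugins is not supported; the function only distinguishes who performs it.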

HA and REST-API endpoints

When running a controller in HA mode, any API pull-based endpoint only returns information about the controller replica that responds to the API request.

Examples of these endpoints are:

  • The /metrics endpoint of the Metrics plugin.

  • The /monitoring endpoint provided by the Monitoring plugin.

In this situation, the response may give an incomplete or misleading picture, depending on the metric or information requested.

For example, an HTTP API query for JVM heap usage returns a value that corresponds only to the replica that processed the request, and provides no insight into the other replicas. However, other information, like the number of projects, is accurate because it is automatically synchronized among all the controller replicas.
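The difference between a single pull-based request and a controller-wide view can be sketched with made-up numbers. The code below does not call any CloudBees CI or plugin API: the replica names and heap values are invented, and how each replica would be reached individually is outside the scope of the sketch.

```python
import random


def query_any_replica(heap_by_replica, rng=random):
    """Model one pull-based request: the answering replica reports only its own heap."""
    replica = rng.choice(sorted(heap_by_replica))
    return replica, heap_by_replica[replica]


def summarize(heap_by_replica):
    """Combine per-replica samples into a controller-wide view."""
    values = list(heap_by_replica.values())
    return {"min": min(values), "max": max(values), "total": sum(values)}
```

With samples such as `{"replica-0": 512, "replica-1": 2048}` (in MiB), a single query returns either 512 or 2048 depending on which replica answers, while `summarize` needs a sample from every replica to report the full range.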

Stage View and Blue Ocean plugins

The Pipeline: Stage View and Blue Ocean plugins may not accurately display running builds that are owned by other replicas.

CloudBees recommends the CloudBees Pipeline Explorer plugin.