HA (active/active) considerations

Troubleshooting HA

CloudBees CI High Availability provides tools to troubleshoot possible problems on controller replicas running in HA mode. To enable them, navigate to Manage Jenkins > CloudBees CI High Availability.

  • To enable HA Developer Mode, select Status from the left navigation pane. A controller running in developer mode provides additional information, such as the replica serving the user or the replica executing a build.

  • Select HA Script Console to run scripts across all the controller replicas, and gather information from all of them.

Setup wizard

When a controller running in HA mode starts for the first time, one of the controller replicas acquires a lock in the shared JENKINS_HOME. This replica is the only one available, and it holds the lock until a user completes the Setup wizard.

When the Setup wizard ends, the remaining replicas continue the startup process. During this process the remaining replicas, one by one and automatically, acquire the lock, start, and release the lock until all of them are available.

However, if the controller is created using a CasC bundle, the Setup wizard is not displayed and all the replicas automatically follow the same process described above without any human confirmation. One by one, they acquire the lock, start, and release the lock until all of them are up and running.
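The one-by-one startup described above can be pictured with a small simulation. This is only an illustrative sketch, not how CloudBees CI implements the lock: an in-process `threading.Lock` stands in for the lock in the shared JENKINS_HOME, and the function name `simulate_startup` and the replica names are hypothetical.

```python
import threading


def simulate_startup(replica_names):
    """Simulate replicas starting one at a time behind a shared lock.

    Assumption: a threading.Lock stands in for the lock that the real
    replicas take in the shared JENKINS_HOME.
    """
    home_lock = threading.Lock()
    events = []  # ordered log of (replica, action) tuples

    def start(name):
        home_lock.acquire()
        try:
            events.append((name, "acquired"))
            events.append((name, "started"))   # replica completes its startup
            events.append((name, "released"))  # logged just before the release
        finally:
            home_lock.release()

    threads = [threading.Thread(target=start, args=(n,)) for n in replica_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


if __name__ == "__main__":
    for replica, action in simulate_startup(["replica-1", "replica-2", "replica-3"]):
        print(replica, action)
```

The order in which replicas win the lock is not fixed, but each replica's three steps always appear together because only one replica can hold the lock at a time.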

Nodes and agents

CloudBees CI supports a range of agent connection modes in High Availability, but each agent must have only one executor. As an agent can connect to only a single replica at a time, agents with multiple executors cannot be properly shared and are not supported by CloudBees CI High Availability (HA).

A single executor requirement is not a problem for cloud agents as they normally define only one executor. They are provisioned, use this executor, and are deprovisioned later.

Permanent or static agents, using additional JVMs, can be configured as multiple-executor agents for HA.

In the permanent agent configuration screen:

  • Set the Number of executors to 1. Otherwise, CloudBees CI displays an administrative monitor with an Apply Migration button.

    Figure 1. Number of executors
  • In the Node Properties section, select the HA agent with multiple executor option and set the number of executors according to your needs.

    Figure 2. Number of executors for HA

The controller clones the original agent to create the executors. These read-only clones are managed by the controller itself, which adds or removes clones and reconfigures them in response to updates to the original permanent agent.

All the permanent agent clones have one executor and use different folders within the executors folder inside the original permanent agent root directory. Outbound agents automatically have the workDir settings for all the clones.

CloudBees CI displays the clones as individual one-executor agents in the Build executor status.

Figure 3. Agent clones providing multi-executor for HA and permanent agents

Permanent agent clones are read-only. If the Launch method selected for a permanent agent is Launch build agents via SSH (Non-blocking I/O), the Host Key Verification Strategy cannot be the Manually Trusted Key Verification Strategy. This strategy depends on permissions that are not granted to read-only clones. If it is used, the connection to the permanent agent fails.

Permanent outbound agents in HA

For outbound agents (usually SSH agents), select CloudBees High Availability in the Availability (retention strategy) field of the configuration screen. The agent is then kept offline until a replica needs it, at which point CloudBees CI brings the permanent outbound agent online temporarily.

Figure 4. Availability for outbound agents

Shared libraries

To use Groovy libraries, CloudBees recommends that you set them to a new “clone” mode and configure Git to use a shallow clone.

Updating library checkouts in a common directory (such as $JENKINS_HOME/workspace/) or using the caching system can cause problems when several replicas access them concurrently. An administrative monitor guides you to make the recommended changes.

Non-Pipeline projects

Problems can arise in High Availability mode with non-Pipeline project types, such as Freestyle, Matrix, or Maven. When these project types are run, other replicas can load completed build metadata, but cannot take over the build successfully. As a result, if a replica terminates, any builds running on it are immediately aborted.

Horizontal scalability

The benefit of having multiple replicas should always be balanced against the associated cost according to your business case. Scaling horizontally with many replicas will have diminishing returns as the number of replicas increases.

Plugin installation and HA

Plugins can be managed and installed from the Manage Jenkins > Plugins screen. When using HA with multiple replicas, dynamic loading of plugins (plugin installation without restarting CloudBees CI) is not supported. Therefore, you must restart each replica of the controller to install or upgrade plugins.

Figure 5. Dynamic loading of plugins not supported

In CloudBees CI on modern cloud platforms, with a managed controller running in HA mode, selecting Restart Jenkins when installation is complete and no jobs are running triggers a rolling restart. When it completes, the new plugin versions are available in all replicas.

In CloudBees CI on traditional platforms running in HA mode with multiple replicas, you must restart all controller replicas either manually or using your own automation.

When the controller is running in HA mode with only one replica, the behavior is the same as for a non-HA controller.

HA and REST-API endpoints

When running a controller in HA mode, requests to pull-based API endpoints may return information only about the controller replica that responds to the request, instead of aggregated information about all the controller replicas that are part of the HA cluster.

Examples of these endpoints are:

  • The /metrics endpoint provided by the Metrics plugin.

  • The /monitoring endpoint provided by the Monitoring plugin.

For example, when using those plugins, if you make an HTTP API query for JVM heap usage, the returned value would only correspond to the replica that processed the request and not provide insight into other replicas. However, other information, like the number of projects, is accurate because it is automatically synchronized among all the controller replicas.

In general, responses are accurate and display aggregated replica information for:

  • Global settings.

  • List of jobs, folders, etc., and their configuration.

  • List of permanent or static agents and their configuration.

  • Set of completed builds for a given job.

However, with limited exceptions, endpoints display information only about the replica responding to the request for:

  • JVM information (current heap usage, CPU, etc.).

  • Queue items.

  • List of running builds.

  • List of ephemeral agents connected to the replica.

  • Status of static agents connected to the replica.

CloudBees CI overrides the following Jenkins core endpoints to provide aggregated information about running builds and agents:

  • The endpoint /job/xxx/api/json?tree=builds[number,building,result] returns aggregated information about running builds in all the controller replicas.

  • The endpoint /computer/api/json?tree=computer[displayName,offline] returns aggregated information about agents connected to all the controller replicas.
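As a minimal sketch of how a client might consume the first of these endpoints, the following Python builds the request URL and extracts the running builds from the JSON response. The helper names (`builds_url`, `running_builds`, `fetch_json`), the base URL, and the use of HTTP basic authentication with a user API token are assumptions made for illustration. The URL helper handles only a top-level job name, not jobs nested in folders.

```python
import base64
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def builds_url(base_url, job_name):
    """Build the builds-listing URL for a top-level job (hypothetical helper)."""
    return (
        f"{base_url.rstrip('/')}/job/{quote(job_name)}"
        "/api/json?tree=builds[number,building,result]"
    )


def running_builds(payload):
    """Return the numbers of builds whose 'building' flag is true."""
    return [b["number"] for b in payload.get("builds", []) if b.get("building")]


def fetch_json(url, user, api_token):
    """GET a JSON endpoint using basic auth (assumed authentication scheme)."""
    credentials = base64.b64encode(f"{user}:{api_token}".encode()).decode()
    request = Request(url, headers={"Authorization": f"Basic {credentials}"})
    with urlopen(request, timeout=10) as response:
        return json.load(response)


# Example payload in the shape requested by the tree parameter.
SAMPLE = json.loads(
    '{"builds": ['
    '{"number": 12, "building": true, "result": null},'
    '{"number": 11, "building": true, "result": null},'
    '{"number": 10, "building": false, "result": "SUCCESS"}]}'
)
```

Against a live controller, `running_builds(fetch_json(builds_url(base, job), user, token))` would list the running builds; with the overridden endpoint, that list is expected to cover builds owned by any replica rather than only the one that answered the request.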

You can also configure third-party monitoring solutions, such as Prometheus with the CloudBees Prometheus Metrics Plugin, to provide aggregated information from all the controller replicas.

When using pull-based endpoints, whether responses provide aggregated or single-replica information depends on the implementation of the plugins and the endpoints that provide the information. CloudBees recommends testing those pull-based endpoints beforehand to verify which specific data is returned.

The scenario is different for push-based monitoring plugins, where data is sent directly from your CloudBees CI instance to the monitoring application. In that case, depending on your specific requirements, the data from the various replicas can either be consolidated by sending it to the same container or kept separate.

Stage View and Blue Ocean plugins

The Pipeline: Stage View and Blue Ocean plugins may not accurately display running builds that are owned by other replicas.

CloudBees recommends the CloudBees Pipeline Explorer plugin.