HA considerations


Setup wizard

When a controller running in HA mode starts for the first time, one of the controller replicas acquires a lock in the shared JENKINS_HOME. This replica is the only one available, and the lock remains held until a user completes the Setup wizard.

When the Setup wizard ends, the remaining replicas continue the startup process. During this process, the remaining replicas automatically acquire the lock, start, and release it, one by one, until all of them are available.

However, if the controller is created using a CasC bundle, the Setup wizard is not displayed, and all the replicas automatically follow the same process without any human confirmation: one by one, they acquire the lock, start, and release the lock until all of them are up and running.

Workload distribution in HA

HA distributes pipeline builds among the replicas; if a replica fails, its running builds continue and are adopted by another replica.

Starting with version 2.426.1.2, CloudBees CI provides explicit load balancing for controllers running in HA mode.

Explicit load balancing redirects new builds to the controller replica with the least load.

CloudBees CI calculates the load using a simple metric that considers the following factors:

  • Running builds.

  • Already scheduled queue items.

  • Online agents.
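As a rough illustration of how such a metric could combine these three factors, consider the sketch below. This is not CloudBees' actual implementation; the weighting and field names are hypothetical and exist only to show the idea of picking the least-loaded replica.

```python
# Hypothetical sketch of least-loaded replica selection.
# The real CloudBees CI metric is internal; this only illustrates
# combining running builds, scheduled queue items, and online agents.
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    running_builds: int    # builds currently executing on this replica
    scheduled_items: int   # queue items already scheduled here
    online_agents: int     # agents connected to this replica

def load_score(r: Replica) -> float:
    # More running builds and queued items mean more load;
    # more online agents mean more capacity to absorb new work.
    capacity = max(r.online_agents, 1)
    return (r.running_builds + r.scheduled_items) / capacity

def least_loaded(replicas: list[Replica]) -> Replica:
    return min(replicas, key=load_score)

replicas = [
    Replica("replica-0", running_builds=5, scheduled_items=2, online_agents=4),
    Replica("replica-1", running_builds=1, scheduled_items=0, online_agents=4),
]
print(least_loaded(replicas).name)  # replica-1
```

A new build triggered while replica-0 is busy would be routed to replica-1, the replica with the lowest score.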

CloudBees CI provides explicit load balancing in most cases. The table below summarizes supported and unsupported cases:

Table 1. Build scheduling and explicit load balancing

  Job type                                                    | Scheduling strategy
  Interactive trigger (Build Now)                             | Replica with the least workload
  Scheduled build (Cron job)                                  | Replica with the least workload
  Branch indexing (Multibranch and Organization folder jobs)  | Replica with the least workload
  Webhooks (including multibranch events)                     | Replica with the least workload
  REST API triggers                                           | Replica with the least workload
  Downstream build triggered by an upstream build             | Always the same replica as the upstream build
  Any other trigger type                                      | Same replica that processed the trigger

If the replica running the upstream build fails and another replica adopts the build, notifications from downstream jobs do not reach the upstream job. An upstream job configured to wait for the downstream build's completion therefore keeps waiting until it is finished manually.

Plugin installation and HA

Plugins can be managed and installed from the Manage Jenkins > Plugins screen. When using HA with multiple replicas, dynamic loading of plugins (installing plugins without restarting CloudBees CI) is not supported. Therefore, you must restart each replica of the controller to install or upgrade plugins.

Figure 1. Dynamic loading of plugins not supported

In CloudBees CI on modern cloud platforms, with a managed controller running in HA mode, selecting Restart Jenkins when installation is complete and no jobs are running performs a rolling restart; when it completes, the new plugin versions are available on all replicas.

In a CloudBees CI on traditional platforms running in HA mode with multiple replicas, you must restart all controller replicas either manually or using your own automation.

When the controller is running in HA mode with only one replica, the behavior is the same as a non-HA controller.

HA and REST API endpoints

When running a controller in HA mode, requests to pull-based API endpoints may return information about the controller replica that responds to the API request instead of aggregated information about all the controller replicas that are part of the HA cluster.

For example, with plugins that expose monitoring data over HTTP, an API query for JVM heap usage returns a value that corresponds only to the replica that processed the request and provides no insight into other replicas. However, other information, like the number of projects, is accurate because it is automatically synchronized among all the controller replicas.

In general, responses are accurate and display aggregated replica information for:

  • Global settings.

  • List of jobs, folders, etc., and their configuration.

  • List of permanent or static agents and their configuration.

  • Set of completed builds for a given job.

However, with limited exceptions, endpoints display information only about the replica responding to the request for:

  • JVM information (current heap usage, CPU, etc.)

  • Queue items.

  • List of running builds.

  • List of ephemeral agents connected to the replica.

  • Status of static agents connected to the replica.

CloudBees CI overrides the following Jenkins core endpoints to provide aggregated information about running builds and agents:

  • The endpoint /job/xxx/api/json?tree=builds[number,building,result] returns aggregated information about running builds in all the controller replicas.

  • The endpoint /computer/api/json?tree=computer[displayName,offline] returns aggregated information about agents connected to all the controller replicas.

These endpoints present aggregated information only when the tree parameter specifies a list of top-level fields to aggregate, such as builds[number,building,result] or computer[displayName,offline]. Omitting this parameter or using depth instead is not supported, and is inefficient in any case.
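As an illustration, a client could build the required query with the tree parameter and count the builds still running across all replicas. The controller URL and job name below are placeholders, and the sample payload simply follows the standard Jenkins JSON API shape:

```python
# Query the aggregated builds endpoint of a controller in HA mode.
# "https://ci.example.com" and "my-pipeline" are placeholder values.
import json
from urllib.parse import urlencode

def builds_api_url(controller: str, job: str) -> str:
    # The tree parameter with explicit top-level fields is required
    # for the response to aggregate builds from all replicas.
    query = urlencode({"tree": "builds[number,building,result]"})
    return f"{controller}/job/{job}/api/json?{query}"

def running_builds(payload: str) -> list[int]:
    # Return the numbers of builds still in progress on any replica.
    data = json.loads(payload)
    return [b["number"] for b in data["builds"] if b["building"]]

# Sample response in the standard Jenkins JSON API shape:
sample = json.dumps({"builds": [
    {"number": 12, "building": True, "result": None},
    {"number": 11, "building": False, "result": "SUCCESS"},
]})
print(builds_api_url("https://ci.example.com", "my-pipeline"))
print(running_builds(sample))  # [12]
```

The same pattern applies to the /computer/api/json endpoint with tree=computer[displayName,offline] for agents connected to all replicas.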

Only the /json suffix returns aggregated information. Requests that replace it with /xml or /python display information only about builds and agents connected to the replica that processed the request.

You can also configure third-party monitoring solutions, like Prometheus with the CloudBees Prometheus Metrics plugin, to provide aggregated information from all the controller replicas.

When using pull-based endpoints, whether responses provide aggregated or single-replica information depends on the implementation of the plugins and the endpoints that provide the information. CloudBees recommends testing those pull-based endpoints beforehand to verify which specific data is returned.

The scenario is different for push-based monitoring plugins, where data is directly sent from your CloudBees CI instance to the monitoring application. Under those circumstances, and depending on your specific requirements, the data from the various replicas can be consolidated by sending it to the same container, or not.

Build navigation gestures and builds in progress

When builds are running on a replica different from the one holding the user session, build navigation gestures like Next Build and Previous Build skip over those builds.

When the builds end, you can navigate through them as usual.