High Availability (active/active)

This guide provides an overview of the CloudBees CI High Availability feature and shows you how to install CloudBees CI High Availability.

High Availability capabilities and architecture

The CloudBees CI High Availability feature provides:

  • Controller failover: If a controller replica fails, Pipeline builds that would normally run on that replica are automatically triggered or continued by another replica.

  • Load balancing: One logical controller can spread its workload across multiple replicas and keep them in sync. Refer to Build scheduling and explicit load balancing for more information.

  • Rolling restart with zero downtime for CloudBees CI on modern cloud platforms: If a controller replica is restarted, all the replicas keep running, and the user experiences no downtime.

  • Rolling upgrades with zero downtime for CloudBees CI on modern cloud platforms: If a managed controller running in HA mode must be updated to a new version, replicas are incrementally updated without restarting the managed controller and with zero downtime.

  • Auto-scaling for CloudBees CI on modern cloud platforms: You can set up managed controllers to increase the number of replicas, depending on the workload. They can upscale when CPU usage exceeds a threshold and downscale when conditions return to normal.

From a high-level perspective, these capabilities are provided using the architecture described in the image below:

Figure 1. High availability architecture
  1. Controller replicas make High Availability possible.

  2. The load balancer spreads the workload between the different controller replicas. Refer to Build scheduling and explicit load balancing for more information.

  3. A shared file system persists controller content.

  4. Hazelcast keeps the controllers’ live state in sync.

High Availability (active/active) vs. High Availability (active/passive)

Unlike the older active-passive HA system, the mode discussed here is symmetrically active-active.

In the previous High Availability (active/passive) mode, the cluster is not symmetric: controllers do not share the workload. At any given point, only one of the replicas works as a controller. When a failover occurs, another replica takes over the controller role, and users experience downtime comparable to rebooting a Jenkins controller in a non-HA setup.

With the CloudBees CI High Availability (active/active) mode described in this guide, all controller replicas are always working, and the controller’s workload is spread between them. When one of the replicas fails, the other replicas adopt all of its builds, and users experience no downtime.

Horizontal auto-scaling

Once you are running controllers with multiple replicas, you can use Kubernetes Horizontal Pod Autoscaling.

The horizontal pod autoscaling controller monitors resource utilization, and adjusts the scale of its target to match your configuration settings. For example, if utilization exceeds your defined threshold, the autoscaler increases the number of replicas.

CloudBees recommends performance testing to determine appropriate thresholds that do not affect response time.
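
As a rough illustration of how a threshold translates into a replica count, the Kubernetes documentation describes the core autoscaler calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below applies that formula. The function name, the replica bounds, and the sample utilization figures are illustrative assumptions, not CloudBees recommendations; the 10% tolerance mirrors the Kubernetes default.

```python
import math

def desired_replicas(current_replicas, current_cpu, target_cpu,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the core Horizontal Pod Autoscaler calculation.

    current_cpu and target_cpu are average utilization percentages.
    The default bounds are illustrative assumptions.
    """
    ratio = current_cpu / target_cpu
    # The autoscaler skips scaling when the ratio is close enough to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))

print(desired_replicas(2, current_cpu=90, target_cpu=60))  # 3
print(desired_replicas(3, current_cpu=62, target_cpu=60))  # 3
print(desired_replicas(4, current_cpu=30, target_cpu=60))  # 2
```

A lower target makes the controller scale out earlier at the cost of more idle capacity, which is why the threshold is worth validating with performance tests.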

Details for upscale events:

  • No rebalance of builds is performed.

  • Builds continue to run on existing replicas.

  • New builds are dispatched between replicas. Refer to Build scheduling and explicit load balancing for more information.

  • Due to sticky session usage, any existing session remains on the same replica.

  • New sessions are distributed randomly between replicas.

Details for downscale events:

  • Builds from removed replicas are adopted by the remaining replicas. Refer to Build scheduling and explicit load balancing for more information.

  • Web sessions associated with a removed replica are redirected to a remaining replica.
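
The upscale and downscale behavior listed above can be sketched with a toy model. This is not CloudBees code: the `Cluster` class, the replica names, and the least-loaded placement rule are assumptions made for illustration, because this guide does not state how builds are distributed among replicas.

```python
class Cluster:
    """Toy model of a replicated controller (hypothetical, for illustration)."""

    def __init__(self, replica_names):
        # Map each replica name to the builds it currently owns.
        self.builds = {name: [] for name in replica_names}

    def _least_loaded(self):
        # Assumption: builds go to the replica that owns the fewest builds.
        return min(self.builds, key=lambda name: len(self.builds[name]))

    def dispatch(self, build):
        target = self._least_loaded()
        self.builds[target].append(build)
        return target

    def upscale(self, name):
        # No rebalance: running builds stay put, the new replica starts empty.
        self.builds[name] = []

    def downscale(self, name):
        # Builds from the removed replica are adopted by the remaining ones.
        orphaned = self.builds.pop(name)
        for build in orphaned:
            self.dispatch(build)


cluster = Cluster(["replica-0", "replica-1"])
for n in range(4):
    cluster.dispatch(f"build-{n}")

cluster.upscale("replica-2")      # existing builds are not moved
cluster.dispatch("build-4")       # only new builds reach the new replica
cluster.downscale("replica-0")    # its builds are adopted elsewhere
print(cluster.builds)
```

After the downscale, every build that was dispatched is still owned by one of the remaining replicas, which is the property the list above describes.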

Upscaling implies that there are more builds to serve, and those builds consume more resources. When the cluster reaches capacity, the scheduling of new controller replicas is blocked.

To ensure that you have enough resources, CloudBees recommends the following:

  • Use the Kubernetes cluster autoscaler to ensure that the cluster has enough resources to accommodate the new replicas.

  • Consider the use of dedicated node pools for controllers.

  • Assign a lower priority class to agent pods, so that controller pods are scheduled first. This allows controller pods to evict agent pods, if necessary.
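
To make the priority class recommendation concrete, the sketch below generates two Kubernetes `PriorityClass` manifests as JSON, which `kubectl apply -f` accepts alongside YAML. The class names and numeric values are hypothetical; only their relative order matters. How the classes are then attached to controller and agent pods (through `priorityClassName` in the pod spec) depends on your installation and is not shown here.

```python
import json

def priority_class(name, value, description):
    """Build a Kubernetes PriorityClass manifest as a plain dict."""
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        "value": value,
        "globalDefault": False,
        "description": description,
    }

# Hypothetical names and values: controllers outrank agents.
controllers = priority_class("ci-controller", 1000, "Controller replicas")
agents = priority_class("ci-agent", 100, "Build agent pods")

for manifest in (controllers, agents):
    print(json.dumps(manifest, indent=2))
```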

Install High Availability

You can install High Availability on both CloudBees CI on traditional platforms and CloudBees CI on modern cloud platforms.

In addition to other Kubernetes environments, you can install High Availability on any of the following: Azure Kubernetes Service (AKS), Amazon Elastic Kubernetes Service (EKS), or Google Kubernetes Engine (GKE).

Considerations about High Availability (HA)

High Availability (HA) and Configuration as Code (CasC)

When you use CasC in controllers running in HA mode, the CloudBees Configuration as Code export and Update screen may display inconsistent information about the bundle, along with two buttons: Restart and Reload. This happens because the information is not properly synchronized between replicas. Furthermore, users may experience the following problems when they use the buttons on that page:

  • Automatic reload bundle: clicking this button will show an error message.

  • Skip new bundle version: clicking this button will force a restart and the instance will not start again.

While a fix for this issue is being worked on, CloudBees recommends the following if you use CasC in controllers running in HA mode:

  • For controllers that have automatic reload configured, disable it and configure automatic restart instead.

  • For controllers that don’t have any automation configured (Bundle Update Timing), stop using the Reload button and use the Restart button instead.

Troubleshooting HA

On the Manage Jenkins > CloudBees CI High Availability screen, CloudBees CI High Availability provides tools to troubleshoot possible problems on controller replicas running in HA mode:

  • The HA Developer Mode, which can be enabled by selecting Status in the left navigation pane of the screen. When enabled, the controller provides additional information, such as the replica that serves the user or the replica that executes a build.

  • The HA Script Console, which allows users to run scripts across all the controller replicas and gather information from all of them.

NOTE: The HA Script Console is available starting with version 2.246.3.3.

Figure 2. CloudBees CI High Availability screen

Nodes and agents

CloudBees supports a range of agent connection modes in High Availability, but each agent must have only one executor. Because an agent can connect to only a single replica at a time, agents with multiple executors cannot be properly shared and are not supported by CloudBees CI High Availability.

You can share a high-capacity computer among several concurrent builds, if desired, by connecting multiple agents to the replicated controller (ensure that you use a unique remote root directory so each agent has its own workspace).
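
As a minimal sketch of that layout, the helper below plans one single-executor agent per desired concurrent build, each with its own remote root directory. The host name, base directory, and naming scheme are hypothetical; the output is only a plan to apply when you define the agents, not a call into any Jenkins or CloudBees API.

```python
from pathlib import PurePosixPath

def plan_agents(host, concurrent_builds, base_dir="/var/lib/ci-agents"):
    """Describe one single-executor agent per desired concurrent build.

    The host name, base directory, and naming scheme are hypothetical.
    """
    plan = []
    for index in range(1, concurrent_builds + 1):
        name = f"{host}-agent-{index}"
        plan.append({
            "name": name,
            "executors": 1,  # one executor per agent in HA mode
            "remote_root": str(PurePosixPath(base_dir) / name),
        })
    return plan

agents = plan_agents("bigbox", 3)
for agent in agents:
    print(agent["name"], agent["remote_root"])
```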

Shared libraries

To use Groovy libraries, CloudBees recommends that you set them to a new “clone” mode and configure Git to use a shallow clone.

Concurrent access can cause problems if you update library checkouts in a common directory (such as $JENKINS_HOME/workspace/) or use the caching system. An administrative monitor guides you to make these changes.

Non-Pipeline projects

Problems can arise in High Availability mode with non-Pipeline project types, such as freestyle, matrix, or Maven. When these project types are run, other replicas can load completed build metadata, but cannot take over the build successfully. As a result, if a replica terminates, any builds running on it are immediately aborted.
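
The difference in outcome can be summarized with a small sketch. The function and the project type labels are hypothetical; only the distinction between Pipeline builds, which another replica can take over, and other project types, which are aborted, comes from this guide.

```python
def on_replica_terminated(running_builds):
    """Classify builds from a terminated replica by expected outcome.

    running_builds is a list of (build_name, project_type) tuples.
    """
    outcome = {"adopted": [], "aborted": []}
    for name, project_type in running_builds:
        # Only Pipeline builds can be taken over by another replica.
        key = "adopted" if project_type == "pipeline" else "aborted"
        outcome[key].append(name)
    return outcome

result = on_replica_terminated([
    ("deploy #12", "pipeline"),
    ("nightly #40", "freestyle"),
    ("compat #7", "matrix"),
])
print(result)
```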

Horizontal scalability

The benefit of having multiple replicas should always be balanced against the associated cost according to your business case. Scaling horizontally with many replicas will have diminishing returns as the number of replicas increases.

Plugin installation and HA

Plugins can be managed and installed from the Manage Jenkins > Plugins screen. When you use the UI to install a plugin, you can choose to complete the plugin installation without restarting the controller. In that situation, if the controller is running in HA mode, the plugin is only loaded in the replica used for the installation process. You must restart the controller to make the plugin available to all of the replicas.

This does not happen if you select the option to install the plugin after a restart, or if you perform a plugin upgrade, which always requires a restart. In those cases, after the restart, the plugin is loaded in all of the controller replicas.