Get ready for HA (active/active)


Once you understand what HA is, what it provides, its benefits, and the main elements of the HA architecture, use this section as a recommended workflow for deploying an HA cluster.

CloudBees recommends the following steps to get your cluster ready for HA:

Review your network and storage configurations to meet the requirements for HA

Network configuration for HA

As described in HA fundamentals, controllers running in HA mode require controller replicas to communicate with each other. When using CloudBees CI on traditional platforms, your network configuration must open the following required ports to allow this communication:

  • Hazelcast nodes installed in the replicas. The default Hazelcast port is 5701.

  • A reverse-proxy HTTP or TCP connection to access, from one replica, resources that belong to or are connected to another replica. Resources might include:

    • Running builds.

    • WebSocket inbound agents.

    • TCP inbound agents from outside the CloudBees CI network.

For CloudBees CI on traditional platforms, the load balancer used to distribute the traffic among the replicas must be configured to use sticky sessions. There are no additional considerations needed for CloudBees CI on modern cloud platforms.
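
As a rough illustration of the first requirement, the following Python sketch checks from one replica host whether a TCP connection to another replica can be opened on the default Hazelcast port (5701) mentioned above. The script name and the host names passed on the command line are placeholders, not values from CloudBees CI, and a successful connection only shows that the port is open, not that the Hazelcast cluster has formed.

```python
import socket
import sys


def can_connect(host: str, port: int = 5701, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    # Hypothetical usage: python check_hazelcast_port.py <replica-host> [...]
    for replica in sys.argv[1:]:
        state = "reachable" if can_connect(replica) else "NOT reachable"
        print(f"{replica}:5701 is {state}")
```

Run the check from every replica host toward every other replica, because the nodes must be able to reach each other in both directions.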

Install a storage system that meets the requirements for HA

In CloudBees CI on modern cloud platforms, HA requires consistent shared storage between replicas. PersistentVolumeClaims (PVC) used for this purpose must accept an access mode of ReadWriteMany. The underlying storage provided depends on the Kubernetes platform used. Refer to Install HA on CloudBees CI on modern cloud platforms for more information.
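
A minimal sketch of how the access mode could be checked, assuming the PVC manifest has been exported as JSON (for example with `kubectl get pvc <pvc-name> -n <namespace> -o json`). The manifest below is a trimmed, hypothetical example, and the function name is illustrative only.

```python
import json


def pvc_supports_rwx(pvc_json: str) -> bool:
    """Return True if the PVC manifest requests the ReadWriteMany access mode."""
    pvc = json.loads(pvc_json)
    access_modes = pvc.get("spec", {}).get("accessModes", [])
    return "ReadWriteMany" in access_modes


# Hypothetical, trimmed manifest used only for illustration.
SAMPLE_PVC = """
{
  "kind": "PersistentVolumeClaim",
  "metadata": {"name": "jenkins-home-example"},
  "spec": {"accessModes": ["ReadWriteMany"]}
}
"""

print(pvc_supports_rwx(SAMPLE_PVC))  # True
```

This only inspects the requested access modes. Whether the underlying storage class can actually provide ReadWriteMany depends on the Kubernetes platform, as noted above.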

In CloudBees CI on traditional platforms, all the replicas share the same JENKINS_HOME directory, so every replica must be able to access this shared directory, which is typically mounted on the same path for all the replicas. You must use a shared file system like NFS to store the JENKINS_HOME directory. Refer to Install HA on CloudBees CI on traditional platforms for more information.
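
The following sketch shows one way to confirm, on a Linux replica host, that JENKINS_HOME sits on a shared file system. It assumes the mount table is available in the `/proc/mounts` format, and it treats only `nfs` and `nfs4` as shared file system types. That set is an assumption to adjust for your environment, and the function names are illustrative.

```python
import posixpath
from typing import Optional

# Assumption: file system types considered "shared"; extend as needed.
SHARED_FS_TYPES = {"nfs", "nfs4"}


def filesystem_type(path: str, mounts_text: str) -> Optional[str]:
    """Return the file system type of the mount that contains `path`.

    `mounts_text` uses the /proc/mounts layout:
    <device> <mount point> <type> <options> <dump> <pass>
    """
    target = posixpath.normpath(path)
    best_point, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = posixpath.normpath(fields[1]), fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if target == mount_point or target.startswith(prefix):
            # Prefer the longest mount point; on ties, the later entry wins.
            if len(mount_point) >= len(best_point):
                best_point, best_type = mount_point, fs_type
    return best_type


def is_on_shared_filesystem(path: str, mounts_text: str) -> bool:
    """Return True if `path` is on one of the SHARED_FS_TYPES."""
    return filesystem_type(path, mounts_text) in SHARED_FS_TYPES
```

On a replica host, the check could be called as `is_on_shared_filesystem("<JENKINS_HOME path>", open("/proc/mounts").read())`. Mount points that contain spaces are escaped in `/proc/mounts` and are not handled by this sketch.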

Migrate your non-HA controller to HA

CloudBees CI controllers not running in HA mode can be migrated to HA mode. Migrated controllers must meet the specific storage requirements described above.

If the managed controller already uses a volume with ReadWriteMany capabilities, the migration process only requires switching the accessMode of the PersistentVolumeClaim (PVC) used. There is no need to move data in this scenario.

Migrating an existing managed controller to High Availability (HA) describes the process to migrate a non-HA controller to an HA controller for CloudBees CI on modern cloud platforms.

For CloudBees CI on traditional platforms, no migration is required if the client controller is already using a shared file system for JENKINS_HOME. If the client controller is not using a shared file system, and assuming that the destination volume is already mounted and is a shared file system, the following steps provide a general overview of the migration process:

  1. Set the $JENKINS_HOME environment variable to the destination volume, the NFS-compatible shared filesystem used by the controller replicas when running in HA mode.

  2. Perform an initial sync between the current $JENKINS_HOME volume and the destination volume. Use, for example, rsync as described below:

    rsync -avvu --delete <old-JENKINS_HOME> <new-JENKINS_HOME> (1)

    (1) Replace <old-JENKINS_HOME> with the path for the previous $JENKINS_HOME, and <new-JENKINS_HOME> with the destination path used as the new $JENKINS_HOME.

  3. Stop the non-HA controller.

  4. Perform a new sync between the current $JENKINS_HOME volume and the destination volume.

    rsync -avvu --delete <old-JENKINS_HOME> <new-JENKINS_HOME> (1)

    (1) Replace <old-JENKINS_HOME> with the path for the previous $JENKINS_HOME, and <new-JENKINS_HOME> with the destination path used as the new $JENKINS_HOME.

  5. Apply the configuration change that HA requires. Refer to Install HA (active/active) on CloudBees CI on traditional platforms for more information about these configuration changes.

  6. Restart the controller, now running in HA mode.
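
Before restarting the controller in step 6, it can be useful to confirm that the final sync left both directories with the same content. The sketch below is one possible check using Python's standard `filecmp` module. It is not part of the CloudBees procedure, and the function name is illustrative.

```python
import filecmp
from typing import List


def tree_differences(old_home: str, new_home: str) -> List[str]:
    """List entries that differ between the old and new JENKINS_HOME trees."""
    differences: List[str] = []

    def walk(comparison: filecmp.dircmp, prefix: str) -> None:
        differences.extend(
            f"only in old: {prefix}{name}" for name in comparison.left_only)
        differences.extend(
            f"only in new: {prefix}{name}" for name in comparison.right_only)
        differences.extend(
            f"differs: {prefix}{name}" for name in comparison.diff_files)
        differences.extend(
            f"could not compare: {prefix}{name}"
            for name in comparison.funny_files + comparison.common_funny)
        for name, sub in comparison.subdirs.items():
            walk(sub, f"{prefix}{name}/")

    # ignore=[] disables the default ignore list (for example .git, CVS),
    # so every directory in JENKINS_HOME is compared.
    walk(filecmp.dircmp(old_home, new_home, ignore=[]), "")
    return sorted(differences)
```

An empty list suggests the trees match. Note that `filecmp.dircmp` compares files shallowly: files with the same type, size, and modification time are treated as equal without reading their content, which fits the `rsync -a` behaviour of preserving modification times.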

Install the HA controllers

HA is available for CloudBees CI on modern cloud platforms and CloudBees CI on traditional platforms. Refer to Install HA (active/active) on CloudBees CI on modern cloud platforms and Install HA (active/active) on CloudBees CI on traditional platforms for more information.

High Availability (HA) on Windows controllers is not supported.

Built-in executors are not supported in HA. Controllers running in HA mode must set the number of built-in executors to 0. Keeping the number of built-in executors to 0 in all cases helps to avoid potential security issues, as CloudBees recommends in the Security recommendations.

Verify the HA cluster installation

Once you have installed an HA controller, and before configuring agents and running builds, CloudBees recommends verifying that the main elements in the HA cluster run properly.

To verify the HA cluster installation, review the following:

Review the HA cluster components

Hazelcast nodes, as described in HA fundamentals, help to create the HA cluster and synchronize the replicas. These nodes must be able to communicate with each other to ensure the desired number of replicas for the cluster.

To verify that Hazelcast nodes can create the cluster with the expected number of replicas, follow these steps:

  1. Navigate to the controller running in HA mode.

  2. Select Manage Jenkins > CloudBees CI High Availability.

  3. Review the Cluster state in the Status section and verify the cluster displays the expected number of replicas.

Figure 1. Two-replica HA cluster displays all the replicas

If the cluster does not display the expected number of replicas, review the network configuration to ensure the Hazelcast nodes can communicate with each other.

Check if the load balancer is configured with sticky sessions

The HA load balancer must provide sticky sessions to ensure the user session always uses the same replica.

To ensure that the load balancer uses sticky sessions, follow these steps:

  1. Enable developer mode in the controller.

    • Navigate to Manage Jenkins > CloudBees CI High Availability.

    • Select Configure in the left pane.

    • Select Enable developer mode.

    • Select Save or Apply.

  2. Verify the footer displays a banner with the replica name.

  3. Refresh the page, navigate through different items in the controller, sign out and sign in several times, and check if the banner always displays the same replica name.

If the footer banner displays different replica names, review the load balancer configuration to ensure it provides sticky sessions.

For HA clusters using ingress-nginx as the load balancer, Reset sticky session in Manage Jenkins > CloudBees CI High Availability resets the sticky session for the current user, allowing you to test the load balancer configuration again.

Test reverse-proxied requests

HA controllers load balance the workload among all the controller replicas. When builds run in one replica, and the user is browsing the user interface from another, requests to browse running builds must be reverse-proxied to the replica where the build is running. This reverse-proxied request displays in a footer banner when developer mode is enabled.

To test the reverse proxy is working correctly, follow these steps:

  1. Enable developer mode in the controller as described previously.

  2. Create a new pipeline job using the code below.

    pipeline {
        agent none
        stages {
            stage('Test Job') {
                steps {
                    input('Waiting for interaction...')
                }
            }
        }
    }

  3. Execute the pipeline job several times by selecting Build Now. HA distributes the build among the different replicas.

  4. Verify the Builds widget adds the replica name to the jobs running in other replicas.

    Figure 2. Builds widget displays the replica name in developer mode

  5. In the Builds widget, select one of the builds running in a different replica.

  6. Verify that the footer banner displays the reverse-proxied request, from the replica holding the user session to the replica running the build.

    Figure 3. Reverse-proxied request to the replica running the build

  7. Finish all the builds before continuing.

If you cannot browse to the replica running a build, review your reverse proxy configuration to ensure it forwards the requests correctly.

Configure your agents to work with HA

CloudBees CI supports a range of agent connection modes in HA, but each agent must have only one executor. As an agent can connect to only a single replica at a time, agents with multiple executors cannot be properly shared, and are not supported by CloudBees CI High Availability (HA).

The single-executor requirement is not a problem for cloud agents, as they normally define only one executor. They are provisioned, use this executor, and are later de-provisioned.

Permanent agents as multiple-executor agents for HA

Agents require one executor to work with a controller running in HA mode. However, you can configure permanent or static agents to automatically generate single-executor agent clones that simulate the behaviour of a multi-executor agent. To configure a permanent agent to simulate a multi-executor agent for HA, you can use one of the following methods:

  • Automatically from the GUI.

  • Manually from the GUI.

  • Using CloudBees Configuration as Code (CasC).

Automatically from the GUI

If the permanent agent does not have one executor, CloudBees CI displays an administrative monitor. If this administrative monitor displays, and you select Apply Migration, CloudBees CI automatically migrates the permanent agent configuration to the required configuration for HA.

Figure 4. Multi-executor administrative monitor

Manually from the GUI

To manually configure your permanent agent to work with HA:

  1. Navigate to the agent configuration screen.

  2. Verify that the agent has only one executor.

  3. In the Node Properties section, select the HA agent with multiple executor option, and set the number of executors according to your needs.

Figure 5. Number of executors for HA

For outbound agents (usually SSH agents), from the configuration screen, select CloudBees High Availability for the Availability (retention strategy) field. The agent is kept offline until requested by a replica. CloudBees CI, when needed, brings the permanent outbound agent online temporarily.

Figure 6. Availability for outbound agents

Using CloudBees Configuration as Code (CasC)

With CloudBees Configuration as Code (CasC), you can configure a permanent agent as a multi-executor agent compatible with HA. The examples below show the non-HA compatible configuration and the changes required to make it HA compatible:

Non-HA compatible multi-executor configuration (inbound agent)

    nodes:
    - permanent:
        ...
        launcher:
          inbound: {}
        name: <your-permanent-agent-name> (1)
        numExecutors: 3 (2)
        ...
        retentionStrategy: "always"

    (1) Replace <your-permanent-agent-name> with the name of your permanent agent.
    (2) Set the number of executors according to your needs.

HA compatible multi-executor configuration (inbound agent)

    nodes:
    - permanent:
        ...
        launcher:
          inbound: {}
        name: <your-permanent-agent-name> (1)
        numExecutors: 1 (2)
        nodeProperties:
        - cloudbeesHighAvailabilityMultipleExecutors: (3)
            numExecutors: 3 (4)
        ...
        retentionStrategy: "always"

    (1) Replace <your-permanent-agent-name> with the name of your permanent agent.
    (2) Set the non-HA option number of executors to 1.
    (3) Use the cloudbeesHighAvailabilityMultipleExecutors property to define the permanent agent as a multi-executor agent for HA.
    (4) Set the number of HA compatible single-executor clones according to your needs.

Non-HA compatible multi-executor configuration (outbound agent)

    nodes:
    - permanent:
        ...
        launcher:
          nioSsh:
            ...
        name: <your-permanent-agent-name> (1)
        numExecutors: 3 (2)
        ...
        retentionStrategy: "demand"

    (1) Replace <your-permanent-agent-name> with the name of your permanent agent.
    (2) Set the number of executors according to your needs.

HA compatible multi-executor configuration (outbound agent)

    nodes:
    - permanent:
        ...
        launcher:
          nioSsh:
            ...
        name: <your-permanent-agent-name> (1)
        numExecutors: 1 (2)
        nodeProperties:
        - cloudbeesHighAvailabilityMultipleExecutors: (3)
            numExecutors: 3 (4)
        ...
        retentionStrategy: "cloudbeesHighAvailability" (5)

    (1) Replace <your-permanent-agent-name> with the name of your permanent agent.
    (2) Set the non-HA option number of executors to 1.
    (3) Use the cloudbeesHighAvailabilityMultipleExecutors property to define the permanent agent as a multi-executor agent for HA.
    (4) Set the number of HA compatible single-executor clones according to your needs.
    (5) For outbound agents in HA controllers, always set retentionStrategy to cloudbeesHighAvailability.
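
The change between the non-HA and HA compatible configurations above can be sketched as a small transformation on the `permanent` node entry once it is loaded into a Python dictionary. This is only an illustration of the rule shown in the examples. The function name is hypothetical, and treating every launcher other than `inbound` as an outbound agent is an assumption.

```python
import copy


def to_ha_compatible(permanent: dict) -> dict:
    """Return an HA compatible copy of a CasC `permanent` node entry."""
    node = copy.deepcopy(permanent)
    executors = int(node.get("numExecutors", 1))

    # The agent itself must define exactly one executor.
    node["numExecutors"] = 1

    # The original executor count becomes the number of single-executor clones.
    if executors > 1:
        node.setdefault("nodeProperties", []).append(
            {"cloudbeesHighAvailabilityMultipleExecutors":
                {"numExecutors": executors}})

    # Assumption: any launcher other than `inbound` is an outbound agent.
    launcher = node.get("launcher") or {}
    if launcher and "inbound" not in launcher:
        node["retentionStrategy"] = "cloudbeesHighAvailability"
    return node
```

The function returns a new dictionary and leaves the input untouched, so the original configuration can still be compared against the result before it is written back to the CasC bundle.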

When configured properly, the controller clones the original agent to create the executors. These read-only clones are managed by the controller itself, which adds or removes clones and reconfigures them in response to updates to the original permanent agent.

All the permanent agent clones have one executor and use different folders within the executors folder inside the original permanent agent root directory. Outbound agents automatically have the workDir settings for all the clones.

CloudBees CI displays the clones as individual one-executor agents in the Build executor status.

Figure 7. Agent clones providing multi-executor for HA and permanent agents

As permanent agent clones are read-only, users should avoid the Launch build agents via SSH (Non-blocking I/O) launch method. If this method is selected for a permanent agent connection, this connection fails. The Host Key Verification Strategy cannot use the Manually Trusted Key Verification Strategy as this strategy depends on permissions not granted to read-only clones.

Audit your jobs to ensure they are compatible with HA

Not every project or pipeline idiom is compatible with HA. When migrating your projects to HA, you must review the information in this section and rewrite your jobs or projects if necessary.

Pipeline projects are supported in HA mode.

Non-Pipeline projects like Freestyle, Matrix or Maven projects run in the replica where they started. Non-pipeline project builds are aborted when the replica running them ends for any reason, such as a rolling restart or upgrade, and no other replica can take over the build. Therefore, no HA is provided for those jobs.

By default, controllers set the Default Speed/Durability Level option to Maximum survivability/durability but slowest.

This option is required in HA to allow build adoption in case of a replica failure. An administrative monitor displays if you change this option at controller level or override it at the job level.

Even though HA is not provided for non-Pipeline projects, High Scalability (HS) can be achieved by adding more replicas to the cluster. However, if auto-scaling is enabled for CloudBees CI on modern cloud platforms, replicas added during a period of high workload are removed again when the workload returns to normal. Non-Pipeline builds running in the replicas removed by this scale-down process are aborted and are not adopted by other replicas.

The Declarative Pipeline Migration Assistant can help you migrate your Freestyle or Maven projects to Declarative Pipeline projects.

The following Pipeline steps are not currently supported in HA controllers:

  • build

  • lock

  • milestone

CloudBees CI can emulate an HA-compatible build step on HA controllers. Refer to the Emulate the build step in HA controllers section for instructions.

If you plan to use controllers running in HA mode, you can define Pipeline Policies to either warn developers about these incompatible steps (Warning policy) or prevent developers from using them (Fail policy). The rule used for this kind of policy is Pipeline idioms incompatible with CloudBees High Availability (active/active), as displayed in the image below:

Figure 8. Pipeline policy rule for pipeline idioms or steps incompatible with HA

Refer to Using Pipeline Policies for more information about how to define and apply pipeline policies to your controller.
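
For a quick offline audit before policies are in place, a script can scan Pipeline sources for the unsupported steps listed above (build, lock, and milestone). The sketch below is a rough, regex-based heuristic and not a Groovy parser: it skips single-line string literals and `//` comments, but it can miss or misreport usages, for example inside multi-line strings. The function name is illustrative.

```python
import re
from typing import List, Tuple

# Single-line string literals, replaced before scanning to avoid matches
# such as sh 'make build' or stage('build').
_STRING = re.compile(r"'[^'\n]*'|\"[^\"\n]*\"")

# A step name followed by something that looks like an argument.
_STEP = re.compile(r"(?<![\w.$])(build|lock|milestone)\b(?=\s*[(\w'\"])")


def find_unsupported_steps(pipeline_text: str) -> List[Tuple[int, str]]:
    """Return (line number, step name) pairs for suspected usages."""
    findings: List[Tuple[int, str]] = []
    for number, line in enumerate(pipeline_text.splitlines(), start=1):
        code = _STRING.sub("''", line)   # blank out string contents
        code = code.split("//", 1)[0]    # drop trailing line comments
        for match in _STEP.finditer(code):
            findings.append((number, match.group(1)))
    return findings
```

If the build step emulation is enabled on the controller, findings for `build` may be acceptable. Pipeline Policies remain the supported way to enforce these rules on the controller itself.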

Emulate the build step in HA controllers

The build step, provided by the pipeline-build-step plugin, does not support HA. When used, CloudBees CI does not distribute the workloads among the replicas, and always executes the job in the replica where the build step is called. If the upstream build is adopted, it may keep waiting indefinitely for the downstream build to finish.

Controllers running in HA mode can emulate the build step. This emulation provides an alias for the build step with an HA-compatible behaviour. This alias replaces all the pipeline build steps in the pipeline scripts with the build step emulation.

To enable the build step emulation in a controller:

  1. Navigate to Manage Jenkins > CloudBees High Availability.

  2. Select Configure on the left.

  3. Select Emulate from Pipeline build step.

  4. Select Save or Apply.

Figure 9. Enable emulation for the build step

If the build step emulation is enabled, administrators can uninstall the pipeline-build-step plugin, and the Pipeline idioms incompatible with CloudBees High Availability policy, if set, does not flag usages of build. The pipeline-build-step plugin also provides the waitForBuild step, which is not supported by the emulation.

Emulation does not support all the build step options:

  • Parameter types are limited to string, booleanParam, and text.

  • When using the propagate: true option (the default value), the not built and aborted statuses from the downstream job are treated as failures in the upstream job.

  • When using the wait: true option (the default value), only number, result and displayName properties are supported in the step result.
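
To illustrate the first limitation, the sketch below looks for parameter constructors written in the `type(name: ..., value: ...)` form and reports any type other than string, booleanParam, and text. It is a heuristic under that assumption: parameters written in other forms are not detected, and the function name is illustrative.

```python
import re
from typing import List

# Parameter types supported by the emulation, as listed above.
SUPPORTED_PARAMETER_TYPES = {"string", "booleanParam", "text"}

# Matches calls such as string(name: 'A', value: 'b'). Declarative
# `parameters { ... }` definitions use defaultValue and are not matched.
_PARAMETER_CALL = re.compile(r"\b(\w+)\s*\(\s*name\s*:[^()]*\bvalue\s*:")


def unsupported_parameter_types(pipeline_text: str) -> List[str]:
    """Return the sorted parameter types that the emulation does not support."""
    used = _PARAMETER_CALL.findall(pipeline_text)
    return sorted({name for name in used
                   if name not in SUPPORTED_PARAMETER_TYPES})
```

For example, a `build` call that passes `password(name: 'TOKEN', value: '...')` alongside `string` and `booleanParam` parameters would be reported as using `password`.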

Review additional valuable information about HA

REST-API endpoints and HA

When running a controller in HA mode, requests to API pull-based endpoints may return information about the controller replica that responds to the API request instead of aggregated information about all the controller replicas that are part of the HA cluster. To retrieve aggregated information, refer to the HA considerations page.

Blue Ocean and HA

The Blue Ocean plugin may not accurately display running builds that are owned by other replicas.

CloudBees recommends the CloudBees Pipeline Explorer plugin.

HA and costs

HA may increase the costs of running CloudBees CI as it requires additional resources, like those used to run the replicas and the required network filesystem.

Troubleshooting HA problems

Refer to High Availability (active/active) troubleshooting for information about troubleshooting problems with HA.

In addition to the general troubleshooting information, the CloudBees Knowledge Base contains articles about common issues and solutions for HA.