High Availability (active/active) troubleshooting

If CloudBees High Availability (HA) is not working as expected, use this page to troubleshoot.

High Availability (HA) developer mode

The High Availability (HA) feature provides a developer mode to troubleshoot HA problems in controllers running in HA mode.

To enable developer mode in a controller:

  • Navigate to Manage Jenkins > CloudBees CI High Availability.

  • Select Status on the left navigation pane. This is the default view when accessing the CloudBees CI High Availability screen.

  • Select the Enable developer mode field and select Save or Apply.

Figure 1. Enable developer mode

Developer mode is a powerful tool to troubleshoot and understand High Availability. When enabled, the controller provides additional information about the High Availability mode:

  • A button with the current replica name appears in the page footer.

  • The background color for the replica button changes from one replica to another.

  • When selected, the replica button on the footer redirects to the CloudBees CI High Availability screen.

    Figure 2. Footer in developer mode.
  • The consolidated queue widget displays the queue items in all the replicas.

  • In developer mode, CloudBees CI adds the replica name to those items queued in other replicas.

  • Names for items queued in the current replica remain the same.

Figure 3. Build name in developer mode

The CloudBees CI High Availability screen also lets you change the replica you are using: select the Reset sticky session button.

Figure 4. Switch to another replica

When you select Reset sticky session, CloudBees CI randomly assigns a new replica. Because the assignment is random, you may be assigned to the same replica again. If that happens, reload the page and try again until a new replica is assigned.

The Reset sticky session button only displays if you are using ingress-nginx. With other setups, you may still be assigned to a different replica if you use your browser developer tools to remove the CloudBees CI cookies and then sign in again.

High Availability (HA) Script Console

The CloudBees CI High Availability screen provides a High Availability Script Console. This console allows CloudBees CI users to run scripts across all the current controller replicas and displays the results. To access the HA Script Console, select Script Console on the left.

Figure 5. HA Script Console
  1. Select Script Console to access the HA Script Console.

  2. Type your scripts in the scripts area.

  3. Run your HA script. When selected, the script is executed on all the controller replicas.

Figure 6. Results for the HA Script Console

Include HA information in your support bundle

The CloudBees Support Plugin allows you to generate a support bundle that contains commonly requested diagnostic information used by CloudBees to resolve support issues.

This plugin is installed by default with CloudBees CI. For more information about how to generate support bundles and the information collected by CloudBees from support bundles, refer to Generating a support bundle.

To include specific High Availability (HA) information in a controller support bundle:

  1. From the root of your controller, select Support on the left navigation pane.

  2. Select the Information from other replicas option in the CloudBees Support screen. When this option is selected, the generated support bundle contains the HA information in the replicas/ directory, which includes the following subdirectories:

    • The live/ directory contains a subdirectory with data for each current replica.

    • The exited/ directory contains a subdirectory with data for each replica that was previously used and then replaced, for example, replicas that were replaced during rolling restarts or rolling upgrades.

  3. Select Generate Bundle to generate and download the support bundle with the HA information.

Figure 7. Include specific HA information in your controller support bundle
This option is only available for controllers that run in HA mode.
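
To confirm that a downloaded bundle actually contains data from other replicas, you can list the subdirectories under replicas/live/ and replicas/exited/. The following Python sketch is one way to do that. It assumes the bundle is a ZIP file and that the replicas/ directory sits at the root of the archive; the function name list_replica_dirs is illustrative and not part of any CloudBees tool.

```python
import zipfile
from collections import defaultdict


def list_replica_dirs(bundle):
    """Return replica subdirectory names under replicas/live/ and replicas/exited/.

    `bundle` is a path or a file-like object for the support bundle ZIP.
    Assumption: replicas/ is located at the root of the archive.
    """
    found = defaultdict(set)
    with zipfile.ZipFile(bundle) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Only consider entries shaped like replicas/<live|exited>/<replica>/...
            if len(parts) >= 4 and parts[0] == "replicas" and parts[1] in ("live", "exited"):
                found[parts[1]].add(parts[2])
    return {state: sorted(found[state]) for state in ("live", "exited")}
```

Calling list_replica_dirs with the path of the downloaded bundle returns a dictionary with one sorted list of replica names per state. An empty "live" list suggests that the Information from other replicas option was not selected when the bundle was generated.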

HTTP Proxy Configuration should include replicas

If configuring a proxy for your CloudBees CI installation, ensure that the proxy configuration includes all the replicas' IPs and hostnames in the No Proxy Host field. This is important because the replicas should communicate with each other directly.
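
As an illustration, the following Python sketch assembles a value for the No Proxy Host field from a list of replicas. The hostnames and IPs are hypothetical placeholders, and the sketch assumes the field accepts one entry per line; verify the expected format in the inline help of your proxy configuration screen before pasting the result.

```python
def build_no_proxy_hosts(replicas):
    """Build a No Proxy Host value from (hostname, ip) pairs.

    Assumption: the field takes one host or IP per line.
    Duplicate entries are dropped and the input order is preserved.
    """
    entries = []
    for hostname, ip in replicas:
        for value in (hostname, ip):
            value = value.strip()
            if value and value not in entries:
                entries.append(value)
    return "\n".join(entries)


# Hypothetical replicas; substitute the real hostnames and IPs.
replicas = [
    ("replica-1.example.com", "10.3.10.11"),
    ("replica-2.example.com", "10.3.10.12"),
]
print(build_no_proxy_hosts(replicas))
```

Listing both the hostname and the IP of every replica covers the case where replicas address each other by either form.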

Troubleshoot CloudBees CI on modern cloud platforms installations

Controller configuration is not set to Deployment

The controller may fail to provision with the following error:

ERROR: Failed to provision controller ... StatefulSet is only for non-replicated controllers

This error occurs when kind: StatefulSet is included in the controller configuration under Advanced configuration > YAML. To resolve the issue, select Acknowledge error and Free snapshot, then go to the configuration page and change kind: StatefulSet to kind: Deployment in the YAML field.
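
If you keep controller YAML overrides in files, a quick text scan can flag the offending lines before you provision. The following Python sketch is a simple line-based check, not a YAML parser. The function name and the sample YAML are hypothetical, and quoted values or list items such as "- kind: StatefulSet" are not detected.

```python
import re

# Matches a line such as "kind: StatefulSet", optionally followed by a comment.
_STATEFULSET = re.compile(r"^\s*kind:\s*StatefulSet\s*(#.*)?$")


def find_statefulset_kinds(yaml_text):
    """Return the 1-based line numbers where `kind: StatefulSet` appears."""
    return [
        number
        for number, line in enumerate(yaml_text.splitlines(), start=1)
        if _STATEFULSET.match(line)
    ]


# Hypothetical YAML override, for illustration only.
sample = """\
---
kind: StatefulSet
spec:
  replicas: 2
"""
print(find_statefulset_kinds(sample))
```

Each reported line should be changed to kind: Deployment for a controller that runs in HA mode.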

Troubleshoot CloudBees CI on traditional platforms installations

Replicas not forming a controller cluster

If a controller replica is not forming the cluster, it will not be displayed in the CloudBees CI High Availability screen, and the logs may contain an error similar to:

[cloudbees-replication] [5.3.6] [IP]:5701 is added to the blacklist.
...
INFO c.h.internal.cluster.ClusterService: [IP]:5701 [cloudbees-replication] [5.3.6]
Members {size:1, ver:1} [
    Member [IP]:5701 - ID this
]
...
[OperationsCenter2 connection to HOSTNAME/IP:50000] Local headers refused by remote: The controller 0-client-controller is already connected

These troubleshooting steps may help solve the issue:

  1. Review the installation steps and the required JVM arguments as explained in the Installation for CloudBees CI on traditional platforms (active/active) page.

  2. Log in to one of the replicas, and check the IP and port listed in the directory: JENKINS_HOME/cloudbees-replication-discovery/*. This directory should contain one file per replica.

     a. If the replica IP and port are what you expect, the problem is likely a network configuration problem. You must check if the replicas can communicate with each other over the IP and port from the JENKINS_HOME/cloudbees-replication-discovery/* directory.

     b. If any IP or port has an unexpected value, Hazelcast likely chose another network interface on the machine to bind to. You can configure Hazelcast to bind to a specific network interface by adding the following additional JVM arguments to each replica.

        -Dhz.network.interfaces.enabled=true \
        -Dhz.network.interfaces.interfaces.interface1='<IP_PATTERN>' \ (1)
        -Dhz.network.port.port=5701 \
        -Dhz.network.port.autoincrement=true \
        -Dhazelcast.jmx=false \
        -Dhazelcast.metrics.jmx.enabled=false \
        -Dhazelcast.health.monitoring.delay.seconds=180 \
        -Dhazelcast.health.monitoring.threshold.memory.percentage=99 \
        -Dhazelcast.health.monitoring.threshold.cpu.percentage=99 \

        (1) Replace <IP_PATTERN> with the IP pattern of the desired network interface. For example, 10.3.10.* searches for a network interface on the machine with an IP between 10.3.10.0 and 10.3.10.255.

        Enable the options above on all replicas, and restart them after the change.

  3. If the issue remains unsolved, enable the Hazelcast diagnostic flag in your JVM startup argument, -Dhazelcast.diagnostics.enabled=true, and add a custom logger to the class com.cloudbees.jenkins.plugins.replication.hazelcast.FilesystemDiscoveryStrategy with the level FINE to get more information about the replica discovery process.
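
To check whether the replicas can reach each other over the IP and port found in the discovery directory, you can attempt a plain TCP connection from one replica's machine to the others. The following Python sketch is one way to do that. The function name check_replicas is illustrative, and you must supply the endpoints yourself after reading them from JENKINS_HOME/cloudbees-replication-discovery/*, because the sketch does not parse those files.

```python
import socket


def check_replicas(endpoints, timeout=3.0):
    """Try a TCP connection to each (host, port) pair.

    Returns a dict that maps each endpoint to None when the connection
    succeeded, or to the error message when it failed.
    """
    results = {}
    for host, port in endpoints:
        try:
            # The connection is closed immediately; only reachability is tested.
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = None
        except OSError as exc:
            results[(host, port)] = str(exc)
    return results
```

For example, running check_replicas([("10.3.10.11", 5701), ("10.3.10.12", 5701)]) on one replica, with the addresses replaced by your own, reports which of the other replicas accept connections. A successful connection only shows that the port is reachable. It does not prove that Hazelcast accepted the member, so a refused or timed-out connection is the more informative result and points to a firewall, routing, or interface binding problem.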