If CloudBees High Availability (HA) is not working as expected, use this page to troubleshoot.
High Availability (HA) developer mode
The High Availability (HA) feature provides a developer mode to troubleshoot HA problems in controllers running in HA mode.
To enable developer mode in a controller:
-
Navigate to
. -
Select Status on the left navigation pane. This is the default view when accessing the CloudBees CI High Availability screen.
-
Select the Enable developer mode field and select Save or Apply.
Developer mode is a powerful tool to troubleshoot and understand High Availability. When enabled, the controller provides additional information about the High Availability mode:
-
A button with the current replica name appears in the page footer.
-
The background color for the replica button changes from one replica to another.
-
When selected, the replica button on the footer redirects to the CloudBees CI High Availability screen.
Figure 2. Footer in developer mode
-
The consolidated queue widget displays the queue items in all the replicas.
-
In developer mode, CloudBees CI adds the replica name to those items queued in other replicas.
-
Names for items queued in the current replica remain the same.
From the CloudBees CI High Availability screen, you can also change the replica you are using by selecting the Reset sticky session button.
When Reset sticky session is selected, CloudBees CI randomly assigns a replica. If you are assigned to the same replica again, reload the page and retry until a new replica is assigned. The Reset sticky session button only displays if you are using ingress-nginx. However, you may still be assigned to a different replica if you use your browser developer tools to remove the CloudBees CI cookies and then sign in again.
High Availability (HA) Script Console
The CloudBees CI High Availability screen provides a High Availability Script Console. This console allows CloudBees CI users to run scripts across all the current controller replicas and displays the results. To use the HA Script Console:
-
Select Script Console to access the HA Script Console.
-
Type your scripts in the scripts area.
-
Run your HA scripts. The script is executed on all the controller replicas.
Include HA information in your support bundle
The CloudBees Support plugin allows you to generate a support bundle that contains commonly requested diagnostic information used by CloudBees to resolve support issues.
This plugin is installed by default with CloudBees CI. For more information about how to generate support bundles and the information collected by CloudBees from support bundles, refer to Generating a support bundle.
To include specific High Availability (HA) information in a controller support bundle:
-
From the root of your controller, select Support on the left navigation pane.
-
Select the Information from other replicas option in the CloudBees Support screen. When this option is selected, the generated support bundle contains the HA information in the replicas/ directory. The replicas/ directory includes the following subdirectories:
-
The live/ directory contains a subdirectory with data for each current replica.
-
The exited/ directory contains a subdirectory with data for each replica previously used and replaced. For example, replicas that were replaced during rolling restarts or rolling upgrades.
-
Select Generate Bundle to generate and download the support bundle with the HA information.
This option is only available for controllers that run in HA mode.
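As a quick sanity check on a downloaded bundle, the following sketch lists which replica subdirectories appear under replicas/live/ and replicas/exited/. It assumes the bundle is a ZIP archive with the replicas/ directory at its root, which this page does not state explicitly; the function name list_replica_dirs is hypothetical.

```python
import zipfile
from collections import defaultdict


def list_replica_dirs(bundle_path):
    """Return {"live": [...], "exited": [...]} with replica subdirectory names.

    Assumption (not confirmed by the documentation): the bundle is a ZIP
    archive and the replicas/ directory sits at the archive root.
    """
    found = defaultdict(set)
    with zipfile.ZipFile(bundle_path) as bundle:
        for name in bundle.namelist():
            parts = name.split("/")
            # Expect entries shaped like replicas/<live|exited>/<replica>/...
            if (
                len(parts) >= 4
                and parts[0] == "replicas"
                and parts[1] in ("live", "exited")
            ):
                found[parts[1]].add(parts[2])
    return {state: sorted(found[state]) for state in ("live", "exited")}
```

If both lists come back empty, the bundle was probably generated without the Information from other replicas option, or the controller is not running in HA mode.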
HTTP Proxy Configuration should include replicas
If you configure a proxy for your CloudBees CI installation, ensure that the proxy configuration includes the IPs and hostnames of all the replicas in the No Proxy Host field.
This is important because the replicas should communicate with each other directly.
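One way to review the setting offline is to compare the patterns you entered against the replica addresses. The sketch below is only an approximation: it assumes the field holds host patterns where * is a wildcard and uses Python's fnmatch to imitate that matching, which may differ in detail from how CloudBees CI evaluates the field. The function name and the sample hosts are hypothetical.

```python
from fnmatch import fnmatchcase


def uncovered_replicas(no_proxy_patterns, replica_hosts):
    """Return the replica hosts/IPs that none of the patterns match.

    Matching is case-insensitive and uses shell-style wildcards as a
    stand-in for the product's own pattern matching (an assumption).
    """
    patterns = [p.strip().lower() for p in no_proxy_patterns if p.strip()]
    return [
        host
        for host in replica_hosts
        if not any(fnmatchcase(host.lower(), pattern) for pattern in patterns)
    ]


# Hypothetical values for illustration only.
patterns = ["*.ci.example.com", "10.3.10.*"]
replicas = ["replica-1.ci.example.com", "10.3.10.21", "10.3.11.5"]
print(uncovered_replicas(patterns, replicas))
```

Any host the function returns would still be routed through the proxy under these assumptions and should be added to the No Proxy Host field.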
Troubleshoot CloudBees CI on modern cloud platforms installations
Controller configuration is not set to Deployment
If the controller fails to provision with the error:
ERROR: Failed to provision controller ... StatefulSet is only for non-replicated controllers
The issue was caused by including kind: StatefulSet when configuring the controller under Advanced configuration → YAML.
To resolve the issue, select Acknowledge error, Free snapshot, then go to the configuration page and change kind: StatefulSet to kind: Deployment in the YAML field.
Troubleshoot CloudBees CI on traditional platforms installations
Replicas not forming a controller cluster
If a controller replica is not forming the cluster, it will not be displayed in the CloudBees CI High Availability screen, and the logs may contain an error similar to:
[cloudbees-replication] [5.3.6] [IP]:5701 is added to the blacklist.
...
INFO c.h.internal.cluster.ClusterService: [IP]:5701 [cloudbees-replication] [5.3.6]
Members {size:1, ver:1} [
Member [IP]:5701 - ID this
]
...
[OperationsCenter2 connection to HOSTNAME/IP:50000] Local headers refused by remote: The controller 0-client-controller is already connected
These troubleshooting steps may help solve the issue:
-
Review the installation steps and the required JVM arguments as explained in Installation for CloudBees CI on traditional platforms (active/active).
-
Log in to one of the replicas, and check the IP and port listed in the directory JENKINS_HOME/cloudbees-replication-discovery/*. This directory should contain one file per replica.
-
If the replica IP and port are what you expect, the problem is likely a network configuration problem. You must check if the replicas can communicate with each other over the IP and port from the JENKINS_HOME/cloudbees-replication-discovery/* directory.
-
If any IP or port has an unexpected value, Hazelcast likely chose another network interface on the machine to bind to. You can configure Hazelcast to bind to a specific network interface by adding the following additional JVM arguments to each replica.
-Dhz.network.interfaces.enabled=true \
-Dhz.network.interfaces.interfaces.interface1='<IP_PATTERN>' \
-Dhz.network.port.port=5701 \
-Dhz.network.port.autoincrement=true \
-Dhazelcast.jmx=false \
-Dhazelcast.metrics.jmx.enabled=false \
-Dhazelcast.health.monitoring.delay.seconds=180 \
-Dhazelcast.health.monitoring.threshold.memory.percentage=99 \
-Dhazelcast.health.monitoring.threshold.cpu.percentage=99 \
Replace <IP_PATTERN> with the IP pattern of the desired network interface. For example, 10.3.10.* searches for a network interface on the machine with an IP between 10.3.10.0 and 10.3.10.255.
Enable the options above on all replicas, and restart them after the change.
-
If the issue remains unsolved, enable the Hazelcast diagnostic flag in your JVM startup arguments, -Dhazelcast.diagnostics.enabled=true, and add a custom logger for the class com.cloudbees.jenkins.plugins.replication.hazelcast.FilesystemDiscoveryStrategy with the level FINE to get more information about the replica discovery process.
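To test whether replicas can reach each other over the IP and port recorded in the discovery directory, a plain TCP connection attempt is often enough. The following is a minimal sketch and not a CloudBees tool: the function names are hypothetical, and it deliberately takes the IP and port values as arguments instead of parsing the discovery files, because their format is not described here.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_replicas(endpoints, timeout=3.0):
    """Map each (host, port) pair to whether it is reachable from this machine."""
    return {
        (host, port): can_connect(host, port, timeout)
        for host, port in endpoints
    }
```

Run it on each replica machine with the values of the other replicas, for example check_replicas([("10.3.10.21", 5701), ("10.3.10.22", 5701)]), where the addresses are placeholders. A False result from one machine but not another points to a firewall or routing problem between those two hosts.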