Verify that both of your nodes have the following configuration:
-Dhudson.TcpSlaveAgentListener.hostName=<HOSTNAME>. Note that if the nodes and/or JNLP agents are in different networks, the FQDN is needed (if there is no public domain name, use a static IP instead).
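One way to confirm the setting on each node is to inspect the arguments of the running Jenkins JVM. The sketch below assumes a POSIX shell with `grep`. The function name `has_listener_hostname` is hypothetical, and it only checks that the property is present with a non-empty value, not that the hostname actually resolves.

```shell
# Hypothetical helper: succeeds if the given JVM argument string sets
# hudson.TcpSlaveAgentListener.hostName to a non-empty value.
has_listener_hostname() {
  printf '%s\n' "$1" |
    grep -Eq -- '(^|[[:space:]])-Dhudson\.TcpSlaveAgentListener\.hostName=[^[:space:]]+'
}

# Example usage (JENKINS_PID is assumed to hold the Jenkins process ID):
#   has_listener_hostname "$(ps -o args= -p "$JENKINS_PID")" && echo "hostName is set"
```

Running the same check on both nodes makes it easy to spot a node where the property was never applied.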
This article describes how to collect the minimum required information for troubleshooting HA (active/passive) issues.
If the required data is larger than 50 MB, you will not be able to use ZenDesk to upload all the information. In this case, please use our upload service to attach all the required information.
Network infrastructure description
Support bundle of the instance in HA
JGroups customization (optional)
Output from Troubleshooting application
Logs from Active and Secondary node
A brief description of your network infrastructure including:
Is there a firewall (or any other interposed network device) between the two nodes? Note: firewalls need a customized jgroups.xml; see High Availability Installations Troubleshooting.
Do you have several network interfaces on those instances?
A support bundle from the Jenkins instance, generated while the issue is occurring. If you don’t know how to generate a support bundle, please follow the KB below.
jgroups will pick up random ports unless we configure
If you have configured jgroups.xml, please attach it to the ticket.
To simplify troubleshooting of network issues, we have published the troubleshooter program. This program runs the same lower-level stack as Jenkins HA, and thus exercises the network in exactly the same fashion. When you type text on stdin and press Enter, you should see the text echoed on all nodes of the cluster (including the node on which you typed it).
A good first step to diagnose the network problem is to run two instances of the troubleshooter program on the same host and see if they can communicate with each other. Then do the same on the other host. In this way, you can further isolate the problem.
Run the following command on both instances to determine whether the primary and backup nodes are selected correctly (replace $JENKINS_HOME with the corresponding value):
java -DJENKINS_HOME=$JENKINS_HOME -DHA_JGROUPS_DIR=$JENKINS_HOME/jgroups/ -Djgroups.bind_addr=<IP_ADDRESS> -Djava.net.preferIPv4Stack=true -jar troubleshooter-<VERSION>-jar-with-dependencies.jar
If the promotion process does not work correctly, i.e. both nodes run as the primary node, run the troubleshooter application in logging mode to expose the problem.
java -DJENKINS_HOME=$JENKINS_HOME -DHA_JGROUPS_DIR=$JENKINS_HOME/jgroups/ -Djgroups.bind_addr=<IP_ADDRESS> -Dlogging.org.jgroups=ALL -Dlogging.com.cloudbees.jenkins.ha=ALL -Djava.net.preferIPv4Stack=true -Dha-troubleshooter.filelogging -jar troubleshooter-<VERSION>-jar-with-dependencies.jar
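The two invocations above differ only in the logging flags, so a small wrapper can help keep them consistent across nodes. The following is a minimal sketch, not part of the troubleshooter distribution: the function name `troubleshooter_cmd` and its argument order are hypothetical, and it only prints the command so that it can be reviewed before being run. Paths containing spaces are not quoted in the printed command.

```shell
# Hypothetical helper: print the troubleshooter command line.
#   $1 = Jenkins home directory
#   $2 = IP address to bind JGroups to
#   $3 = path to the troubleshooter jar
#   $4 = "logging" to add the logging-mode flags (optional)
troubleshooter_cmd() {
  if [ "${4:-}" = "logging" ]; then
    log_flags=" -Dlogging.org.jgroups=ALL -Dlogging.com.cloudbees.jenkins.ha=ALL"
    file_flag=" -Dha-troubleshooter.filelogging"
  else
    log_flags=""
    file_flag=""
  fi
  printf 'java -DJENKINS_HOME=%s -DHA_JGROUPS_DIR=%s/jgroups/ -Djgroups.bind_addr=%s%s -Djava.net.preferIPv4Stack=true%s -jar %s\n' \
    "$1" "$1" "$2" "$log_flags" "$file_flag" "$3"
}

# Example usage (all three values are placeholders for your own):
#   troubleshooter_cmd "$JENKINS_HOME" <IP_ADDRESS> troubleshooter-<VERSION>-jar-with-dependencies.jar logging
```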
The output of the Troubleshooting application working with/without logging
Note about file logging
-Dha-troubleshooter.filelogging will enable file logging with log rotation.
By default, rotation caps the total size of the log files at 100 MB.
This lets the tool run in the background while you wait for the issue to reoccur.
If you need to cover a longer period of time, you may also want to use
-Dha-troubleshooter.filelogging.count=NN to raise the default value of
For example, to cover a whole weekend, you may want to use
-Dha-troubleshooter.filelogging.count=100 to rotate over 100 files of 10 MB each, consuming a maximum of 1 GB of disk space.
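The disk budget is simply the file count multiplied by the per-file size. The sketch below makes that arithmetic explicit so that a count can be checked against the free space on the node. The function name `rotation_total_mb` is hypothetical, and the 10 MB per-file size is taken from the example above and assumed to be fixed.

```shell
# Hypothetical helper: maximum disk usage in MB for a given
# -Dha-troubleshooter.filelogging.count value, assuming each
# rotated file is capped at 10 MB.
rotation_total_mb() {
  file_mb=10
  echo $(( $1 * file_mb ))
}

# Example usage: disk budget for count=100
#   rotation_total_mb 100
```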
When the tool starts, it displays the values of these settings so that you can confirm they were taken into account. For example:
Logs File Rotation enabled: # of files: 10, max size per file: 10000000, pattern: ha-troubleshooting.abcd.%u.log
The tool puts four random hexadecimal digits in the file name to avoid clashing with existing files, for example when running the tool on several nodes of an HA cluster.
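Once the issue has reoccurred, the rotated files can be bundled into a single archive for the ticket. This is a minimal sketch under stated assumptions: the function name `collect_troubleshooter_logs` is hypothetical, the glob `ha-troubleshooting.*` assumes every log file starts with the prefix shown in the startup message above, and the archive path should be absolute because the function changes directory before running `tar`.

```shell
# Hypothetical helper: archive the troubleshooter log files found in a directory.
#   $1 = directory in which the troubleshooter was run
#   $2 = absolute path of the .tar.gz archive to create
collect_troubleshooter_logs() {
  # Run in a subshell so the caller's working directory is unchanged.
  ( cd "$1" && tar -czf "$2" ha-troubleshooting.* )
}

# Example usage (both paths are placeholders):
#   collect_troubleshooter_logs /path/to/run/dir /tmp/ha-troubleshooter-logs.tar.gz
```

Creating one archive per node, named after the node, helps keep the active and passive outputs apart.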
Provide the Jenkins logs from both the active and passive nodes, e.g.
/var/log/jenkins/jenkins.log on Ubuntu/Debian. For other types of installation, get the default log locations here.