How To Increase CloudBees High Availability (active/passive) Timeout

Article ID:360028552531

Issue

Description

A High Availability (active/passive) failover is often a sign of an underlying issue that should be addressed. While you debug that issue, you can adjust the failover timeout, or you can disable HA by following How to disable High Availability (active/passive) in Jenkins?.

Commonly, long-running JVM garbage collection cycles that last longer than the default timeout (10s for versions lower than 2.303.2.5, 30s for version 2.303.2.5 and greater) can cause a failover. Therefore, following the Best Practices is a must.

If you are experiencing HA failovers too often, we encourage you to Submit a Support Request so we can diagnose the root cause.

Resolution

If you are running product version 2.303.2.3 or higher, you can adjust the timeout from the product by going to Manage Jenkins -> Configure System -> High Availability Configuration -> Enable customize JGroups configuration. You will need to configure the ports you would like to use, and you can also configure the timeout (default is 30 seconds).

Enable customize JGroups configuration

For product versions lower than 2.303.2.3, you can place a custom jgroups.xml file in ${JENKINS_HOME}. By default this file is not present; if you want to customize the JGroups settings, you will need to create it. This article has reference copies of the file which you can use as a basis. Be sure to choose the file that matches your version of CloudBees CI.

The following <FD> node within jgroups.xml determines the timeout period before failover. The period is approximately timeout*max_tries, plus the <VERIFY_SUSPECT> timeout, with all values in milliseconds. Therefore, with the default settings:

<FD timeout="3000" max_tries="3"/><VERIFY_SUSPECT timeout="1500"/>

3000*3(+1500) = 10500 ms, or ~10 seconds

To increase the timeout, you can increase the values:

<FD timeout="3000" max_tries="10"/><VERIFY_SUSPECT timeout="1500"/>

3000*10(+1500) = 31500 ms, or ~30 seconds
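To check a candidate configuration before deploying it, the arithmetic above can be sketched in a few lines of Python. This is only an illustration of the timeout*max_tries (+ verify_suspect) approximation described in this article, not a CloudBees or JGroups tool. The function names and the `<config>` wrapper element are assumptions made so the fragments parse as standalone XML.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{urn:org:jgroups}FD'."""
    return tag.rsplit("}", 1)[-1]


def failover_timeout_ms(xml_text: str) -> int:
    """Approximate failover timeout in milliseconds.

    Uses the approximation from this article:
    FD timeout * FD max_tries + VERIFY_SUSPECT timeout.
    """
    root = ET.fromstring(xml_text)
    # Map each element's local name to its attributes (hypothetical helper logic).
    attrs = {local_name(el.tag): el.attrib for el in root.iter()}
    fd = attrs["FD"]
    verify = attrs.get("VERIFY_SUSPECT", {})
    return int(fd["timeout"]) * int(fd["max_tries"]) + int(verify.get("timeout", 0))


# The two fragments from this article, wrapped in an assumed root element.
default_cfg = (
    '<config><FD timeout="3000" max_tries="3"/>'
    '<VERIFY_SUSPECT timeout="1500"/></config>'
)
raised_cfg = (
    '<config><FD timeout="3000" max_tries="10"/>'
    '<VERIFY_SUSPECT timeout="1500"/></config>'
)

print(failover_timeout_ms(default_cfg))  # 10500
print(failover_timeout_ms(raised_cfg))   # 31500
```

To check a real file, pass its contents to `failover_timeout_ms`. The namespace stripping is there because a full jgroups.xml typically declares an XML namespace on its root element.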

This 30-second timeout became the default in release 2.303.2.5 with the change Increased High Availability (HA) default timeout (BEE-106).