Issue
-
CloudBees CI controller becomes unresponsive
-
Thread dumps and slow requests reveal BLOCKED threads on com.cloudbees.opscenter.client.cloud.PluginImpl.getSlaveManifest. Typically:
Computer.threadPoolForRemoting [#12345]
   java.lang.Thread.State: BLOCKED (on object monitor)
   at com.cloudbees.opscenter.client.cloud.PluginImpl.getSlaveManifest(PluginImpl.java:222)
   - waiting to lock <0x0000000123456a12> (a com.cloudbees.opscenter.client.cloud.PluginImpl)
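To gauge how widespread the contention is, the number of affected threads in a saved thread dump can be counted. The following is a minimal sketch, not a CloudBees tool: it assumes a jstack-style text dump in which thread entries are separated by blank lines, and the function name `count_blocked_on_manifest` is hypothetical.

```python
import re

# Frame named in this article; everything else here is illustrative.
TARGET_FRAME = "com.cloudbees.opscenter.client.cloud.PluginImpl.getSlaveManifest"


def count_blocked_on_manifest(dump_text):
    """Count thread entries that are BLOCKED and have TARGET_FRAME in their stack."""
    # Assumption: thread entries are separated by one or more blank lines.
    entries = re.split(r"\n\s*\n", dump_text)
    return sum(
        1
        for entry in entries
        if "java.lang.Thread.State: BLOCKED" in entry and TARGET_FRAME in entry
    )
```

Reading a dump file into a string and passing it to `count_blocked_on_manifest` gives a rough count; a large number of such threads across several consecutive dumps is consistent with the symptom described above.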
Environment
-
CloudBees CI on modern cloud platforms - managed controller >= 2.387.2.3 and < 2.452.1.2
-
CloudBees CI on modern cloud platforms - operations center >= 2.387.2.3 and < 2.452.1.2
-
CloudBees CI on traditional platforms - client controller >= 2.387.2.3 and < 2.452.1.2
-
CloudBees CI on traditional platforms - operations center >= 2.387.2.3 and < 2.452.1.2
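The affected range above (>= 2.387.2.3 and < 2.452.1.2) can be checked programmatically, for example when auditing several controllers. This is a rough sketch that assumes version strings are purely numeric and dot-separated; the helper names `parse_version` and `is_affected` are hypothetical.

```python
def parse_version(version):
    """Turn '2.387.2.3' into (2, 387, 2, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# Bounds taken from the Environment section of this article.
AFFECTED_FROM = parse_version("2.387.2.3")  # inclusive
FIXED_IN = parse_version("2.452.1.2")       # exclusive


def is_affected(version):
    """Return True if the given CloudBees CI version falls in the affected range."""
    return AFFECTED_FROM <= parse_version(version) < FIXED_IN
```

Comparing tuples of integers avoids the pitfalls of plain string comparison, where "2.99" would incorrectly sort after "2.387".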
Related Issue(s)
-
BEE-28991: Remove usage of mapDB in Operations Center Cloud
-
BEE-48985: Un-necessary synchronization in PluginImpl.getSlaveManifest
Explanation
Starting with version 2.387.2.3 of CloudBees CI, the persistence of leased shared agent manifests is no longer handled by a local MapDB database but by an XML file, $JENKINS_HOME/com.cloudbees.opscenter.client.cloud.XmlSlaveManifestImpl.xml, on the controller.
This change introduced a performance issue as access to the manifest is now heavily guarded by object synchronization.
The impact can be significant depending on the environment: it depends mainly on how heavily Shared Agents are used, but also on the performance of the underlying file system.
Resolution
The solution is to upgrade CloudBees CI to version 2.452.1.2 or later.
The Job Config History plugin can exacerbate the problem because it records changes to $JENKINS_HOME/com.cloudbees.opscenter.client.cloud.XmlSlaveManifestImpl.xml. This file must be added to the exclude pattern list in the Job Config History plugin configuration. See JobConfigHistory Plugin Best Practices.
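As an illustration of building the exclusion entry, the sketch below escapes the manifest file name and appends it to an existing pattern. It assumes the exclude setting is interpreted as a regular expression of "|"-separated alternatives matched against file names; verify this against the plugin documentation for your version. The function name `add_exclusion` and the sample pattern in the usage note are hypothetical.

```python
import re

# File name taken from this article.
MANIFEST_FILE = "com.cloudbees.opscenter.client.cloud.XmlSlaveManifestImpl.xml"


def add_exclusion(existing_pattern, file_name=MANIFEST_FILE):
    """Return a pattern that also excludes file_name (assumed regex semantics)."""
    escaped = re.escape(file_name)  # dots must be escaped to match literally
    if not existing_pattern:
        return escaped
    if re.search(existing_pattern, file_name):
        return existing_pattern  # already covered, leave unchanged
    return existing_pattern + "|" + escaped
```

For example, `add_exclusion(r"queue\.xml|nodeMonitors\.xml")` returns the same pattern with the escaped manifest file name appended as a third alternative, which can then be pasted into the plugin configuration.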