Operations Center Credentials Cache blocks Reading Threads

Article ID: 360031137052

Issue

  • When configuring a job on a client controller connected to an Operations Center, opening the credentials dropdown in any configuration section takes longer than expected.

  • We see heavy thread contention, with many threads waiting on a single thread that shows the trace below:

          at hudson.XmlFile.write(XmlFile.java:181)
          at com.cloudbees.opscenter.client.plugin.OperationsCenterCredentialsProvider.saveCache(OperationsCenterCredentialsProvider.java:234)
          at com.cloudbees.opscenter.client.plugin.OperationsCenterCredentialsProvider.cachePut(OperationsCenterCredentialsProvider.java:170)
          at com.cloudbees.opscenter.client.plugin.OperationsCenterCredentialsProvider.getCredentials(OperationsCenterCredentialsProvider.java:153)

    Or, since version 2.107.0.5 of the Operations Center Client plugin:

          at hudson.XmlFile.write(XmlFile.java:193)
          at com.cloudbees.opscenter.client.plugin.OperationsCenterCredentialsProvider.lambda$saveCache$0(OperationsCenterCredentialsProvider.java:234)
          at com.cloudbees.opscenter.client.plugin.OperationsCenterCredentialsProvider$$Lambda$631/795926822.run(Unknown Source)
  • The credentials cache file $JENKINS_HOME/com.cloudbees.opscenter.client.plugin.OperationsCenterCredentialsProvider.cache.xml on the controller’s file system grows large.

Related Issues

  • CJP-6052: Use a CJOC shared credentials cache when disconnected (introduction of the Operations Center Credentials Cache)

  • CJP-8620: Colossal Credentials Cache Causes Cataclysmic Contention (fixes thread locking and improves the cache, which now saves entries asynchronously; operations-center-client 2.107.1.5)

  • CTR-788: Slowdown when credentials cache file gets big (new implementation of the cache that avoids duplication and cleans out entries periodically; operations-center-client 2.249.0.12)

Explanation

The problem comes from the performance of serializing the credentials cache provided by Operations Center. The Operations Center credentials cache makes the credentials defined in Operations Center available to the controller even when the controller is disconnected from Operations Center (e.g. during an Operations Center restart or a simple disconnection). A common use case is reconnecting Shared Agents after a controller restart so that durable tasks (such as pipeline steps) can continue.

Each controller keeps a cache of the Operations Center credentials that it uses. The cache is persisted in the file $JENKINS_HOME/com.cloudbees.opscenter.client.plugin.OperationsCenterCredentialsProvider.cache.xml on the controller’s file system.

The Operations Center credentials cache went through a couple of improvements to make it work at scale:
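To confirm whether the cache file has grown large on a given controller, a small Script Console check along the following lines may help. This is a minimal sketch: the file name is the one given above, while the 100 MB warning threshold and the variable names are illustrative assumptions, not values defined by the plugin.

```groovy
import jenkins.model.Jenkins

// Cache file name as given in this article, resolved against JENKINS_HOME.
def cacheFile = new File(Jenkins.instance.rootDir,
    'com.cloudbees.opscenter.client.plugin.OperationsCenterCredentialsProvider.cache.xml')

// Assumption: 100 MB is only an illustrative threshold for "large".
long warnBytes = 100L * 1024 * 1024

if (!cacheFile.exists()) {
  println "No cache file found at ${cacheFile}"
} else {
  long size = cacheFile.length()
  println "${cacheFile.name}: ${size.intdiv(1024 * 1024)} MB (${size} bytes)"
  if (size >= warnBytes) {
    println 'The cache file is large; consider upgrading or applying a workaround.'
  }
}
```

The script only reads the file size and changes nothing on the controller.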

  • CJP-8620: Colossal Credentials Cache Causes Cataclysmic Contention: Thread contention and synchronous saves may cause severe performance problems on the controller, especially when the cache is large. This was fixed in operations-center-client 2.107.1.5.

  • CTR-788: Slowdown when credentials cache file gets big: The credentials cache can grow very large and cause severe performance problems once the file reaches hundreds of MB. This was fixed in operations-center-client 2.249.0.12 with a new implementation of the cache that avoids duplication and cleans out entries periodically.
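To illustrate why a synchronous save under a lock leads to contention, and why moving the file write off the locked path helps, here is a minimal, self-contained sketch. The class SketchCache and all of its members are hypothetical and written only for illustration; they do not reflect the plugin's actual code.

```groovy
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.locks.ReentrantReadWriteLock

// Hypothetical stand-in for a cache persisted to disk; NOT the plugin's code.
class SketchCache {
  private final lock = new ReentrantReadWriteLock()
  private final saver = Executors.newSingleThreadExecutor()
  private Map<String, String> entries = [:]
  final File file

  SketchCache(File file) { this.file = file }

  String get(String key) {
    lock.readLock().lock()
    try { return entries[key] } finally { lock.readLock().unlock() }
  }

  // Synchronous variant: the file write happens while the write lock is held,
  // so every reader waits for the disk. The larger the cache, the longer the wait.
  void putSync(String key, String value) {
    lock.writeLock().lock()
    try {
      entries[key] = value
      file.text = entries.toString()
    } finally { lock.writeLock().unlock() }
  }

  // Asynchronous variant: only the in-memory update and a snapshot copy happen
  // under the lock; the file write runs afterwards on a background thread.
  void putAsync(String key, String value) {
    Map<String, String> snapshot
    lock.writeLock().lock()
    try {
      entries[key] = value
      snapshot = new HashMap<>(entries)
    } finally { lock.writeLock().unlock() }
    saver.execute { file.text = snapshot.toString() }
  }

  // Wait for pending background writes to finish.
  void shutdown() {
    saver.shutdown()
    saver.awaitTermination(10, TimeUnit.SECONDS)
  }
}

def cache = new SketchCache(File.createTempFile('sketch-cache', '.txt'))
cache.putSync('a', '1')   // readers are blocked for the duration of the write
cache.putAsync('b', '2')  // readers are blocked only for the in-memory update
cache.shutdown()
println cache.get('b')
```

In the asynchronous variant the lock is held only for the in-memory work, which is the general idea behind saving entries asynchronously; how the plugin implements it internally is not covered here.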

Resolution

The recommended solution is to upgrade CloudBees CI to version 2.263.1.2 or later.

Workaround

Because upgrading an instance requires planning and testing, the following workarounds may be used in the meantime to reduce the impact of this issue:

Clearing the cache periodically

The following script reduces the likelihood of the issue occurring. It flushes the cache, and the performance improvement should be immediate:

import com.cloudbees.opscenter.client.plugin.OperationsCenterCredentialsProvider
import com.cloudbees.plugins.credentials.Credentials
import hudson.ExtensionList

// Look up the OperationsCenterCredentialsProvider extension instance.
def provider = ExtensionList.lookup(OperationsCenterCredentialsProvider.class).get(0)

// Take the write lock so no other thread reads or updates the cache meanwhile.
provider.lock.writeLock().lock()
try {
  println 'Cleaning credentials cache. Lock acquired...'
  // Replace the cache with an empty map and persist it to disk.
  provider.cache = new HashMap<String, List<? extends Credentials>>()
  provider.saveCache()
  println 'Cache cleaned'
} finally {
  provider.lock.writeLock().unlock()
  println 'Lock released'
}

This may be executed from Manage Jenkins > Script Console when the issue happens. It can also be automated to run periodically.

Disable the Cache

Another workaround is to disable the cache by adding the system property com.cloudbees.opscenter.client.plugin.OperationsCenterCredentialsProvider.cache.disabled=true to your controller.

Disabling the cache affects the lookup of credentials defined in Operations Center and used by the controller: when Operations Center is down or not connected, the controller cannot access those credentials. This affects, for example, features such as Shared Agents that need credentials defined in Operations Center.
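Java system properties are typically passed to the controller's JVM at startup as -Dname=value arguments, so here the argument would be -Dcom.cloudbees.opscenter.client.plugin.OperationsCenterCredentialsProvider.cache.disabled=true. Where JVM arguments are configured depends on how the controller is installed. As a minimal sketch, the following Script Console snippet shows whether the property is visible to the running JVM. It only reads the property and does not confirm that the plugin has acted on it.

```groovy
// Property name as given in this article.
def name = 'com.cloudbees.opscenter.client.plugin.OperationsCenterCredentialsProvider.cache.disabled'

// System.getProperty returns null when the property was not passed to the JVM.
def value = System.getProperty(name)
println(value == null ? "${name} is not set" : "${name}=${value}")
```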