Upgrade Notes
- CI Version Format Change – June 2025
-
Starting with the June 2025 release, the CloudBees CI version format will change in the fourth segment. For example, the release version 2.504.3.x will change to 2.504.3.abcd.
Feature Enhancements
- The Pod disruption budget associated with a High Availability (HA) managed controller can now be customized
-
You can now use the advanced YAML field to alter aspects of the pod disruption budget generated for a High Availability (HA) managed controller, such as the minAvailable field. For more information, refer to Install HA (active/active) on CloudBees CI on modern cloud platforms.
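A minimal sketch of such an override, assuming the advanced YAML field accepts a Kubernetes fragment identified by kind that is merged into the generated resource:
kind: PodDisruptionBudget
spec:
  minAvailable: 1   # keep at least one replica available during voluntary disruptions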
- Pages incompatible with High Availability (HA) now display a banner to indicate that the information shown might be inaccurate
-
When a controller runs in High Availability (HA) mode, some pages, such as the Build History page, a job’s Build Time Trend, and agent load statistics, do not reflect the live status of builds running on other replicas or agents connected to different replicas. This discrepancy confused users, who weren’t sure if they needed to check those pages. Now, a header appears warning about the inaccuracy, and prompts users to contact CloudBees CI support if the page is important to them. Also, a watermark is applied to these pages to help users quickly recognize they’re on a page that’s incompatible with HA.
- High Availability (HA) controller maximum surge during rolling restart
-
Previously, a High Availability (HA) controller's Deployment used the Kubernetes default maxSurge value of 25%. This was not suitable because only one new replica could start at a time; other replicas would wait to acquire a startup lock. The maxSurge value is now set to 1.
The maxUnavailable value is also set to 0 by default, which prevents the removal of running replicas until replacements are ready. This setting might prevent a rolling restart in clusters with extremely constrained resources if the configured replica count is the maximum that can be scheduled. In this case, you might need to add the following override in the custom YAML to allow a temporary scale-down:
apiVersion: apps/v1
kind: Deployment
spec:
  strategy:
    rollingUpdate:
      maxUnavailable: 1
- Improved logging for non-concurrent builds blocking the queue
-
In scenarios where a non-concurrent build is stuck in the queue because another build is already running, the system now logs detailed information about the blocking build. The logs include the user running the blocking build and the blocking build’s build number.
- Aggregation of load statistics graphs in High Availability (HA) controllers
-
The load statistics graphs on various CloudBees CI pages (overall, per label, and per agent) now automatically display aggregated statistics in High Availability (HA) controllers. The exponential moving average data points at the three timescales sum the components (such as busy executors) from each replica.
- New support components collect information about controller startup and shutdown
-
Support bundles now include timing information about the controller startup. If the startup takes longer than 5 minutes, thread dumps are automatically collected. Similarly, if the shutdown takes longer than 15 seconds, thread dumps are also collected and attached to the support bundle.
- Administrative Monitor for plugin state changes
-
An administrative monitor has been introduced to track plugin state changes in CloudBees CI. This monitor appears across all CloudBees CI replicas and is displayed when a plugin is installed, uninstalled, enabled, disabled, updated, or installed after a restart. The monitor prompts the user to restart the controller. If the user is an administrator, a Restart Instance button is available, allowing them to restart CloudBees CI with a single click.
- Fine tune CasC Controller Bundle Service for Kubernetes setup
-
When enabling the CasC Controller Bundle Service during CloudBees CI installation, you can now fine-tune the following Kubernetes properties (see the sketch after the list):
- imagePullSecrets
- nodeSelector
- tolerations
- annotations
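A hypothetical sketch of how these properties might appear in a Helm values file; the cascBundleService key name and surrounding structure are assumptions, so refer to the installation documentation for the exact keys used by your chart version:
cascBundleService:                  # hypothetical key name
  imagePullSecrets:
    - name: my-registry-secret
  nodeSelector:
    kubernetes.io/os: linux
  tolerations:
    - key: dedicated
      operator: Equal
      value: ci
      effect: NoSchedule
  annotations:
    example.com/owner: platform-team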
Resolved Issues
- The controller and operations center no longer fail to start when upgrading CloudBees CI
-
When upgrading or restarting CloudBees CI, the controller or operations center failed to start when a messaging database was corrupted and returned a Messaging.afterExtensionsAugmented error. The operations center may have also failed to start with an OperationsCenter.afterExtensionsAugmented error.
- “Enforce Cross Site Request Forgery exploits prevention settings” option removed from Security setting enforcement
-
The Enforce Cross Site Request Forgery exploits prevention settings option has been removed from the Security Setting Enforcement section of Client controller security in the Security settings of operations center. This option was misleading because CloudBees CI has enforced Cross-Site Request Forgery (CSRF) protection (except through a system property escape hatch) since version 2.222.1 in 2020. Therefore, this operations center feature wasn’t contributing to controller security.
If this option was enabled in an operations center Configuration as Code (CasC) bundle (the crumbIssuer property of a securitySettingsEnforcement), a Configuration as Code error now occurs at startup. The same now applies to the masterKillSwitch property, which has had no effect for some time but wasn’t formally deprecated in Configuration as Code. You can temporarily downgrade this class of error to a warning using:
configuration-as-code:
  deprecated: warn
If the crumb issuer enforcement setting was used to propagate an option of the default issuer (namely, Enable proxy compatibility) to controllers, or to select an alternate issuer (none are supported by CloudBees), you can achieve similar behavior by using Configuration as Code bundles or other general techniques for applying configuration uniformly across controllers.
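For example, a minimal sketch of a jenkins.yaml fragment in a controller Configuration as Code bundle that enables the default crumb issuer's proxy compatibility option (assuming the standard Jenkins Configuration as Code schema for the default crumb issuer):
jenkins:
  crumbIssuer:
    standard:
      # "Enable proxy compatibility" corresponds to excluding the client IP from the crumb
      excludeClientIPFromCrumb: true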
- Failure to watch progress of managed controller
-
A race condition during operations center startup could cause the code that watches for Kubernetes status changes on managed controllers to be skipped, causing problems such as the hyperlink to the managed controller dashboard not being rendered even after the controller became ready. The logic that redirects to the proper controller URL now works in all cases.
- Slow startup of High Availability (HA) controllers with a large number of historical builds
-
In versions 2.414.3.7 and later, and significantly in 2.492.1.3, the startup of a High Availability (HA) controller containing jobs with a large number of historical (completed) builds could be slow. This was especially true if other running replicas were under heavy CPU load. The High Availability (HA)-specific process of loading the list of build numbers from disk was optimized to reduce overhead, which sped up rolling restarts.
- Inconsistent idle status reported for temporarily offline computers in aggregated API
-
When a static agent was marked temporarilyOffline while executing a build, its status wasn't tracked correctly. This caused the aggregated computer set API to report inconsistent values in the idle field, depending on which replica processed the request.
- Temporarily offline agents executing builds were missing from the computer aggregation API
-
When a node running a build was marked as temporarilyOffline, it could be excluded from the JSON response of the aggregated computer API, depending on which replica handled the request. Now, both replicas consistently include the agent running the build in the API response.
- Standardized error display for reverse proxy in High Availability (HA) controllers
-
When a replica of a High Availability (HA) controller is asked to display information that can be rendered only by another replica (such as a running build it doesn't manage), it acts as a reverse proxy. Under some conditions, the connection to the second replica could fail. For example, a java.net.BindException could occur if all TCP sockets are consumed. In such cases, the stack trace was printed to the HTTP response and displayed in the browser. Now, the regular Jenkins error page is displayed. This page typically shows only a UUID of the exception, while the stack trace is printed to the system log.
- High Availability (HA) controller replicas on NFS not observing recently completed build promptly
-
As of 2.504.1.6, builds completed on a High Availability (HA) controller running on specific NFS configurations should be properly loaded by other replicas. However, this process could be delayed by around 20 seconds while client caches caught up, which was noticeable, for example, when viewing a running build in CloudBees Pipeline Explorer via another replica and then observing its completion. Now, the reload occurs promptly because the NFS client cache is forcibly refreshed.
- Fixed a race condition in non-concurrent pipelines that caused simultaneous builds in High Availability (HA) environments
-
Resolved an issue in High Availability (HA) environments where pipelines using the disableConcurrentBuilds() option could incorrectly run multiple builds simultaneously across replicas. This fix ensures that non-concurrent pipeline settings are consistently respected.
- HA controller startup probe did not handle cold start
-
When a High Availability (HA) controller defined a startup probe, the timeout would take effect for all replicas during a cold start (scale up from zero replicas), even though only one replica could start at a time. This occurred after the configured initial delay. Now, the startup probe failure threshold at the Kubernetes level is implicitly multiplied by the configured maximum replica count, while the configured timeout is applied to each replica internally.
Note that this change means that modifying the replica or maximum replica fields may trigger a rolling restart of the controller.
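As a hedged illustration, assuming a configured failure threshold of 60 checks at a 10-second period and a maximum of 3 replicas (the values and probe endpoint below are assumptions, not values generated by the product), the probe on the Deployment would now look roughly like this, while each replica is still held to its configured 60-check budget internally:
startupProbe:
  httpGet:
    path: /login            # assumed endpoint for illustration
    port: 8080
  periodSeconds: 10
  failureThreshold: 180     # configured per-replica threshold (60) x maximum replicas (3)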
- Improved reverse proxy behavior during rolling upgrade to prevent routing traffic to uninitialized CloudBees CI replicas
-
Fixed an issue where the reverse proxy could prematurely forward configuration-related requests to a new CloudBees CI replica before it was fully initialized during a rolling upgrade. As a result, users might have seen the message "Jenkins is getting ready to work." Now, the reverse proxy avoids redirecting requests to replicas that aren’t fully ready, ensuring a smoother rolling upgrade experience.
- Ensure redirect_uri escape hatch support in the user-visible authentication method as well
-
In setups where the CloudBees CI controller is configured with an internal hostname but exposed to users through a different hostname, the redirect_uri in the single sign-on (SSO) workflow could become mismatched. This is because the redirect_uri was computed using the internal hostname. This mismatch could cause the SSO login to fail.
Because some CloudBees CI installations require this type of different hostname configuration, we’ve provided a system property, -Dcom.cloudbees.opscenter.server.sso.SSOConfiguration.masterRootURLStrictCheckingDisabled, to allow ignoring redirect_uri hostname mismatches. This property previously had an issue, but it is now fully functional.
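A hypothetical sketch of how this property might be passed to the controller JVM; the javaOpts key below is an assumed placeholder, and the actual mechanism depends on how your controller is provisioned (for example, a Java options field or Helm values):
# Assumed placeholder key; set the property wherever your controller's JVM options are defined.
javaOpts: >-
  -Dcom.cloudbees.opscenter.server.sso.SSOConfiguration.masterRootURLStrictCheckingDisabled=true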
- Launching High Availability (HA) multi-executor agent using a secret file now works
-
Previously, auxiliary agents didn't function correctly when the secret file was passed in the agent launching command. They only worked when the secret text was passed directly. Now, inbound agents can reliably connect using @secret-file, regardless of whether the controller is running in High Availability (HA) mode with multiple executors.
- Gracefully handle the absence of a replica in a cluster during queue item adoption
-
Previously, when a cluster member left and no ready replica was available to adopt queue items, a NoSuchElementException was thrown. This resulted in a long and noisy stack trace in the logs.
With this change, the system now checks for the presence of a ready replica before proceeding. If none are available, a warning is logged. This avoids the large stack trace while still notifying maintainers of the issue.
- High Availability (HA) now mutes selected logs on old replicas when classes are only available on newer replicas
-
During a rolling upgrade of a High Availability (HA) controller, some Hazelcast remote operations could fail on older replicas. Hazelcast logs this by default as SEVERE. However, this is generally a transient situation that resolves when the older replicas are shut down, and it doesn’t typically have a functional impact. The corresponding log records are now muted to avoid unnecessary log noise.
- Agent build history screen did not always render on High Availability (HA) controller
-
When using the agent build history on a High Availability (HA) controller, the dynamically populated table wasn't rendered if the agent was connected to a replica different from the one hosting the sticky session. Now it is rendered. However, the screen is of limited use in this case, because it doesn't support Pipeline, and non-Pipeline jobs aren't supported on High Availability (HA) controllers. The same fix may, however, cover other miscellaneous cases, such as dynamic rendering during rolling restarts, or when sticky sessions are not yet configured in an ingress controller.
- Performance regression displaying jobs/builds using the Warnings Next Generation plugin (warnings-ng)
-
When a job with a large number of builds used the Warnings Next Generation plugin (warnings-ng), for example with the recordIssues Pipeline step, displaying a job or build page in version 2.504.1.6 could cause all builds to load into memory under certain conditions. This degraded controller performance. Now, at most a limited number of builds are loaded.
- If Agents.SeparateNamespace.Enabled=true, all agents are installed to the same namespace
-
Setting Agents.SeparateNamespace.Enabled=true did not work well in conjunction with managed controllers in another namespace or cluster (Master.OperationsCenterNamespace), as the agents were all directed to a single namespace regardless of the managed controller location. Now each managed controller namespace will get its own corresponding agent namespace.
If upgrading from a prior release, you may need to set Agents.SeparateNamespace.Create=true to create the per-managed controller namespaces. If some projects were relying on special service accounts (typically tied to cloud identities) in the old global namespace, those service accounts should be duplicated or moved to the new per-managed controller namespace(s). If you use this system rather than Configuration as Code, you should also delete the predefined Kubernetes shared cloud configuration snippet defined in operations center (https://your.domain/cjoc/view/All/job/kubernetes-shared-cloud/delete) before letting operations center restart during the upgrade. Alternatively, configure it and ensure that Kubernetes Namespace is blank, as this setting would otherwise override the agent namespace selected by a specific managed controller.
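Based on the property names above, a minimal sketch of the corresponding Helm values (assuming the dotted names map to nested YAML keys as usual for Helm):
Agents:
  SeparateNamespace:
    Enabled: true    # give each managed controller namespace its own agent namespace
    Create: true     # may be needed when upgrading from a prior release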
- authToken field and quietPeriod Configuration as Code properties now support variable resolution
-
When using Configuration as Code to configure items, variable substitution was previously not supported for either the authToken or the quietPeriod properties. This change enables support for both going forward.
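A hypothetical sketch of an items definition using variable substitution for these properties; the item kind, name, and variable names are illustrative assumptions, and only authToken and quietPeriod come from this note:
items:
  - kind: freeStyle               # assumed item kind for illustration
    name: example-job
    authToken: ${JOB_AUTH_TOKEN}  # now resolved from a variable
    quietPeriod: ${QUIET_PERIOD}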
- Filterable roles are not loaded when using multiple Role-Based Access Control files in Configuration as Code bundles
-
When using multiple Role-Based Access Control files in a Configuration as Code bundle (for example, rbac-roles.yaml and rbac-groups.yaml), filterable roles are now available to set when the files are loaded.
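A minimal sketch of a bundle descriptor referencing the RBAC files mentioned above; the id, description, and version values are placeholders, and the layout assumes the standard bundle.yaml sections:
apiVersion: "1"
id: "example-bundle"
description: "Bundle with RBAC split across multiple files"
version: "1.0"
rbac:
  - rbac-roles.yaml
  - rbac-groups.yaml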
- Kubernetes shared cloud configuration pod templates were read-only despite having Item.CONFIGURE permissions
-
Users with Item.CONFIGURE access can now add or edit pod templates for Kubernetes shared cloud configurations.
- ConnectionRefusalException is not handled by WebSocketAgents
-
The ConnectionRefusalException exception was not caught when thrown by a WebSocket agent connection attempt. This problem is now solved, and the exception is handled properly.
Known Issues
- Duplicate plugins in the Operations center Plugin Manager UI
-
When you search for a specific plugin under the Available tab in the Operations center Plugin Manager, the search results show duplicate entries for the plugin.
- Errors resuming declarative builds from older releases after extra restart
-
If a Declarative Pipeline build was running in version 2.492.2.3 (or earlier) and the controller was then upgraded to this release, the build would resume. However, if the controller was restarted a second time, the build would fail. This issue also impacts most running Declarative Pipelines during rolling upgrades of HA controllers to this release. This issue is resolved by upgrading to 2.516.1.28662.