Upgrade Notes
- Operations center CloudBees Assurance Program plugin changes since 2.516.3.29358
  The following plugins have been added to the operations center CloudBees Assurance Program since 2.516.3.29358:
  - Common classes for CloudBees SSO Relay plugins (cloudbees-sso-relay-common)
  - CloudBees SSO Relay for OpenID Connect (cloudbees-sso-relay-oidc)
  - CloudBees SSO Relay for SAML (cloudbees-sso-relay-saml)
- Controller CloudBees Assurance Program plugin changes since 2.516.3.29358
  The following plugins have been added to the controller CloudBees Assurance Program since 2.516.3.29358:
  - Common classes for CloudBees SSO Relay plugins (cloudbees-sso-relay-common)
  - CloudBees SSO Relay for OpenID Connect (cloudbees-sso-relay-oidc)
  - CloudBees SSO Relay for SAML (cloudbees-sso-relay-saml)
  - MCP Server Plugin (mcp-server)
New Features
- Archive Pipeline build logs with CloudBees Pluggable Storage
  CloudBees Pluggable Storage enables Pipeline build logs and log metadata to be stored in cloud object storage, reducing managed controller disk usage and minimizing the size of the $JENKINS_HOME directory. This improves backup, restore, and disaster recovery performance. CloudBees Pluggable Storage provides centralized log management and comprehensive build history retention across all managed controllers, reducing disk usage, improving operability and maintenance, and supporting audit capabilities for compliance requirements. CloudBees Pluggable Storage supports Amazon S3, Google Cloud Storage, Microsoft Azure Blob Storage, and Amazon S3-compatible providers. For more information, refer to Archive Pipeline build logs with CloudBees Pluggable Storage.
- High Availability support for single sign-on is now available
  CloudBees CI now supports the CloudBees SSO Relay service and plugin for High Availability (HA) single sign-on (SSO) with SAML and OpenID Connect security realms. The SSO Relay enables users to authenticate directly with individual controllers when the operations center is offline, streamlining redirect URI management and improving reliability. Administrators can deploy and scale the SSO Relay independently for enhanced resilience. For more information, refer to Set up SSO Relay for CloudBees CI single sign-on.
- Improved CloudBees CI startup time
  Beginning with CloudBees CI 2.528.1 and Folders Plus plugin (cloudbees-folders-plus) 3.504, the system loads jobs in parallel during startup. By default, the system uses two threads per processor (with up to 32 threads supported) to speed up startup. Administrators can adjust or disable parallel job loading by setting the com.cloudbees.hudson.plugins.folder.ParallelChildLoader.threadCount property to 1. For more information, refer to Parallel job loading.
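  As a minimal sketch only: one common way to disable parallel job loading is to pass this system property as a -D JVM argument to the controller. The Master.JavaOpts Helm value name below is an assumption for a Helm-based installation, not something stated in this note.

  ```yaml
  # Hypothetical Helm values sketch (value name assumed, not from this note):
  # passes the parallel job loading system property to managed controller JVMs.
  Master:
    JavaOpts: "-Dcom.cloudbees.hudson.plugins.folder.ParallelChildLoader.threadCount=1"
  ```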
Feature Enhancements
- CyberArk HTTP requests now support both HTTP/1.1 and HTTP/2 protocols
  By default, CyberArk HTTP requests now use the HTTP/1.1 protocol. To use the HTTP/2 protocol, set the following system property to true:
  com.cloudbees.jenkins.plugins.cyberark.credentials.CyberArkRestService.useHttp2 = true
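  As with other system properties, this is typically supplied as a -D JVM argument at controller startup. The sketch below assumes a Helm-managed controller and a Master.JavaOpts value name, which is an assumption rather than part of this note.

  ```yaml
  # Hypothetical Helm values sketch (value name assumed):
  # opts CyberArk REST requests into HTTP/2 on managed controllers.
  Master:
    JavaOpts: "-Dcom.cloudbees.jenkins.plugins.cyberark.credentials.CyberArkRestService.useHttp2=true"
  ```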
- New Configuration as Code SCM Retriever Helm chart options
  Support has been added for a new Helm chart option, Casc.Retriever.extaJvmArgs, for the Configuration as Code (CasC) Bundle Retriever. This option allows you to specify a string value that is passed to the Quarkus JVM at runtime. The specified options are appended to any JVM options set by other mechanisms.
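  A minimal values sketch, assuming the option name exactly as given in this note; the JVM flag shown is a placeholder chosen only for illustration.

  ```yaml
  # Sketch only: the timezone flag is a placeholder, not a recommendation.
  Casc:
    Retriever:
      extaJvmArgs: "-Duser.timezone=UTC"
  ```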
- New optional RBAC group added to Helm chart for use in secondary managed controller clusters
  The rbac.groupName Helm value can be used to bind the role and cluster role for controller management to a Kubernetes RBAC group. This enables integration with Amazon EKS access entries for identity-based authentication against secondary managed controller clusters.
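  A minimal values sketch, assuming the rbac.groupName value maps to nested YAML as shown; the group name is a placeholder.

  ```yaml
  # Sketch only: "eks-controller-admins" is a placeholder Kubernetes RBAC group name.
  rbac:
    groupName: eks-controller-admins
  ```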
- Managed controllers: Default JVM option -XX:-OmitStackTraceInFastThrow enabled
  To improve issue diagnosis, managed controllers now run with the -XX:-OmitStackTraceInFastThrow JVM option enabled by default. This ensures that complete stack traces are always included for common exceptions, rather than being omitted after repeated occurrences.
- More efficient routing of external WebSocket Kubernetes agents to an HA controller
  When using a Kubernetes cloud with WebSocket transport and a defined Jenkins URL, which is typical for agents located in another cluster, an HA controller now bypasses the WebSocket reverse proxy in most cases for greater efficiency and stability.
- More efficient counting of replica load for HA controllers
  When an HA controller accepts a build trigger or performs other operations, it counts the number of running builds and queue items on each replica. This operation is now optimized.
- Acquire file lock on running builds in HA controller
  To avoid a class of problems in HA controllers where two replicas simultaneously believe they are responsible for a given build, a file lock is now acquired when starting, resuming, or adopting a build, and released only when the build is offered for adoption or completes.
- Operations center security administrative monitor for shared credentials in environments with untrusted controllers
  CloudBees now provides a security administrative monitor that alerts administrators when shared credentials are detected in their environment. The monitor provides:
  - Administrative alerts with detailed information when shared credentials are detected in an operations center that allows untrusted controllers.
  - Visibility only to users with administrative privileges. The monitor can be dismissed if you are confident that the operations center credentials are safe to be visible from controllers with an Untrusted security mapping.
  For more information, refer to Shared credentials administrative monitor for the operations center.
Resolved Issues
- Local items are not browsable when security mapping is restricted
  The items browser used in Move/Copy/Promote, Remote Copy Artifact, and some other features did not behave correctly when the security mapping for the controller was restricted. Local items were also not browsable. This fix makes local items visible regardless of the security mapping.
- HA: when triggering downstream jobs, under certain conditions the upstream job could wait forever
  In an HA controller, when running a build that triggers other builds, an issue could occur if a downstream build completed while the upstream build was pending adoption by another replica (for example, during a rolling update). In this scenario, the completion event could be lost, causing the upstream build to hang indefinitely. Pending messages are now stored in a data structure and retried periodically (every minute), ensuring that adopted builds are notified as expected.
- Upstream build keeps waiting for a non-existent downstream build
  Under certain timing conditions, an upstream build that triggered a downstream build using triggerRemoteJob would continue waiting for a downstream build that did not exist.
- Remote trigger: Upstream build now fails if downstream build cannot be scheduled
  In operations center environments (including HA environments), an upstream build could hang indefinitely when a downstream build was refused scheduling (for example, when Maximum load per replica is reached). Subscribed RemoteTriggerModes are now notified when a downstream build cannot be scheduled, and the upstream build fails in all modes except FireAndForget. Existing notification/callback mechanisms are reused; no new APIs were introduced, method signatures remain unchanged, and no changes are required in existing Pipelines or scripts.
- NullPointerException thrown on the Domain for managed controllers
  When the CasC bundle for a deprovisioned controller was used to provision another controller, a NullPointerException was incorrectly returned on the Domain field.
- Next build number not reset after job deletion and recreation in HA controller
  In an HA controller, after running builds of a job, its next build number was retained even after the job was deleted. If a new job was created with the same name, its build numbers would start from the same point instead of resetting to 1.
- ALB Ingress health check for the CasC SCM Retriever now points to the correct endpoint
  The ALB Ingress health check for the CasC SCM Retriever now points to the correct CasC SCM Retriever health check endpoint, ensuring accurate health monitoring.
- Grace period for controller shutdown could be consumed by suspending Pipeline builds
  A controller shutdown hook that suspended and saved the state of all running Pipeline builds was limited to three minutes, which is much longer than the default pod termination grace period of thirty seconds, and no individual per-build timeout was applied. In particular, in an HA managed controller, running past the deadline could cause problems because the graceful Hazelcast termination would then not be run.
- CyberArk credentials ID check failures in nested folders
  The ID check for CyberArk username/password credentials now works as expected when the credentials are folder-scoped and configured in nested folders. Previously, the check failed with Could not determine the credential store to validate against, even with correct input and configuration.
- CyberArk credentials usage fails when CyberArk secret does not have a username
  Using CyberArk username/password credentials may have failed if the associated CyberArk secret did not include a username.
- Improved resource cleanup for Pipelines using @Grab
  When using @Grab in a Pipeline, Java classes loaded from the grabbed library were not immediately cleaned up after the Pipeline finished. This could have delayed the removal of classes associated with the Pipeline in some cases.
- Race condition when completing build in HA controllers
  A race condition in HA controllers could have caused other replicas to believe that a build which had already completed was still owned by the replica that ran it. As a result, the build could be missing from the history widget.
- Controller offline security realm not correctly updated
  Under certain conditions, a controller’s offline security realm was not correctly updated.
- Ungraceful shutdown of a replica could cause Hazelcast health check failure on another replica
  When a replica shuts down ungracefully, the Hazelcast cluster waits 60 seconds before removing it from the member list. During this period, some Hazelcast operations may have waited for a response from the member that had already left, which could have caused health check failures on other replicas.
- Queue lock held when stopping Kubernetes agents
  The Jenkins queue lock was held while waiting for the Kubernetes API server to accept a request to delete an agent pod, which could be delayed if the API server was heavily loaded (though this did not wait for the actual deletion, which could take up to 30 seconds). In systems with many running builds, this could cause significant contention on the queue, leading to unresponsiveness and, in an HA controller, liveness check failures.
- Optimization to agent disconnection during shutdown
  A controller or HA controller replica with many agents could take too long to disconnect the agents during shutdown. This delay sometimes led to abrupt termination, especially for a managed controller.
- Hazelcast operation in HA controller locked queue during new queue item submission
  Submitting a new queue item in an HA controller could previously invoke a blocking Hazelcast operation while holding the Jenkins queue lock, potentially preventing other work from proceeding if Hazelcast was unresponsive.
- Failure to initiate clean Hazelcast termination during shutdown of loaded HA controller
  When an HA controller was heavily loaded, a fixed-size thread pool could become busy and fail to schedule termination of Hazelcast networking. As a result, termination might not occur before the termination grace period on a managed controller expired, or before the process was halted.
- Optimized agent connect/disconnect listener in HA controllers
  The listener that refreshes the status of sh steps when agents connect or disconnect in HA controllers was previously unnecessarily resource-intensive during periods of heavy build activity. This listener has been optimized to improve performance.
- Double adoption of builds in HA controller
  Under certain timing conditions, when a replica of an HA controller crashed and other replicas eventually adopted its running builds, multiple replicas could incorrectly adopt the same build, leading to corruption.
- WebSocket connections sometimes dropped with NullPointerException under heavy load due to race condition
  Under heavy load, WebSocket connections could be dropped with a NullPointerException in Jenkins core due to a race condition in asynchronous object initialization. The issue was observed when exceeding 200 WebSocket connections. This was not fatal, as the WebSocket agent would attempt to reconnect.
- Builds orphaned by a crashed HA replica could be adopted by a replica scheduled to restart
  When an HA controller replica exits gracefully, it offers its running builds for adoption; these are distributed to other running replicas, starting with the least loaded and skipping older replicas during rolling upgrades. However, when a replica crashed, the code on other replicas detected the orphaned builds after a delay and adopted them on whichever replica noticed the status, regardless of load or restart status. This sometimes caused a replica that was about to exit to adopt a large set of builds from a recently crashed replica, instead of leaving them to a fresh replica better able to handle the load.
- Queue lock held in HA controller while recording a deleted agent
  When an agent was deleted in Jenkins while the queue lock was held (for example, when completing a node block running on a cloud agent), a synchronous call to Hazelcast was made. If Hazelcast operations were impeded, this could tie up the queue lock for an extended period of time.
- NullPointerException from QueueAdoption.adopt in HA controllers
  Under some circumstances, potentially involving problems deserializing XML files due to missing objects, a NullPointerException could be returned from QueueAdoption.adopt in an HA controller.
- CloudBees Backup plugin: Backup permission requirements
  After version 1160 of the CloudBees Backup plugin (infradna-backup), users may have experienced backup failures due to unnecessary Amazon S3 permission requirements. This issue has been resolved, and you can now use the backup functionality with the same bucket access permissions as in version 1160.
- WebSocket close handling when reason string exceeds limit
  Closing a WebSocket connection could sometimes fail if the reason string was too long. The WebSocket protocol limits the close reason to 123 bytes, but in some cases the message being sent exceeded this limit, causing the close to fail. The close reason is now truncated to comply with the protocol size limit.
- Unnecessary checks in High Availability (HA) controller related to lock step emulation when disabled
  When lock step emulation was not enabled in an HA controller, the associated code still performed unnecessary processing whenever a build completed.
- Convenient way to set -XX:MaxRAM on managed controllers
  When using Linux kernels 6.12 and later for Kubernetes nodes, Java memory auto-detection currently does not work correctly, so it may be necessary to set the -XX:MaxRAM JVM option for each managed controller to match its container size. A new option in the cluster endpoint allows this to be set automatically.
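  If the flag is set manually rather than through the new cluster endpoint option, it takes a memory size that should match the container limit. In the sketch below, the Master.JavaOpts value name and the 4g size are placeholders, not taken from this note.

  ```yaml
  # Sketch only: value name and size are placeholders; match -XX:MaxRAM to the container memory limit.
  Master:
    JavaOpts: "-XX:MaxRAM=4g"
  ```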
- Problems scaling down multi-executor inbound agents in HA controller
  When the executor count of a permanent inbound agent in an HA controller was reduced, including when multiple executors were disabled, the scale-down process occasionally reported failure to terminate all related Java threads, forcing a restart of the entire process.
- Excessive job reloading in HA controller when using items CasC
  When jobs and other items on an HA controller are managed by CasC, each replica resaves every job config.xml during startup. Previously, this behavior sent notifications to other replicas to reload the configuration even when there was no actual change to the item content. As a result, unnecessary work was performed and server load increased during rolling restarts.
- Double adoption of build in HA controller
  Under heavy load conditions, a periodic process in an HA controller could have discovered a build in need of adoption (after its owning replica exited) at the same time that the adoption request was being processed. This race condition could result in two replicas both claiming the build, or one replica attempting to adopt it twice, causing metadata corruption.
- Fixed NullPointerException in PluginInstallAdminMonitor
  Resolved an issue where a NullPointerException could occur in PluginInstallAdminMonitor.
- Clearer message about 400 response to APIs in HA controllers
  Some REST API endpoint usages in an HA controller are unsupported and return a 400 response with a message about the unsupported details, but this message previously failed to note that the limitation is specific to HA.
- Message displayed when no aborted builds are available
  A message is now displayed on the Aborted Builds page when there are no aborted builds to list.
- Lock on build held by JUnit Attachments plugin (junit-attachments) when processing attachments
  When using the JUnit Attachments plugin (junit-attachments), a lock on a build could be held when contacting the agent to scan the workspace for attachments. This could have caused delays when saving builds and led to additional issues in HA controller environments.