Security fixes
- JGroups shared folder no longer world-writable by default
-
The shared folder used by JGroups to coordinate cluster state is no longer world-writable by default.
Upgrade Notes
- Controller CloudBees Assurance Program plugin changes since 2.516.3.29358
-
The following plugins have been added to the controller CloudBees Assurance Program since 2.516.3.29358:
- MCP Server Plugin (mcp-server)
New Features
- Improved CloudBees CI startup time
-
Beginning with CloudBees CI 2.528.1 and Folders Plus plugin (cloudbees-folders-plus) 3.504, the system loads jobs in parallel during startup. By default, the system uses two threads per processor (with up to 32 threads supported) to speed up startup. Administrators can adjust the thread count, or disable parallel job loading entirely, by setting the com.cloudbees.hudson.plugins.folder.ParallelChildLoader.threadCount property (a value of 1 disables parallel loading). For more information, refer to Parallel job loading.
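For example, assuming the controller's JVM options can be edited at launch (a minimal sketch; the jenkins.war invocation is illustrative, and managed controllers supply JVM options through their provisioning configuration instead), parallel job loading can be disabled like this:
java -Dcom.cloudbees.hudson.plugins.folder.ParallelChildLoader.threadCount=1 -jar jenkins.war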
Feature Enhancements
- CyberArk HTTP requests now support both HTTP/1.1 and HTTP/2 protocols
-
By default, CyberArk HTTP requests now use the HTTP/1.1 protocol. To use the HTTP/2 protocol, set the following system property to true:
com.cloudbees.jenkins.plugins.cyberark.credentials.CyberArkRestService.useHttp2 = true
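Equivalently, the property can be passed as a JVM argument at startup (illustrative; how JVM options are supplied depends on the installation):
-Dcom.cloudbees.jenkins.plugins.cyberark.credentials.CyberArkRestService.useHttp2=true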
- More efficient counting of replica load for HA controllers
-
When an HA controller accepts a build trigger or performs similar operations, it counts the running builds and queue items on each replica. This count is now computed more efficiently.
- Acquire file lock on running builds in HA controller
-
To avoid a class of problems in HA controllers where two replicas simultaneously believe they are responsible for a given build, a file lock is now acquired when a build is started, resumed, or adopted, and released only when the build completes or is offered for adoption.
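The pattern is comparable to the following sketch built on java.nio file locking (illustrative only; the lock file name and class are hypothetical, not the plugin's actual internals):
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class BuildOwnershipLock {
    // Acquire an exclusive OS-level lock on a file in the build directory
    // so that no other replica can claim the same build concurrently.
    static FileLock acquire(Path buildDir) throws Exception {
        FileChannel channel = FileChannel.open(
                buildDir.resolve("owner.lock"), // hypothetical lock file
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock = channel.tryLock(); // null if already held elsewhere
        if (lock == null) {
            channel.close();
            throw new IllegalStateException("build locked by another replica");
        }
        return lock; // held until the build completes or is offered for adoption
    }
}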
- Operations center security administrative monitor for shared credentials in environments with untrusted controllers
-
CloudBees now provides a security administrative monitor that alerts administrators when shared credentials are detected in their environment. The monitor provides:
- Administrative alerts with detailed information when shared credentials are detected in an operations center which allows untrusted controllers.
- Visibility only to users with administrative privileges. The monitor can be dismissed if you are confident that the operations center credentials are safe to be visible from controllers with an Untrusted security mapping.
For more information, refer to Shared credentials administrative monitor for the operations center.
Resolved Issues
- Local items are not browsable when security mapping is restricted
-
The items browser used in Move/Copy/Promote, Remote Copy Artifact, and some other features did not behave correctly when the security mapping for the controller was restricted. Local items were also not browsable. This fix makes local items visible regardless of the security mapping.
- HA: when triggering downstream jobs, under certain conditions the upstream job could wait forever
-
In an HA controller, when running a build that triggers other builds, an issue could occur if a downstream build completed while the upstream build was pending adoption by another replica (for example, during a rolling update). In this scenario, the completion event could be lost, causing the upstream build to hang indefinitely.
Now, pending messages are stored and retried periodically (every minute), ensuring that adopted builds are notified as expected.
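The retry mechanism resembles the following sketch (illustrative only; the class, its names, and the use of a Predicate stand in for the plugin's actual cluster messaging):
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

class PendingEventRetrier<E> {
    // Events that could not yet be delivered, keyed by downstream build id.
    private final Map<String, E> pending = new ConcurrentHashMap<>();
    private final Predicate<E> delivery; // true once the event is acknowledged
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    PendingEventRetrier(Predicate<E> delivery) {
        this.delivery = delivery;
        // Retry every minute; an event is dropped once delivery succeeds,
        // for example after another replica has adopted the upstream build.
        scheduler.scheduleAtFixedRate(
                () -> pending.values().removeIf(delivery),
                1, 1, TimeUnit.MINUTES);
    }

    void submit(String buildId, E event) {
        if (!delivery.test(event)) {
            pending.put(buildId, event); // kept for the periodic retry
        }
    }
}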
- Upstream build keeps waiting for a non-existent downstream build
-
Under certain timing conditions, an upstream build that triggered a downstream build using triggerRemoteJob would continue waiting for a downstream build that did not exist.
- Remote trigger: Upstream build now fails if downstream build cannot be scheduled
-
In operations center environments (including HA environments), an upstream build could hang indefinitely when a downstream build was refused scheduling (for example, when Maximum load per replica is reached). Subscribed RemoteTriggerModes are now notified when a downstream build cannot be scheduled, and the upstream build fails in all modes except FireAndForget.
Existing notification/callback mechanisms are reused; no new APIs were introduced. Method signatures remain unchanged. No changes are required in existing Pipelines or scripts.
- Next build number not reset after job deletion and recreation in HA controller
-
In an HA controller, after running builds of a job, its next build number was retained even after the job was deleted. If a new job was created with the same name, its build numbers would start from the same point instead of resetting to 1.
- Grace period for controller shutdown could be consumed by suspending Pipeline builds
-
A controller shutdown hook that suspended and saved the state of all running Pipeline builds was limited to three minutes, which is much longer than the default pod termination grace period of thirty seconds, and no individual per-build timeout was applied. In particular, in an HA managed controller, running past the deadline could cause problems because the graceful Hazelcast termination would then not be run.
- CyberArk credentials ID check failures in nested folders
-
The ID check for CyberArk username/password credentials now works as expected when the credentials are folder-scoped and configured in nested folders. Previously, the check failed with Could not determine the credential store to validate against, even with correct input and configuration.
- CyberArk credentials usage fails when CyberArk secret does not have a username
-
Using CyberArk username/password credentials may have failed if the associated CyberArk secret did not include a username.
- Improved resource cleanup for Pipelines using @Grab
-
When using @Grab in a Pipeline, Java classes loaded from the grabbed library were not immediately cleaned up after the Pipeline finished. This could have delayed the removal of classes associated with the Pipeline in some cases.
- Race condition when completing build in HA controllers
-
A race condition in HA controllers could have caused other replicas to believe that a build which had already completed was still owned by the replica that ran it. As a result, the build could be missing from the history widget.
- Controller offline security realm not correctly updated
-
Under certain conditions, a controller’s offline security realm was not correctly updated.
- Ungraceful shutdown of a replica could cause Hazelcast health check failure on another replica
-
When a replica shuts down ungracefully, the Hazelcast cluster waits 60 seconds before removing it from the member list. During this period, some Hazelcast operations may have waited for a response from the member that had already left, which could have caused health check failures on other replicas.
- Optimization to agent disconnection during shutdown
-
A controller or HA controller replica with many agents could take too long to disconnect the agents during shutdown. This delay sometimes led to abrupt termination, especially for a managed controller.
- Hazelcast operation in HA controller locked queue during new queue item submission
-
Submitting a new queue item in an HA controller previously could invoke a blocking Hazelcast operation while holding the Jenkins queue lock, potentially preventing other work from proceeding if Hazelcast was unresponsive.
- Failure to initiate clean Hazelcast termination during shutdown of loaded HA controller
-
When an HA controller was heavily loaded, a fixed-size thread pool could become busy and fail to schedule termination of Hazelcast networking. As a result, termination might not occur before the termination grace period on a managed controller expired, or before the process was halted.
- Optimized agent connect/disconnect listener in HA controllers
-
The listener that refreshes the status of sh steps when agents connect or disconnect in HA controllers consumed excessive resources during periods of heavy build activity. This listener has been optimized to improve performance.
- Double adoption of builds in HA controller
-
Under certain timing conditions, when a replica of an HA controller crashed and other replicas eventually adopted its running builds, multiple replicas could incorrectly adopt the same build, leading to corruption.
- WebSocket connections sometimes dropped with NullPointerException under heavy load due to race condition
-
Under heavy load, WebSocket connections could be dropped with a NullPointerException in Jenkins core due to a race condition in asynchronous object initialization. The issue was observed when exceeding 200 WebSocket connections. This was not fatal, as the WebSocket agent would attempt to reconnect.
- Builds orphaned by a crashed HA replica could be adopted by a replica scheduled to restart
-
When an HA controller replica exits gracefully, it offers its running builds for adoption, and they are distributed to other running replicas, starting with the least loaded and skipping older replicas during rolling upgrades. However, when a replica crashed, other replicas detected the orphaned builds only after a delay, and whichever replica noticed first adopted them, regardless of its load or restart status. This sometimes caused a replica that was about to exit to adopt a large set of builds from a recently crashed replica, instead of leaving them to a fresh replica better able to handle the load.
- Queue lock held in HA controller while recording a deleted agent
-
When an agent was deleted from Jenkins while the queue lock was held (for example, when completing a node block running on a cloud agent), a synchronous call to Hazelcast was made. If Hazelcast operations were impeded, this could tie up the queue lock for an extended period of time.
- CloudBees Backup plugin: Backup permission requirements
-
After version 1160 of the CloudBees Backup plugin (infradna-backup), users may have experienced backup failures due to unnecessary Amazon S3 permission requirements. This issue has been resolved, and you can now use the backup functionality with the same bucket access permissions as in version 1160.
- WebSocket close handling when reason string exceeds limit
-
Closing a WebSocket connection could sometimes fail if the reason string was too long. The WebSocket protocol limits the close reason to 123 bytes, but in some cases the message being sent exceeded this limit, causing the close to fail. The close reason is now truncated to comply with the protocol size limit.
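The limit comes from RFC 6455: a control frame payload is at most 125 bytes, and the close status code uses 2 of them, leaving 123 bytes for the UTF-8 reason. A minimal sketch of a protocol-safe truncation (names are illustrative, not the actual implementation):
import java.nio.charset.StandardCharsets;

class CloseReasons {
    // RFC 6455 caps control frame payloads at 125 bytes; the 2-byte close
    // status code leaves 123 bytes for the UTF-8 reason string.
    static final int MAX_REASON_BYTES = 123;

    static String truncate(String reason) {
        byte[] utf8 = reason.getBytes(StandardCharsets.UTF_8);
        if (utf8.length <= MAX_REASON_BYTES) {
            return reason;
        }
        int end = MAX_REASON_BYTES;
        // Back up over UTF-8 continuation bytes (10xxxxxx) so the cut does
        // not split a multi-byte character.
        while (end > 0 && (utf8[end] & 0xC0) == 0x80) {
            end--;
        }
        return new String(utf8, 0, end, StandardCharsets.UTF_8);
    }
}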
- Unnecessary checks in High Availability (HA) controller related to lock step emulation when disabled
-
When the lock step emulation was not enabled in an HA controller, the associated code still performed unnecessary processing whenever a build completed.
- Problems scaling down multi-executor inbound agents in HA controller
-
When the executor count of a permanent inbound agent in an HA controller was reduced, including when multiple executors were disabled, the scale-down process occasionally reported failure to terminate all related Java threads, forcing a restart of the entire process.
- Excessive job reloading in HA controller when using items CasC
-
When jobs and other items on an HA controller are managed by CasC, each replica resaves every job config.xml during startup. Previously, this sent notifications to other replicas to reload the configuration even when the item content had not actually changed. As a result, unnecessary work was performed and server load increased during rolling restarts.
- Double adoption of build in HA controller
-
Under heavy load conditions, a periodic process in an HA controller could have discovered a build in need of adoption (after its owning replica exited) at the same time that the adoption request was being processed. This race condition could result in two replicas both claiming the build, or one replica attempting to adopt it twice, causing metadata corruption.
- Fixed NullPointerException in PluginInstallAdminMonitor
-
Resolved an issue where a NullPointerException could occur in PluginInstallAdminMonitor.
- Clearer message about 400 response to APIs in HA controllers
-
Some REST API endpoint usages in an HA controller are unsupported and return a 400 response with a message describing the unsupported usage, but previously the message failed to note that the limitation is specific to HA.
- Message displayed when no aborted builds are available
-
A message is now displayed on the Aborted Builds page when there are no aborted builds to list.
- Restart build image now displays in Aborted Builds list view
-
The Restart build image was not displayed in the Aborted Builds list view.
- Lock on build held by JUnit Attachments plugin (junit-attachments) when processing attachments
-
When using the JUnit Attachments plugin (junit-attachments), a lock on a build could be held when contacting the agent to scan the workspace for attachments. This could have caused delays when saving builds and led to additional issues in HA controller environments.