Lack of ephemeral storage in the controller pod causes frequent instance restarts

Issue

My CloudBees CI managed controller restarts frequently. The Jenkins logs show no error that explains the restart, only INFO messages reporting the JVM shutdown:

...
2025-04-14 21:21:03.887+0000 [id=25]    INFO    winstone.Logger#logInternal: JVM is terminating. Shutting down Jetty
2025-04-14 21:21:03.887+0000 [id=25]    INFO    org.eclipse.jetty.server.Server#doStop: Stopped Server@1cfd1875{STOPPING}[10.0.18,sto=0]
2025-04-14 21:21:03.892+0000 [id=25]    INFO    o.e.j.server.AbstractConnector#doStop: Stopped ServerConnector@425357dd{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
2025-04-14 21:21:03.895+0000 [id=25]    INFO    hudson.lifecycle.Lifecycle#onStatusUpdate: Stopping Jenkins
2025-04-14 21:21:03.896+0000 [id=25]    INFO    o.c.j.p.k.p.r.Reaper$CloudPodWatcher#stop: Stopping watch for kubernetes cloud kubernetes
2025-04-14 21:21:03.912+0000 [id=25]    INFO    jenkins.model.Jenkins$16#onAttained: Started termination
2025-04-14 21:21:03.913+0000 [id=25]    INFO    c.c.j.p.d.events.JenkinsEvents#stopSubmitter: Stopped accepting events
2025-04-14 21:21:03.913+0000 [id=25]    INFO    c.c.j.p.d.events.JenkinsEvents#stopSubmitter: Shut-down
2025-04-14 21:21:03.915+0000 [id=25]    INFO    c.c.o.c.MapDBMessagingStore#close: Messaging Stopped
2025-04-14 21:21:03.929+0000 [id=25]    INFO    jenkins.model.Jenkins$16#onAttained: Completed termination
...

The operations center provisioning logs show the following warnings:

[Mon Apr 14 09:37:42 UTC 2025] Connected
ManagedMaster{id=6, name='my-controller', encodedName='my-controller', idName='6-my-controller', timeStamp=0, grantId='3828e261-3391-43c1-bea4-700aaa3bcc91', approved=true, localHome='null', localEndpoint=https://cloudbees.my-company.net/my-controller/, identity=X.509, RSA}
[Mon Apr 14 09:49:14 UTC 2025][Warning][Pod][my-controller-0][Evicted] Pod ephemeral local storage usage exceeds the total limit of containers 4Gi.
[Mon Apr 14 09:49:14 UTC 2025][Normal][Pod][my-controller-0][Killing] Stopping container jenkins
ERROR: [Mon Apr 14 09:49:15 UTC 2025] Disconnected Error ManagedMaster{id=6, name='my-controller', encodedName='my-controller', idName='6-my-controller', timeStamp=0, grantId='3828e261-3391-43c1-bea4-700aaa3bcc91', approved=true, localHome='null', localEndpoint=https://cloudbees.my-company.net/my-controller/, identity=X.509, RSA}
java.nio.channels.ClosedChannelException

Environment

Resolution

The operations center provisioning logs show that the controller pod was evicted because its ephemeral storage usage exceeded the configured limit (4Gi in the example above). At the moment, we do not have sizing guidelines for ephemeral storage, because usage depends heavily on the number of builds, libraries, and jobs on a controller and can vary widely from one controller to another. For instance, checking out a large repository in a pipeline can noticeably increase the controller's ephemeral storage usage and cause a restart every time the pipeline runs. You need to tune the controller's ephemeral-storage requests and limits to match your workload.
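To see how often the evictions happen and which limit was hit, you can scan a saved copy of the provisioning log for `Evicted` events. The following Python snippet is only a minimal sketch: the function name `find_ephemeral_evictions` is hypothetical, and the regular expressions assume the event line format shown in the log excerpt in this article, which may differ in other versions.

```python
import re

# Assumed event line format, taken from the log excerpt in this article:
# [timestamp][Type][Kind][name][Reason] message
EVENT_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]"
    r"\[(?P<type>[^\]]+)\]"
    r"\[(?P<kind>[^\]]+)\]"
    r"\[(?P<name>[^\]]+)\]"
    r"\[(?P<reason>[^\]]+)\]\s*(?P<message>.*)$"
)

# Extracts the limit value (for example "4Gi") from the eviction message.
LIMIT_RE = re.compile(r"total limit of containers (?P<limit>\d+(?:\.\d+)?[A-Za-z]*)")


def find_ephemeral_evictions(log_text):
    """Return one dict per ephemeral-storage eviction event found in log_text."""
    evictions = []
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if not match:
            continue  # not an event line (for example "Connected" or a stack trace)
        if match["reason"] != "Evicted" or "ephemeral" not in match["message"]:
            continue  # some other event, such as "Killing"
        limit = LIMIT_RE.search(match["message"])
        evictions.append({
            "timestamp": match["timestamp"],
            "pod": match["name"],
            "limit": limit["limit"] if limit else None,
        })
    return evictions
```

Pass the log contents as a string, for example `find_ephemeral_evictions(open("provisioning.log").read())`, where `provisioning.log` is a hypothetical file name. Evictions that line up with a particular pipeline run suggest which job is consuming the ephemeral storage.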

Follow the instructions in Adding ephemeral storage requests and limits to a managed controller to customize these values for controllers from your CloudBees CI operations center.

References

Adding ephemeral storage requests and limits to a managed controller

Tested product/plugin versions

CloudBees CI on modern cloud platforms - 2.504.3.28227

This article is part of our Knowledge Base and is provided for guidance-based purposes only. The solutions or workarounds described here are not officially supported by CloudBees and may not be applicable in all environments. Use at your own discretion, and test changes in a safe environment before applying them to production systems.