Issue
You observe that your controller is crashing due to OutOfMemory errors. After capturing a heap dump for the instance, you see a large number of instances loaded for Pipeline steps:
XXXXXX instances of "org.jenkinsci.plugins.workflow.cps.nodes.StepEndNode", loaded by "hudson.ClassicPluginStrategy$AntClassLoader2 @ 0xxxxxxx"
More notably, the heap dump data contains a large number of FileSystemException entries. These exceptions are logged inside the corresponding files {{name_of_project}}/builds/build_number/workflow/*.xml.
A side effect of this large number of logged exceptions is that they consume not only disk space but also memory and time whenever the build is loaded.
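To gauge how many builds are affected, the workflow XML files can be scanned for the exception name. The following is a minimal sketch, assuming the on-disk layout described above (`builds/<build_number>/workflow/*.xml` under the job directory); the function name `count_exception_entries` and the command-line usage are hypothetical and not part of any CloudBees or Jenkins tooling.

```python
import sys
from pathlib import Path


def count_exception_entries(job_dir, needle="FileSystemException"):
    """Return {build_number: (occurrences, bytes_of_affected_files)}.

    Assumes the layout <job_dir>/builds/<build_number>/workflow/*.xml.
    """
    results = {}
    for xml_file in Path(job_dir).glob("builds/*/workflow/*.xml"):
        build_dir = xml_file.parent.parent
        # Skip symlinked build directories so a build is not counted twice.
        if build_dir.is_symlink():
            continue
        # Stream line by line: affected files can be very large.
        with xml_file.open(encoding="utf-8", errors="replace") as handle:
            hits = sum(line.count(needle) for line in handle)
        if hits:
            prev_hits, prev_size = results.get(build_dir.name, (0, 0))
            results[build_dir.name] = (
                prev_hits + hits,
                prev_size + xml_file.stat().st_size,
            )
    return results


if __name__ == "__main__":
    # Hypothetical usage: python scan_workflow.py /path/to/job_directory
    for build, (hits, size) in sorted(count_exception_entries(sys.argv[1]).items()):
        print(f"build {build}: {hits} entries in {size} bytes of workflow XML")
```

Builds that report many entries are the ones that are expensive to load; the scan only reads files and does not modify anything.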
Environment
- CloudBees CI (CloudBees Core) on modern cloud platforms - Managed controller
- CloudBees CI (CloudBees Core) on modern cloud platforms - Operations Center
- CloudBees CI (CloudBees Core) on traditional platforms - Client controller
- CloudBees CI (CloudBees Core) on traditional platforms - Operations Center
- CloudBees Jenkins Enterprise
- CloudBees Jenkins Enterprise - Managed controller
- CloudBees Jenkins Enterprise - Operations center
- Jenkins LTS < 2.235.x
Resolution
This issue was traced back to Jenkins core, more specifically to a change in PathRemover behavior. It is tracked in the community Jira as JENKINS-61841.
Calls to Util.deleteRecursive and other methods that end up calling PathRemover.forceRemoveRecursive on large directories can end up throwing instances of CompositeIOException with a very large number of nested exceptions, leading to excessive memory usage.
The fix for this potential problem was released as part of the 2.235.x line.
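To illustrate why an aggregate exception with an unbounded list of nested exceptions is costly on large directories, here is a Python analogy of a recursive delete that retains only a bounded number of failures while still counting all of them. This is a sketch under stated assumptions, not the Jenkins Java code: `CompositeDeleteError`, `force_remove_recursive`, and the cap of 10 are hypothetical names and values chosen for illustration.

```python
import os


class CompositeDeleteError(OSError):
    """Hypothetical aggregate error holding a bounded list of nested errors."""

    def __init__(self, kept, total):
        super().__init__(
            f"{total} paths could not be deleted ({len(kept)} errors retained)"
        )
        self.errors = kept  # at most max_errors nested exceptions
        self.total = total  # full failure count, kept as a plain integer


def force_remove_recursive(root, remove=os.remove, max_errors=10):
    """Delete root recursively, keeping at most max_errors nested errors."""
    kept = []
    total = 0

    def attempt(action, path):
        nonlocal total
        try:
            action(path)
        except OSError as exc:
            total += 1
            # Without this cap, one exception object per failed path would
            # stay referenced, which is what grows memory on huge trees.
            if len(kept) < max_errors:
                kept.append(exc)

    # Bottom-up walk so directories are emptied before they are removed.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            attempt(remove, os.path.join(dirpath, name))
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # os.walk does not descend into symlinked directories.
                attempt(remove, path)
        attempt(os.rmdir, dirpath)

    if total:
        raise CompositeDeleteError(kept, total)
```

With a cap in place, a directory holding hundreds of thousands of undeletable files yields an error whose memory footprint stays constant, while `total` still reports how many paths failed.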