Issue
-
Jenkins crashes or behaves abnormally (potentially when trying to delete an item), and a stack trace similar to the one below appears in the logs:
... java.nio.file.FileSystemException: /some/path/.nfs01234567890abcdef1234567: Device or resource busy ...
Environment
Product
-
CloudBees CI (CloudBees Core) on traditional platforms - Client controller
-
CloudBees CI (CloudBees Core) on traditional platforms - Operations Center
-
CloudBees CI (CloudBees Core) on modern cloud platforms - Managed controller
-
CloudBees CI (CloudBees Core) on modern cloud platforms - Operations Center
Extensions
-
HA plugin: NFS server
-
EnvInject Plugin <= 1.91.2
Resolution
Before anything else, verify that your NFS server is running correctly and is accessible from your Jenkins instance.
This error is known to appear if only the nfs service is running and the rpcbind service is not.
See our NFS Guide for more details on how to set up and optimize your NFS server configuration.
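As a quick check of the point above, the following sketch inspects the output of rpcinfo -p and reports whether a given RPC service is registered. It assumes rpcinfo is installed on the Jenkins host and relies on rpcbind being listed under the name portmapper; the helper name rpc_has_service and the host nfs.example.com are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the helper name and the host used in the example are assumptions.

# rpc_has_service <service>: read `rpcinfo -p` output on stdin and succeed
# if a row whose last column equals <service> is present (header row skipped).
rpc_has_service() {
  awk -v svc="$1" 'NR > 1 && $NF == svc { found = 1 } END { exit !found }'
}

# Example (hypothetical host name):
# rpcinfo -p nfs.example.com | rpc_has_service portmapper || echo "rpcbind not registered"
# rpcinfo -p nfs.example.com | rpc_has_service nfs        || echo "nfs not registered"
```

If either check fails, review the NFS server itself before trying the other options.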
If your NFS server is running properly, consider these other options:
-
Upgrade your CloudBees Jenkins products beyond the 2.55.x baseline
-
Upgrade the EnvInject Plugin to version 1.91.3 or later. (Note: this step applies only if you have this plugin installed.)
-
If EnvInject is not in use, you can try to narrow down the culprit. Run
lsof +D <FULL_PATH_TO_ITEM>
for the offending item. Verify that the <PID> returned, that is, the process holding the .nfs files, is indeed Jenkins.
-
If it is another process rather than Jenkins, kill it by running
kill <PID>
Verify that the .nfs files are gone after the process is killed, then attempt to delete the job again.
-
If it is Jenkins, check that all plugins correctly close their loggers and propagate the close to upper loggers. See JENKINS-28409 for an example.
-
The commands above work on Linux.
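The diagnostic steps above can be sketched as a few shell helpers. This is a minimal sketch for Linux, assuming lsof and ps are available; the function names are hypothetical, and the item path is whatever you would pass as <FULL_PATH_TO_ITEM>.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# list_nfs_files <item_dir>: print leftover .nfs* files below the item directory.
list_nfs_files() {
  find "$1" ! -type d -name '.nfs*'
}

# holder_pids <item_dir>: print the unique PIDs holding files open below the directory.
# lsof -t prints PIDs only; +D walks the whole tree, which can be slow on large directories.
holder_pids() {
  lsof -t +D "$1" 2>/dev/null | sort -u
}

# describe_pid <pid>: show the command line of a PID so you can tell whether it is Jenkins.
describe_pid() {
  ps -o pid=,args= -p "$1"
}

# Example:
# list_nfs_files "<FULL_PATH_TO_ITEM>"
# for pid in $(holder_pids "<FULL_PATH_TO_ITEM>"); do describe_pid "$pid"; done
```

Only kill a PID after confirming from its command line that it is not the Jenkins process.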
Workaround
-
Inspect which files are locking your instance by using the following Groovy code:
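// List form: no shell is involved, so each argument is passed to find verbatim and the patterns need no quoting.
println(["find", "/var/jenkins_home/jobs", "-name", ".nfs*", "-not", "-name", "*.log", "-not", "-name", "*.xml", "-not", "-type", "d"].execute().text)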
Output similar to the following might appear:
/var/jenkins_home/jobs/test-mxbuild-release-pipeline/builds/173/.nfs00000000a09e58ea00000dc8
/var/jenkins_home/jobs/test-mxbuild-release-pipeline/builds/170/.nfs00000000a0938d5200000dc9
/var/jenkins_home/jobs/test-mxbuild-release-pipeline/builds/174/.nfs00000000a06c784800000dca
/var/jenkins_home/jobs/test-mxbuild-release-pipeline/builds/168/.nfs00000000a093780800000dcb
....
You can run that Groovy script on a single controller (Manage Jenkins > Script Console) or on all the controllers attached to your Operations Center by including it in a Cluster Operation.
-
Once you have identified these files, you need to get rid of them. There are two options:
-
Deleting them
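// Same search as above, with -delete appended to remove every match.
println(["find", "/var/jenkins_home/jobs", "-name", ".nfs*", "-not", "-name", "*.log", "-not", "-name", "*.xml", "-not", "-type", "d", "-delete"].execute().text)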
-
Moving these lock files to a different location on the NFS
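One way to do the move from a shell on the controller host is sketched below. The function name move_nfs_files and the target directory are hypothetical; the target is assumed to live on the same NFS export and outside the directory being searched.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and the example paths are assumptions.

# move_nfs_files <source_dir> <target_dir>: move every .nfs* file found below
# <source_dir> into <target_dir>, which must not be inside <source_dir>.
move_nfs_files() {
  local src="$1" dest="$2"
  mkdir -p "$dest"
  # Quoting '.nfs*' stops the shell from expanding the pattern before find sees it.
  find "$src" ! -type d -name '.nfs*' -exec mv -- {} "$dest"/ \;
}

# Example (hypothetical target directory):
# move_nfs_files /var/jenkins_home/jobs /var/jenkins_home/nfs-quarantine
```

Moving instead of deleting keeps the files available for inspection until you are sure nothing still needs them.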
-