Issue
I have run the script listDProcessesNativeStacks.sh and shared the data with CloudBees Support. What does this script do, and how does it help identify what is causing my controller to hang?
Environment
- Linux environment
- CloudBees CI (CloudBees Core) on modern cloud platforms - Managed controller
- CloudBees CI (CloudBees Core) on modern cloud platforms - Operations Center
- CloudBees CI (CloudBees Core) on traditional platforms - Client controller
- CloudBees CI (CloudBees Core) on traditional platforms - Operations Center
- CloudBees Jenkins Enterprise
- CloudBees Jenkins Enterprise - Managed controller
- CloudBees Jenkins Enterprise - Operations center
Resolution
The script listDProcessesNativeStacks.sh uses a combination of ps, awk and cat to identify processes in a D state and dump their native stack.
It is usually run alongside collectPerformanceData.sh, and its output helps us identify exactly what the native thread of a process is doing.
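As a rough illustration of the approach described above, and not the actual contents of listDProcessesNativeStacks.sh, the sketch below filters ps output with awk and then reads each matching process's kernel stack with cat. The function names filter_d_state and dump_native_stacks are hypothetical, and reading /proc/<pid>/stack is an assumption about how the native stack is obtained.

```shell
# Hypothetical helper: print the PIDs whose state column starts with "D".
# Expects "PID STAT COMMAND" lines on stdin.
filter_d_state() {
    awk '$2 ~ /^D/ { print $1 }'
}

# Hypothetical helper: dump the kernel stack of each PID read from stdin.
# Reading /proc/<pid>/stack normally requires root.
dump_native_stacks() {
    while read -r pid; do
        echo "== PID ${pid} =="
        cat "/proc/${pid}/stack" 2>/dev/null || echo "(stack not readable)"
    done
}
```

With procps ps, the two helpers could be chained as root with `ps -eo pid=,stat=,comm= | filter_d_state | dump_native_stacks`.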
Where to retrieve the script
The script can be downloaded from this link.
What is a D state process
It is a process in uninterruptible sleep: it is blocked inside a kernel call and typically does not respond to signals until that call completes. Usually this means that the process is waiting on I/O.
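To check the state of a single process by hand, the state letter can be read from the State: line of /proc/<pid>/status. The helper below is a minimal sketch; the name state_from_status is hypothetical.

```shell
# Hypothetical helper: print the one-letter process state (R, S, D, Z, ...)
# from the contents of a /proc/<pid>/status file supplied on stdin.
state_from_status() {
    awk '/^State:/ { print $2 }'
}
```

For example, `state_from_status < /proc/10656/status` would print D for a process stuck in uninterruptible sleep.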
What does this script bring to collectPerformanceData
With the collectPerformanceData.sh script, we only have a Java view of the stack, so we cannot see what is happening at the OS level. For instance, from the following stack we can only infer that the JVM is trying to write something, but we have no idea what is happening at a lower level:
"Executor #-1 for controller : executing myJob #11" Id=1305944 Group=main RUNNABLE (in native)
    at sun.nio.ch.FileDispatcherImpl.pwrite0(Native Method)
    at sun.nio.ch.FileDispatcherImpl.pwrite(FileDispatcherImpl.java:66)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:89)
    at sun.nio.ch.IOUtil.write(IOUtil.java:51)
All we can say is that the JVM is waiting on a native I/O operation. Now, running the listDProcessesNativeStacks.sh in this context, we can extract more information:
jenkins+ 10656 4238440 2146620 ? D 00:00:00
[<ffffffff81168d1e>] sleep_on_page+0xe/0x20
[<ffffffff81168aa6>] wait_on_page_bit+0x86/0xb0
[<ffffffff81168be1>] filemap_fdatawait_range+0x111/0x1b0
[<ffffffff8116abff>] filemap_write_and_wait_range+0x3f/0x70
[<ffffffffa0422c7e>] nfs_file_fsync+0x7e/0x100 [nfs]
[<ffffffff8120ff8b>] vfs_fsync+0x2b/0x40
[<ffffffffa0422f0a>] nfs_file_flush+0x7a/0xb0 [nfs]
[<ffffffff811dc9f4>] filp_close+0x34/0x80
[<ffffffff811fd348>] __close_fd+0x78/0xa0
[<ffffffff811de103>] SyS_close+0x23/0x50
[<ffffffff81646d52>] tracesys+0xdd/0xe2
[<ffffffffffffffff>] 0xffffffffffffffff
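The nfs_file_fsync and nfs_file_flush frames show that the process is blocked while flushing a file to an NFS mount as it closes it, so we can now start investigating the NFS.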
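When a dump contains many frames, it can help to list which kernel modules appear in it, since a frame that belongs to a loadable module ends with the module name in square brackets (for example [nfs]). The helper below is a minimal sketch that assumes one frame per line, as in /proc/<pid>/stack; the name modules_in_stack is hypothetical.

```shell
# Hypothetical helper: list the kernel modules named in a native stack.
# Reads stack frames on stdin, one per line; frames from a loadable
# module end with "[module]".
modules_in_stack() {
    awk '$NF ~ /^\[[A-Za-z0-9_]+\]$/ { print substr($NF, 2, length($NF) - 2) }' | sort -u
}
```

Fed the frames from the example above, this prints nfs, which points at the same subsystem as the manual reading of the stack.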
But how exactly can you use this?
The script is simple to use. It is designed to work on any Linux system with ps, awk and cat (even with the busybox version of ps).
You’ll need to run it with sudo or as the root user. The script takes no parameters.
You can set the output directory with the D_PROCESSES_OUTPUT_DIR environment variable.
If you run it with sudo, make sure to pass the environment variable to the script by using the -E switch, e.g.:
export D_PROCESSES_OUTPUT_DIR=/tmp
sudo -E ./listDProcessesNativeStacks.sh
NOTE: The script will most likely not work from within a container. This should not be an issue, because the D processes should be visible from the host when running as the root user.
Make sure to attach the output of the script to the Support Ticket.