What is listDProcessesNativeStacks.sh and how does it help?

Article ID: 360018917251

Issue

I ran the listDProcessesNativeStacks.sh script and shared the data with CloudBees Support. What does this script do, and how does it help identify what is causing my controller to hang?

Resolution

The script listDProcessesNativeStacks.sh uses a combination of ps, awk and cat to identify processes in a D state and dump their native stack.

It is usually used together with the output of collectPerformanceData.sh to help us identify exactly what the native thread of a process is doing.
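To make the ps, awk and cat combination concrete, here is a minimal sketch of the general technique. It is not the source of listDProcessesNativeStacks.sh: the function names filter_d_state and dump_stacks, and the optional proc_root argument, are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Illustrative sketch only; this is NOT the actual listDProcessesNativeStacks.sh.

# Read "PID STAT COMMAND" rows on stdin and print the PID of every row
# whose STAT column starts with "D" (uninterruptible sleep).
filter_d_state() {
    awk '$2 ~ /^D/ { print $1 }'
}

# Read PIDs on stdin and print the kernel stack of each one.
# /proc/<pid>/stack is normally readable by root only.
# The optional first argument (hypothetical, for testing) replaces /proc.
dump_stacks() {
    proc_root="${1:-/proc}"
    while read -r pid; do
        echo "== PID $pid =="
        cat "$proc_root/$pid/stack" 2>/dev/null || echo "(stack not readable)"
    done
}

# Possible usage, as root, with the procps version of ps:
#   ps -eo pid=,stat=,comm= | filter_d_state | dump_stacks
```

Keeping the filter and the dump as separate steps makes it easy to check which processes were selected before reading their stacks.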

Where to retrieve the script

The script can be downloaded from this link.

What is a D state process

It is a process that is in an uninterruptible sleep: it is blocked inside a kernel call and does not respond to signals until that call returns. Usually this means that the process is waiting on I/O.
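On Linux, the state of a single process also appears on the State line of /proc/<pid>/status. The sketch below shows one way to read it. The helper names state_code and is_uninterruptible are hypothetical, and the PID in the usage comment is only an example.

```shell
#!/bin/sh
# Illustrative sketch; helper names are hypothetical.

# Read the contents of /proc/<pid>/status on stdin and print the
# one-letter state code (for example R, S or D).
state_code() {
    awk '/^State:/ { print $2; exit }'
}

# Succeed only when the status read on stdin reports state D.
is_uninterruptible() {
    [ "$(state_code)" = "D" ]
}

# Possible usage:
#   state_code < /proc/10656/status
#   is_uninterruptible < /proc/10656/status && echo "PID 10656 is in D state"
```

A process can enter and leave the D state quickly, so a single reading is only a snapshot.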

What does this script bring to collectPerformanceData

With the collectPerformanceData.sh script, we only have a Java view of the stack, so we cannot see what is happening at the OS level. For instance, from the following stack we can only infer that the JVM is trying to write something. We have no idea what is happening at a lower level:

"Executor #-1 for controller : executing myJob #11" Id=1305944 Group=main RUNNABLE (in native)
    at sun.nio.ch.FileDispatcherImpl.pwrite0(Native Method)
    at sun.nio.ch.FileDispatcherImpl.pwrite(FileDispatcherImpl.java:66)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:89)
    at sun.nio.ch.IOUtil.write(IOUtil.java:51)

All we can say is that the JVM is waiting on a native I/O operation. Running listDProcessesNativeStacks.sh in the same situation gives us more information:

jenkins+ 10656 4238440 2146620 ?     D    00:00:00
[<ffffffff81168d1e>] sleep_on_page+0xe/0x20
[<ffffffff81168aa6>] wait_on_page_bit+0x86/0xb0
[<ffffffff81168be1>] filemap_fdatawait_range+0x111/0x1b0
[<ffffffff8116abff>] filemap_write_and_wait_range+0x3f/0x70
[<ffffffffa0422c7e>] nfs_file_fsync+0x7e/0x100 [nfs]
[<ffffffff8120ff8b>] vfs_fsync+0x2b/0x40
[<ffffffffa0422f0a>] nfs_file_flush+0x7a/0xb0 [nfs]
[<ffffffff811dc9f4>] filp_close+0x34/0x80
[<ffffffff811fd348>] __close_fd+0x78/0xa0
[<ffffffff811de103>] SyS_close+0x23/0x50
[<ffffffff81646d52>] tracesys+0xdd/0xe2
[<ffffffffffffffff>] 0xffffffffffffffff

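The nfs_file_flush and nfs_file_fsync frames show that the process is blocked in the kernel while flushing a file on an NFS mount as part of closing it. Now, we can start investigating the NFS.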
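In a stack dump like the one above, frames that belong to a loadable kernel module carry the module name in square brackets at the end of the line, as [nfs] does here. When a dump is long, a small filter can list those names. This is a sketch only, and the function name stack_modules is hypothetical:

```shell
#!/bin/sh
# Illustrative sketch; the function name is hypothetical.

# Read a kernel stack dump on stdin and print each distinct module name
# that appears as a trailing "[name]" tag on a frame.
stack_modules() {
    awk '$NF ~ /^\[[A-Za-z0-9_]+\]$/ {
        # Strip the surrounding brackets from the last field.
        print substr($NF, 2, length($NF) - 2)
    }' | sort -u
}

# Possible usage on a saved dump (the file name is an example):
#   stack_modules < stack_dump.txt
```

Frames without a trailing tag are ignored, so an empty result does not rule out a storage problem.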

But how exactly can you use this?

The script is simple to use. It is designed to work on any Linux system that provides ps, awk and cat (even with the BusyBox version of ps). You need to run it with sudo or as the root user. It takes no parameters. To choose the output directory, set the D_PROCESSES_OUTPUT_DIR environment variable. If you run the script with sudo, make sure to pass the environment variable to the script by using the -E switch, e.g.:

export D_PROCESSES_OUTPUT_DIR=/tmp
sudo -E ./listDProcessesNativeStacks.sh

NOTE: The script will most likely not work from within a container. This should not be an issue, because the D state processes should be visible from the host when running as the root user.

Make sure to attach the output of the script to the Support Ticket.