Symptoms
-
The Jenkins
java
process is being oom-killed -
top
analysis shows unusual memory consumption outside of heap memory and metaspace memory
Diagnosis/Treatment
-
Pre-conditions:
-
You have already confirmed expected heap utilization is normal through garbage collection log analysis.
-
You have already confirmed expected metaspace memory consumption.
-
You are running a supported JDK which includes
jcmd
-
You are engaged with CloudBees Support via an existing support case
Summary
This article describes how to enable Java native memory tracking to trace memory issues that may lie outside of the JVM Heap and Metaspace, as illustrated in the following picture:
Configure NMT Tracking
Step 1
First, you will need to add the JVM argument: -XX:NativeMemoryTracking=detail
by following the instructions outlined here: How to add Java Arguments to Jenkins to get a more detailed view of native memory usage by tracking exactly what methods allocate the most memory. Enabling NMT will result in 5-10 percent JVM performance.
In order for the new argument to take effect, the Jenkins java
process must be restarted. It is recommended to do this during a scheduled maintenance window.
The Oracle documentation on native memory troubleshooting can be found here: https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr007.html
Step 2
To gather a summary, the following jcmd
command can be run:
jcmd <pid> VM.native_memory summary
This will output a summary log similar to the following:
Native Memory Tracking: Total: reserved=9769936KB +130080KB, committed=8644184KB +212676KB - Java Heap (reserved=7340032KB, committed=7340032KB) (mmap: reserved=7340032KB, committed=7340032KB) - Class (reserved=1279073KB +47364KB, committed=262493KB +52484KB) (classes #36038 +5114) (malloc=9313KB +2308KB #183403 +54470) (mmap: reserved=1269760KB +45056KB, committed=253180KB +50176KB) - Thread (reserved=267357KB -41298KB, committed=267357KB -41298KB) (thread #267 -40) (stack: reserved=266176KB -41120KB, committed=266176KB -41120KB) (malloc=870KB -132KB #1338 -200) (arena=312KB -47 #529 -80) - Code (reserved=278652KB +13258KB, committed=183820KB +90738KB) (malloc=29052KB +13258KB #30409 +10291) (mmap: reserved=249600KB, committed=154768KB +77480KB) - GC (reserved=371875KB +18969KB, committed=371875KB +18969KB) (malloc=66723KB +18969KB #364357 +127871) (mmap: reserved=305152KB, committed=305152KB) - Compiler (reserved=768KB -205KB, committed=768KB -205KB) (malloc=637KB -205KB #3163 +746) (arena=131KB #6) - Internal (reserved=164392KB +103183KB, committed=164388KB +103179KB) (malloc=164356KB +103179KB #69335 +8065) (mmap: reserved=36KB +4KB, committed=32KB) - Symbol (reserved=36296KB +3974KB, committed=36296KB +3974KB) (malloc=31877KB +3623KB #359749 +42849) (arena=4419KB +352 #1) - Native Memory Tracking (reserved=15871KB +3803KB, committed=15871KB +3803KB) (malloc=43KB #500 +5) (tracking overhead=15828KB +3804KB) - Arena Chunk (reserved=1284KB -18968KB, committed=1284KB -18968KB) (malloc=1284KB -18968KB) - Unknown (reserved=14336KB, committed=0KB) (mmap: reserved=14336KB, committed=0KB)
Note that there are 11 different areas of memory consumption from the JVM. Most notably, "Java Heap" is the amount of heap space allocated to the JVM. Also, "Class" can be traced to metaspace, as this is where class metadata is stored. The other 9 areas should hover around 10-250MB, respectively. When we see areas of native memory above 1GB, it is considered abnormal.
Create NMT summaries
Now that you have NMT Summarization enabled, running the aforementioned jcmd
command hourly should allow you to review any abnormalities of native memory. This can be achieved programmatically, either by running a simple script external to Jenkins, or by creating a Jenkins job from within the application. We will provide both options below. Please keep in mind that collecting NMT data has roughly a 10% resource overhead cost to the JVM.
1.) Collecting the data external to Jenkins via Bash script
The following example will collect the data hourly when called from a cron
trigger on the host OS.
#!/bin/bash TSTAMP="$(date +'%Y%m%d_%H%M%S')" jenkinsPid="$(pgrep -o java)" nmtLog="$JENKINS_HOME/support/nmt.log" echo $TSTAMP $JENKINS_CLUSTER_ID >> $nmtLog jcmd $jenkinsPid VM.native_memory summary >> $nmtLog
Save this bash script as a file named nmtlogging.sh in $JENKINS_HOME/support
Add the script to your crontab
by running the following command:
crontab -e
Append the following entry:
0 * * * * JENKINS_HOME='/path/to/jenkins-home' $JENKINS_HOME/support/nmtlogging.sh
Save and close the file.
2.) Collecting the data via a Jenkins job
Creating a new Freestyle job with a 'Build periodically' trigger on the Jenkins controller where you have enabled NMT Summary tracking will allow you to run the script mentioned above on an hourly basis as highlighted in the following screenshots:
Analyze NMT summaries
Once you have 24 hours of summary data appended to the $nmtLog
, please upload them to your existing support ticket within CloudBees Support for review. If your archive is larger than 20Mb please use this service to send it to us. This service works best in Chrome or Firefox.