Issue
- Jenkins suddenly crashed and dmesg shows:
[XXXXX] Out of memory: Kill process <JENKINS_PID> (java) score <SCORE> or sacrifice child
[XXXXX] Killed process <JENKINS_PID> (java) total-vm:XXXkB, anon-rss:XXXkB, file-rss:XXXkB, shmem-rss:XXXkB
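If you only suspect that this happened, you can search the kernel log for OOM killer activity. As a quick check (the -T flag, which prints human-readable timestamps, is available in most util-linux versions of dmesg):
$ dmesg -T | grep -i 'out of memory'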
Background
A Java process is made up of:
- Java heap space (set via -Xms and -Xmx)
- the Metaspace
- the Native Memory area
Each of these areas uses RAM. The memory footprint of Jenkins (a Java application) is the sum of the maximum Java heap size, the Metaspace size and the native memory usage. By default the Metaspace and Native Memory areas can grow to an unlimited size, and they do not normally require tuning; typical usage of each is only a few hundred MB.
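As an illustration (the sizes below are hypothetical, not recommendations), the heap and Metaspace can be capped explicitly when the JVM is started, and the breakdown of all memory areas can be inspected with the JDK's jcmd tool once Native Memory Tracking is enabled (which adds a small runtime overhead). How the flags are actually passed depends on how Jenkins is installed; see the resolution below for a package-based example.
$ java -Xms1g -Xmx4g -XX:MaxMetaspaceSize=512m -XX:NativeMemoryTracking=summary -jar jenkins.war
$ jcmd <JENKINS_PID> VM.native_memory summary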
It is important to understand that the Operating System itself and any other processes running on the machine have their own requirements regarding RAM and CPU. The Operating System uses a certain amount of RAM which leaves the remaining RAM to be split among Jenkins and any other processes on the machine.
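You can see how much RAM the machine has in total and how much is still free with:
$ free -h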
Resolution
(This does not indicate a problem with Jenkins. It indicates that the Operating System is unable to provide enough resources for all the programs it has been asked to run.)
The Out Of Memory (OOM) Killer is a function of the Linux kernel that kills user processes when free RAM is very low, in order to prevent the whole system from going down for lack of memory. The kernel applies heuristics (it gives each process a score) to decide which process to kill when the system is in such a state. The process monopolizing the most memory without releasing enough of it is the most likely to be killed. On a system where Jenkins is the primary service, Jenkins tends to be the process using the most RAM, and is therefore the most likely to be killed when system memory runs low.
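The score the kernel currently assigns to a process, and any adjustment applied to it, can be read from the proc filesystem (replace <JENKINS_PID> with the actual PID):
$ cat /proc/<JENKINS_PID>/oom_score
$ cat /proc/<JENKINS_PID>/oom_score_adj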
If you are affected by this error, there could be different causes:
- Too much memory is allocated to Jenkins, and therefore there is not enough free for other housekeeping processes
- Other processes are running on the same machine as Jenkins and using too much memory
Following are recommendations for each case.
1) Too much memory allocated to Jenkins
You must not allocate all or nearly all of the system memory to the JVM where Jenkins is running. That is because the Operating System needs free memory for other housekeeping processes in addition to Jenkins.
We recommend keeping a minimum of 2-4GB of memory free for non-Jenkins processes. For example, if you are running on a system with 16GB of RAM, you should not allocate more than 12GB of heap for Jenkins. This is especially important in containerized environments, where it can be tempting to allocate small amounts of RAM to each container. When there is less than a 2GB difference between the maximum heap that Jenkins has been configured with and the total RAM available to the container, it is likely that Jenkins will eventually be killed by the kernel.
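For example, assuming a recent package installation managed by systemd, the maximum heap can be lowered with a drop-in override; the 12g value below matches the 16GB example above and is only illustrative. Older packages read JVM arguments from /etc/default/jenkins or /etc/sysconfig/jenkins instead.
$ sudo systemctl edit jenkins
# in the editor that opens, add:
[Service]
Environment="JAVA_OPTS=-Djava.awt.headless=true -Xmx12g"
$ sudo systemctl restart jenkins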
2) Other processes are impacting Jenkins
In this scenario, Jenkins is not the only process running on the machine but it is killed because it is the process consuming the most memory on the OS.
We strongly recommend that Jenkins be the primary service running on the machine hosting it. If you must run other processes, such as monitoring agents, ensure that they do not overload the system and that enough resources remain available to handle the load on the machine.
How to find the culprit
You can check which processes are consuming the most memory on the machine at any time with commands like:
$ top -o %MEM
or:
$ ps aux --sort -pmem
You can also view the kernel logs by running the command dmesg. In these logs, locate the Out of memory: Kill process <JENKINS_PID> message. Just above that message, the kernel dumps the stats of the processes that were running. For example:
[...]
[XXXXX] [ pid ]   uid  tgid total_vm     rss nr_ptes swapents oom_score_adj name
[XXXXX] [  480]     0   480    13863     113      26        0         -1000 auditd
[XXXXX] [12345]   123 12345  4704977 3306330    6732        0             0 java
[XXXXX] [11939]     0 11939    46699     328      48        0             0 crond
[XXXXX] [11942]     0 11942    28282      45      12        0             0 sh
[XXXXX] [16789]   456 16789  1695936   38643     165        0             0 java
[...]
[XXXXX] Out of memory: Kill process 12345 (java) score 869 or sacrifice child
[XXXXX] Killed process 12345 (java) total-vm:18819908kB, anon-rss:13225320kB, file-rss:0kB, shmem-rss:0kB
[...]
In this example, the Jenkins PID was 12345 and it was killed. We can see in the summary (the Killed process line) that Jenkins was using ~13 GiB of resident memory (see anon-rss; the total-vm value can be disregarded). The table also shows another java process, PID 16789, that has reserved ~6.5 GiB of virtual memory (total_vm) even though its resident set (rss) is only ~150 MiB. Note that the memory values in the table are counts of 4 KiB pages, so multiply them by 4 to get KiB: for PID 16789, 1695936 total_vm pages × 4 KiB ≈ 6.5 GiB, and 38643 rss pages × 4 KiB ≈ 150 MiB. You can investigate what this other process does by running the following command:
$ ps -fp <PID>
It is possible that this process is leaking memory, or perhaps it simply should not be running on the same system as Jenkins.
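The kernel's per-process accounting can also be checked directly, for example (VmSize is the virtual size and VmRSS the resident set, both reported in kB):
$ grep -E 'VmSize|VmRSS' /proc/<PID>/status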