KBEC-00501 - Activating garbage collection logging on CD

Issue

Please tell me more about JVM Garbage Collection and how this might affect the use of CloudBees CD

Environment

CloudBees CD (CloudBees Flow)

Summary

The CD server runs under Java, and thus relies on a pre-dedicated Java Virtual Machine memory allocation. The amount of memory is defined in the wrapper.conf configuration file.

Memory allocated for any task, like a procedure, is intended to be actively cleaned up after the task completes. However, if the JVM memory is getting close to it’s full allocation limits, Java will automatically try to reclaim this memory using a Full Garbage Collection event. Such events should be avoided, as they will stop the service for a period of time. The larger the memory in play, the longer this freezing of the system will take to perform a Garbage Collection.

CloudBees CD systems are very dynamic and may run 10K’s of jobs at any time, but in order to handle such loads, the service needs to be configured with very large JVM heap allocations.

Due to these large heap memory sizes, if you are encountering a Garbage Collection (GC) event, the time to clear the GC can become a performance blocker that affects your users.

Customers who size their system in awareness of this risk can typically avoid observing a GC event, but it’s not always easy to know in advance what type of peak workloads that your users will be launching. It is also true that despite the best planning, usage patterns can always change over time.

Solution

If your system is observing times where users are seeing the UI "freeze" for a block of seconds, it may be that a GC event is taking place. If confirmed, the first mitigation recommendation is to increase memory for the JVM by: a) Increase the physical memory on the server machine b) In conjunction with (a) you may wish to adjust the memory percentage allocation stored in the wrapper.conf file for the server using this guideline:

Default is 40%, and this works well for physical memory <32GB
50% can work safely at 32GB
60% can work safely for >32GB
70% can work safely for >64GB c) Consider shifting to a Clustered solution for CloudBees CD to improve horizontal scalability and thus divide the memory demand across more machines.

For most customer environments, increasing the physical memory will take away this problem for some time, but using HA to divide the work should be considered. And when HA is considered for load and anticipated growth reasons, then moving to 3 nodes is recommended. re: As the usage grows across an HA system, a single node failure is more likely to produce a catostrophic system failure if only 2 nodes are present, hence moving to 3 or more nodes is best.

If memory adjustments are not showing any effect, or cannot be implemented immediately, then further digging may be warranted by activating GC logging and working with the CloudBees Support team to help confirm whether GC events are taking place.

See this KB article on how to confirm if GC events are happening using the MemoryGuard output found in the regular commander.log file generation: https://docs.cloudbees.com/d/kb-360032828592

Activating GC Logging

If GC events are happening and you are interested in collecting additional details for the CloudBees engineering team, the following instructions can help to generate GC detailed information:

Add the following to the end of DATA_DIRECTORY/conf/wrapper.conf to always keep and rotate up to hundred 100MB GC log files ( i.e totaling up to 100 X 100MB = 10 GB).

You can configure the folder where these GC log files will be created. In the below example ( if the CD server is installed on linux platforms) it assumes you have enough space in the “/gc_log_folder”.

You can always use a different folder by configuring the set.default.GC_LOG_DIRECTORY parameter value.

If the CD server is installed on Windows , you can go to the c:\temp folder by specifying the folder using a “/” separator ( for example set.default.GC_LOG_DIRECTORY=c:/temp ).

Also make sure none of the "wrapper.java.additional.nnnn" numbers already exist in the wrapper.conf as those numbers should be unique in that file.

Java 8:

set.default.GC_LOG_DIRECTORY=/gc_log_folder
wrapper.java.additional.1010=-verbose:gc
wrapper.java.additional.1012=-Xloggc:%GC_LOG_DIRECTORY%/gc.log
wrapper.java.additional.1013=-XX:NumberOfGCLogFiles=100
wrapper.java.additional.1014=-XX:+UseGCLogFileRotation
wrapper.java.additional.1015=-XX:GCLogFileSize=100m
wrapper.java.additional.1016=-XX:+PrintGC
wrapper.java.additional.1017=-XX:+PrintGCDateStamps
wrapper.java.additional.1018=-XX:+PrintGCDetails
wrapper.java.additional.1019=-XX:+PrintHeapAtGC
wrapper.java.additional.1020=-XX:+PrintGCCause
wrapper.java.additional.1021=-XX:+PrintTenuringDistribution
wrapper.java.additional.1022=-XX:+PrintReferenceGC
wrapper.java.additional.1023=-XX:+PrintAdaptiveSizePolicy

Java 11 and 17:

set.default.GC_LOG_DIRECTORY=/gc_log_folder
wrapper.java.additional.1010=-Xlog:gc*=info,gc+heap=debug,gc+ref*=debug,gc+ergo*=trace,gc+age*=trace:file=%GC_LOG_DIRECTORY%/gc.log:utctime,pid,level,tags:filecount=100,filesize=100M

This will create the gc log file similar to the following:

gc.log.0
gc.log.1
gc.log.2.current

You can also place a suffix on the gc log file names with %t and %p , where %t is the timestamp and %p is the process id.

Java 8:

For example wrapper.java.additional.1012=-Xloggc:%GC_LOG_DIRECTORY%/gc-%t_%p.log instead of wrapper.java.additional.1012=-Xloggc:%GC_LOG_DIRECTORY%/gc.log

Java 11 and 17:

For example wrapper.java.additional.1010=-Xlog:gc*=info,gc+heap=debug,gc+ref*=debug,gc+ergo*=trace,gc+age*=trace:file=%GC_LOG_DIRECTORY%/gc-%t.log:utctime,pid,level,tags:filecount=100,filesize=100M instead of wrapper.java.additional.1010=-Xlog:gc*=info,gc+heap=debug,gc+ref*=debug,gc+ergo*=trace,gc+age*=trace:file=%GC_LOG_DIRECTORY%/gc.log:utctime,pid,level,tags:filecount=100,filesize=100M

This will create the gc log file similar to the following:

...
gc-2020-12-12_13-04-37_pid40004.log.9
gc-2020-12-12_13-04-37_pid40004.log.10.current
...

Log Collection Sizing In terms of the volume of GC logs collected, you could adjust these 2 lines to store less log files or less content per log. This example retains 1/5 the number of logs with half the content per file than the example shown above:

eg:

Java 8:

wrapper.java.additional.1013=-XX:NumberOfGCLogFiles=20
wrapper.java.additional.1015=-XX:GCLogFileSize=50m

Java 11 and 17:

wrapper.java.additional.1010=-Xlog:gc*=info,gc+heap=debug,gc+ref*=debug,gc+ergo*=trace,gc+age*=trace:file=%GC_LOG_DIRECTORY%/gc.log:utctime,pid,level,tags:filecount=20,filesize=50M

GC Logging Activation Consideration

A good best practice would be to activate such logging for a 1 week period to collect a decent coverge on average usage for what data is being collected along the way.