Required Data: CloudBees CI hang or high CPU on Kubernetes or Linux

Article ID: 229795948

Issue

A specific job or your CloudBees CI environment hangs, responds slowly, or experiences high CPU usage.

If the required data is larger than 50 MB, you will not be able to upload all of the information through Zendesk. In this case, please use our upload service to attach all of the required information.

Required Data checklist

Ideally, your environment is configured as documented in Prepare CloudBees CI for Support.

  • Support bundle

  • Output of the script collectPerformanceData.sh

  • GC log file for review

  • Optional (if an I/O issue is suspected): output of the script listDProcessesNativeStacks.sh

Support bundle

Please capture a support bundle from the problem instance, ideally while the issue is happening, or otherwise as soon as possible after it occurs. Generating a support bundle describes how to create a support bundle.

Please enable at minimum the following checks: System properties, Controller Log Recorders, Garbage Collection Logs, Slow Request Records, Out Of Memory Errors, Controller Heap Histogram, Deadlock Records, Thread dumps of running Pipeline builds and Thread dumps.
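
The linked Generating a support bundle documentation covers the UI-driven procedure. If you prefer a terminal, the support-core plugin also exposes a support command through the Jenkins CLI; the sketch below is only an illustration that assumes CLI access is already configured, so verify the command and its options against the documentation for your version.

    # Illustrative only: generate a support bundle via the Jenkins CLI (support-core plugin).
    # JENKINS_URL and the admin:API_TOKEN credentials are placeholders for your own setup.
    java -jar jenkins-cli.jar -s "$JENKINS_URL" -auth admin:API_TOKEN support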

Output of the script collectPerformanceData.sh

You can download this script from https://s3.amazonaws.com/cloudbees-jenkins-scripts/e206a5-linux/collectPerformanceData.sh (the download command is also included in the steps below).

If the collectPerformanceData.sh script does not run as expected, please upload the terminal output generated by the script to the support ticket for review.
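
To make that output easy to attach, you can capture the script's console output to a file while it runs; this is a minimal sketch, and the log file name is an arbitrary choice:

    # Run the script and keep a copy of everything it prints to the terminal.
    ./collectPerformanceData.sh PID 2>&1 | tee collectPerformanceData-console.log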

The output from this script can be understood by reviewing What is collectPerformanceData.sh and how does it help?.
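
Before uploading, you can sanity-check the archive the script produces without extracting it; the file name below matches the one described in the steps that follow:

    # List the contents of the performance data archive; the actual name contains the JVM PID.
    tar -tzf performanceData.PID.output.tar.gz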

Modern platforms

  1. Collect cluster information (example for namespace ci):

    kubectl get pod,svc,endpoints,statefulset,deployment,ingress,pvc,pv,sa,role,rolebinding -n ci -o yaml > cluster.yml
    kubectl get pod,svc,endpoints,statefulset,deployment,ingress,pvc,pv,sa,role,rolebinding -n ci -o wide > cluster.log
    kubectl get events --sort-by=.metadata.creationTimestamp -n ci > events.log
    kubectl describe pods -n ci > pods-describe.log
    kubectl top pod -n ci > top-pod.log
    kubectl top node > top-node.log
  2. Exec into the pod where the issue is occurring and run collectPerformanceData.sh while the issue is occurring (example for pod controller-0 in namespace ci):

    kubectl exec -it controller-0 -n ci -- sh
    cd /tmp/
    curl https://s3.amazonaws.com/cloudbees-jenkins-scripts/e206a5-linux/collectPerformanceData.sh -o collectPerformanceData.sh
    # optional, but strongly recommended: download busybox so 'top' and 'top -H' can be collected
    curl https://busybox.net/downloads/binaries/1.35.0-x86_64-linux-musl/busybox -o busybox
    export BUSYBOX_HOME=/tmp
    chmod +x collectPerformanceData.sh
    chmod +x busybox
    jps
    # Replace PID with the output pid from 'jps'
    ./collectPerformanceData.sh PID
  3. The output file name will be performanceData.PID.output.tar.gz. Attach this file to the support ticket.

    kubectl cp -n ci controller-0:/tmp/performanceData.PID.output.tar.gz ./performanceData.PID.output.tar.gz
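
The cluster information files from step 1 remain on the machine where you ran kubectl. As a convenience, you can bundle them into a single archive before attaching them to the ticket; this is a minimal sketch that assumes the files were written to the current working directory:

    # Bundle the cluster information collected in step 1 into one archive for upload.
    tar -czf cluster-info.tar.gz cluster.yml cluster.log events.log pods-describe.log top-pod.log top-node.log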

Traditional platforms

  1. See Supported platforms for CloudBees CI on traditional platforms and ensure that you are using a supported Java environment.

  2. Check that the tools required to run the script are available in the $PATH. The collectPerformanceData.sh script collects thread dumps using the jstack command. It also needs to be able to run top, vmstat, netstat, nfsiostat, nfsstat and iostat. Please make sure that the user the controller is running as can execute all of these commands.

  3. Determine the $USER and $PID:

    ps -ef | grep java # or jps

    You will see output similar to:

    jenkins 12345 17347 0 Mar17 ? 00:00:17 /usr/bin/java -jar jenkins.war

    The first two columns show the user and process ID. In this case, $USER is jenkins and $PID is 12345. It is best to use ps to determine these values, rather than looking at the service PID file, because on some systems the PID file contains the process ID of the daemon/service process, rather than the JVM itself.

  4. Execute collectPerformanceData.sh while the issue is occurring:

    cd /tmp/
    curl https://s3.amazonaws.com/cloudbees-jenkins-scripts/e206a5-linux/collectPerformanceData.sh -o collectPerformanceData.sh
    chmod +x collectPerformanceData.sh
    jps
    # Replace PID with the output pid from 'jps', and USER with the user running the controller
    sudo -u USER ./collectPerformanceData.sh PID
  5. The output file name will be performanceData.PID.output.tar.gz. Attach this file to the support ticket.
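
If you want to run steps 2 to 5 in one pass, the sketch below combines them; the pgrep pattern, the /tmp working directory and the list of checked tools are assumptions, so adjust them for your environment:

    # Minimal sketch combining steps 2-5; assumes a single Jenkins JVM whose command line
    # contains 'jenkins.war'. Adjust the pgrep pattern if your controller is started differently.
    for t in jstack top vmstat netstat nfsiostat nfsstat iostat; do
      command -v "$t" >/dev/null 2>&1 || echo "warning: $t not found in PATH"
    done
    PID=$(pgrep -f 'java.*jenkins.war' | head -n 1)
    USER=$(ps -o user= -p "$PID")
    echo "Collecting performance data for PID=$PID as USER=$USER"
    cd /tmp/
    curl https://s3.amazonaws.com/cloudbees-jenkins-scripts/e206a5-linux/collectPerformanceData.sh -o collectPerformanceData.sh
    chmod +x collectPerformanceData.sh
    sudo -u "$USER" ./collectPerformanceData.sh "$PID"
    ls -lh performanceData."$PID".output.tar.gz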

GC log file for review

If you followed Prepare CloudBees CI for Support, the GC log file should be at the location configured by the -Xloggc:$path/gc.log argument.
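
To confirm where the JVM is writing GC logs and to retrieve the file, the sketch below may help; the pod name, namespace and $path are placeholders consistent with the examples above:

    # Show the configured GC log location from the running JVM's command line.
    ps -ef | grep -o -- '-Xloggc:[^ ]*'
    # On modern platforms, copy the GC log out of the controller pod (example for pod controller-0 in namespace ci).
    kubectl cp -n ci controller-0:$path/gc.log ./gc.log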

Output of the script listDProcessesNativeStacks.sh