Issue
-
There is apparently a connectivity issue between two nodes (be it the operations center, a client controller/managed controller, an agent, or anything requiring connectivity). The support bundle and the information provided suggest the issue could be coming from something in between those components.
Required Data: Network Dump
This article describes how to collect a Network Dump, which helps to troubleshoot connectivity issues between nodes in your CloudBees CI cluster.
If the required data is bigger than 50 MB, you will not be able to use ZenDesk to upload all the information. In that case, we encourage you to use our upload service to attach all the required information.
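To check quickly whether the upload service is needed, here is a minimal sketch, assuming a Linux host and `bundle.zip` as a placeholder for your actual archive name:

```shell
# Sketch: decide where to send the collected data, based on the 50 MB
# ZenDesk attachment limit mentioned above. bundle.zip is a placeholder.
LIMIT=$((50 * 1024 * 1024))
SIZE=$(stat -c%s bundle.zip 2>/dev/null || echo 0)
if [ "$SIZE" -gt "$LIMIT" ]; then
  DECISION="upload-service"   # too big for ZenDesk: use the upload service
else
  DECISION="zendesk"          # small enough to attach to the ticket
fi
echo "$DECISION"
```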
Required Data check list
-
- Network Dump
- Time around the issue is exposed
- A support bundle created when the issue is exposed
Network Dump
Network dumps need to be run in background mode on each of the nodes that are trying to connect. For example, if you have issues between the operations center, a client controller, and one agent, you need to run this three times, once per node/host.
The following commands need to be run before the issue occurs again.
Permanent Nodes
1/ Run the Network Dump
This sets up rotating dump files and writes the network traffic there: -C 10 rotates the capture every ~10 MB, -W 100 keeps at most 100 files, and -s 1522 captures full Ethernet frames. It will therefore take up to ~1 GB of disk on each node (100 x ~10 MB), and not more.

```
$ export CASEID=CHANGE_IT_TO_THE_ZENDESK_ISSUE_ID
$ mkdir zd-$CASEID-tcpdump-$(hostname)
$ cd zd-$CASEID-tcpdump-$(hostname)
$ nohup sudo tcpdump -i eth0 -s 1522 -C 10 -w tcpdump.cap -W 100 -Z root &
```
2/ Collect the Network Dump
When the issue is exposed again, zip the directory you created above for each of the nodes.
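Before zipping, the background capture should be stopped so the current file is flushed to disk. A minimal sketch, assuming the capture was started as in step 1 (adjust the pattern if you used a different interface):

```shell
# Sketch: stop the background tcpdump started in step 1 before zipping.
# The match pattern is an example tied to the "-i eth0" command line.
PATTERN="tcpdump -i eth0"
pgrep -f "$PATTERN" >/dev/null && sudo pkill -f "$PATTERN" || true
```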
```
$ zip -9 -r zd-$CASEID-tcpdump-$(hostname).zip zd-$CASEID-tcpdump-$(hostname)
```
Ephemeral Nodes (Containers)
Adding a sidecar container which contains tcpdump
Build your own image (for example, my-custom-tcp-image) like the following example:

```
FROM ubuntu
RUN apt-get update && apt-get install -y tcpdump
CMD tcpdump -i eth0
```
Note: The following approach is not compatible with WebSockets.
Applications
The sidecar is defined in the StatefulSet spec.template.spec.containers:

```
- name: tcpdump
  image: registry-example/my-custom-tcp-image
  command: ["/bin/sh", "-c", "tcpdump -i eth0 -s 1522 -C 10 -w tcpdump_controller.cap -W 100 -Z root"]
```
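For orientation, here is a minimal sketch of where that sidecar lands inside a StatefulSet manifest. All names and images below are illustrative placeholders, not the actual chart values:

```yaml
# Sketch only: abbreviated StatefulSet showing the sidecar placement.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cjoc                      # example name
spec:
  template:
    spec:
      containers:
      - name: jenkins             # the existing application container
        image: example-registry/operations-center:example   # placeholder
      - name: tcpdump             # the added sidecar
        image: registry-example/my-custom-tcp-image
        command: ["/bin/sh", "-c", "tcpdump -i eth0 -s 1522 -C 10 -w tcpdump_controller.cap -W 100 -Z root"]
```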
For a managed controller, the sidecar can be added in its configuration, whereas for the operations center it must be added manually into the StatefulSet manifest.
Agents
The sidecar is defined in the Agent Pod Template spec.containers, like the following example. Note that podRetention is set to always() so that the tcpdump_agent.cap file can be collected from the pod when the build fails due to the connectivity issues.
```
pipeline {
    agent {
        kubernetes {
            podRetention always()
            defaultContainer 'maven'
            yaml '''
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: maven
    image: maven:alpine
    command:
    - cat
    tty: true
  - name: tcpdump
    image: registry-example/my-custom-tcp-image
    command: ["/bin/sh","-c","tcpdump -i eth0 -s 1522 -C 10 -w tcpdump_agent.cap -W 100 -Z root"]
'''
        }
    }
    stages {
        stage('Run maven') {
            steps {
                sh 'mvn -version'
                sleep 3000
            }
        }
    }
}
```
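Once the pod is retained after the build, the capture files can be pulled from the tcpdump sidecar. A minimal sketch, where the namespace and pod name are illustrative placeholders (look up the real pod name with kubectl get pods):

```shell
# Sketch: copy a rotated capture file out of the retained agent pod.
# Namespace and pod name below are examples only.
NAMESPACE="cloudbees-core"
POD="example-agent-pod"
if command -v kubectl >/dev/null 2>&1; then
  kubectl cp "$NAMESPACE/$POD:tcpdump_agent.cap0" ./tcpdump_agent.cap0 -c tcpdump || true
fi
```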
Time around the issue is exposed
Make sure to tell us the exact time slot when you saw the issue occur again. This is required for us to correlate the timestamps between the network dumps, the support bundle data, and any other information you may have provided us with.
Support Bundle
A support bundle from the problematic instance, ideally generated while the issue is happening or, at worst, right after the issue is exposed. Please follow the KB below if you don't know how to generate a support bundle. It may help us correlate network packets with potential error logs.