Required Data: Network Dump

Article ID:229040647

Issue

  • There appears to be a connectivity issue between two nodes (the operations center, a client controller or managed controller, an agent, or any other component that requires connectivity). The support bundle and the other information provided suggest that the issue may come from something in between those components.

Prerequisite

The clocks of all machines involved must be synchronized so that we can correlate the data collected from them.
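As a rough way to sanity-check this prerequisite, the sketch below compares two epoch timestamps and prints their absolute difference in seconds. The function name `clock_skew_seconds` and the host `other-node` are hypothetical and used only for illustration; reading the remote clock over SSH adds some latency, so treat the result as an approximation.

```shell
# Absolute difference, in seconds, between two epoch timestamps.
# $1: epoch seconds read on the first machine
# $2: epoch seconds read on the second machine
clock_skew_seconds() {
  diff=$(( $1 - $2 ))
  if [ "$diff" -lt 0 ]; then
    diff=$(( -diff ))
  fi
  echo "$diff"
}

# Example usage (other-node is a hypothetical host name):
# clock_skew_seconds "$(date +%s)" "$(ssh other-node date +%s)"
```

A difference of more than a few seconds suggests that time synchronization (for example NTP) should be checked on the machines before capturing.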

Required Data Network Dump

This article describes how to collect a Network Dump, which helps to troubleshoot connectivity issues between nodes in your CloudBees CI cluster.

If the required data is bigger than 50 MB, you will not be able to use ZenDesk to upload all the information. In this case, we encourage you to use our upload service to attach all the required information.
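A minimal sketch of how the size check could be scripted before uploading. It assumes "50 MB" means 50 x 1024 x 1024 bytes, which is an assumption rather than a documented limit; the function name and the archive name in the usage comment are hypothetical.

```shell
# Assumption: the 50 MB limit is interpreted as 50 * 1024 * 1024 bytes.
ZENDESK_LIMIT_BYTES=$((50 * 1024 * 1024))

# Succeeds (exit status 0) when the given file is bigger than the limit.
exceeds_zendesk_limit() {
  # The arithmetic expansion strips any padding some wc versions print.
  size_bytes=$(( $(wc -c < "$1") ))
  [ "$size_bytes" -gt "$ZENDESK_LIMIT_BYTES" ]
}

# Example usage (the archive name is hypothetical):
# if exceeds_zendesk_limit zd-12345-tcpdump-myhost.zip; then
#   echo "Too large for ZenDesk, use the upload service instead"
# fi
```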

Environment

Required Data check list

  • Network Dump

  • Time frame in which the issue is exposed

  • A support bundle created when the issue is exposed

Network Dump

Network dumps need to be run in background mode on each of the nodes that are trying to connect. For example, if you have issues between the operations center, a client controller, and one agent, run this three times, once on each node/host.

The following commands need to be run before the issue occurs again.

Permanent Nodes

1/ Run the Network Dump

Note: this sets up rotating dump files and writes the network traffic to them, so it will use at most ~1 GB of disk on each node (100 files x ~10 MB).

$ export CASEID=CHANGE_IT_TO_THE_ZENDESK_ISSUE_ID
$ mkdir zd-$CASEID-tcpdump-$(hostname)
$ cd zd-$CASEID-tcpdump-$(hostname)
# -C is expressed in units of 1,000,000 bytes and -W is the number of rotating files: 100 x ~10 MB
$ nohup sudo tcpdump -i eth0 -s 1522 -C 10 -w tcpdump.cap -W 100 -Z root &
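The disk budget quoted above follows from two tcpdump options: -C takes the maximum size of each file in units of 1,000,000 bytes, and -W caps the number of rotating files. The helper below is only an illustrative sketch (the name `max_capture_bytes` is hypothetical) that computes the upper bound for a given pair of values.

```shell
# Upper bound, in bytes, on the disk used by a rotating capture.
# $1: value passed to tcpdump -C (units of 1,000,000 bytes)
# $2: value passed to tcpdump -W (number of files kept)
max_capture_bytes() {
  echo $(( $1 * 1000000 * $2 ))
}

# 100 files of ~10 MB each
max_capture_bytes 10 100   # prints 1000000000
```

Comparing this figure with the free space reported by `df` in the capture directory can help confirm that the node has enough room before the capture is started.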

2/ Collect the Network Dump

When the issue is exposed again, zip the directory you created above on each of the nodes. Run the command from the parent of that directory, in a shell where CASEID is still set.

$ zip -9 -r zd-$CASEID-tcpdump-$(hostname).zip zd-$CASEID-tcpdump-$(hostname)

Ephemeral Nodes (Containers)

Add a sidecar container that contains tcpdump. Build your own image (for example, my-custom-tcp-image) like the following example:

FROM ubuntu
RUN apt-get update && apt-get install -y tcpdump
CMD tcpdump -i eth0
The following approach is not compatible with Websockets.
Applications

The sidecar is defined in the StatefulSet under spec.template.spec.containers:

- name: tcpdump
  image: registry-example/my-custom-tcp-image
  # -C is expressed in units of 1,000,000 bytes: 100 files x ~10 MB
  command: ["/bin/sh", "-c", "tcpdump -i eth0 -s 1522 -C 10 -w tcpdump_controller.cap -W 100 -Z root"]

For a managed controller, the sidecar can be added in the controller Provisioned page > Advanced configuration > YAML. For the operations center, it must be added manually to the StatefulSet manifest.

Agents

The sidecar is defined in the Agent Pod Template under spec.containers, as in the following example. Note that podRetention is set to always() so that the pod is kept and the tcpdump_agent.cap file can be collected from it when the build fails due to the connectivity issues.

pipeline {
  agent {
    kubernetes {
      podRetention always()
      defaultContainer 'maven'
      yaml '''
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: maven
    image: maven:alpine
    command:
    - cat
    tty: true
  - name: tcpdump
    image: registry-example/my-custom-tcp-image
    # -C is expressed in units of 1,000,000 bytes: 100 files x ~10 MB
    command: ["/bin/sh","-c","tcpdump -i eth0 -s 1522 -C 10 -w tcpdump_agent.cap -W 100 -Z root"]
'''
    }
  }
  stages {
    stage('Run maven') {
      steps {
        sh 'mvn -version'
        sleep 3000
      }
    }
  }
}

Time frame in which the issue is exposed

Make sure to tell us the exact time slot in which you saw the issue occur again. We need this to correlate timestamps across the network dumps, the support bundle data, and any other information you have provided.
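One way to report the time slot unambiguously is to express it in UTC. The sketch below formats a start and end time, given as seconds since the Unix epoch, as a UTC window. It assumes GNU date (which accepts `-d @SECONDS`); the function name `issue_window_utc` and the five-minute window in the usage comment are illustrative only.

```shell
# Formats a start/end pair (seconds since the Unix epoch) as a UTC window.
# Assumes GNU date, which accepts -d @SECONDS.
issue_window_utc() {
  start=$(date -u -d "@$1" +%Y-%m-%dT%H:%M:%SZ)
  end=$(date -u -d "@$2" +%Y-%m-%dT%H:%M:%SZ)
  echo "$start to $end"
}

# Example usage: record the moment the issue is observed, then report
# a window covering the five minutes before it.
# observed=$(date +%s)
# issue_window_utc "$((observed - 300))" "$observed"
```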

Support Bundle

Provide a support bundle from the problematic instance, ideally generated while the issue is happening or, failing that, right after the issue is exposed. If you do not know how to generate a support bundle, please follow the KB below.

This may help us correlate network packets with potential error logs.