Swarm disconnection due to already connected agent

Issue

  • Swarm agents disconnect

  • Swarm agent already connected

Environment

  • Jenkins Enterprise

  • Swarm plugin < 1.22

Resolution

The Swarm plugin’s PluginImpl.doCreateSlave method assigns a node an alternative name if there is a name conflict, but it never tells the Swarm client about the name change.

As a result, the Swarm client tries to connect using the name it believes it has, and that connection is rejected because an agent with that name is already connected.
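The sequence described above can be modelled with a small toy simulation. This is only a sketch of the behaviour, not the plugin's actual code: the `ToyMaster` class, its methods, and the "name plus IP address" renaming rule are assumptions made for illustration.

```python
class ToyMaster:
    """Simplified stand-in for the Jenkins side; NOT the plugin's real code."""

    def __init__(self):
        self.nodes = set()      # node names registered on the master
        self.connected = set()  # node names that hold a live connection

    def create_node(self, requested_name, ip):
        # On a name conflict the master picks an alternative name
        # (assumed here to be "<name>-<ip>") but returns nothing to the
        # client, mirroring the behaviour described for Swarm plugin < 1.22.
        name = requested_name
        if name in self.nodes:
            name = f"{requested_name}-{ip}"
        self.nodes.add(name)

    def connect(self, name):
        # A second connection under an already connected name is rejected.
        if name in self.connected:
            return False
        self.connected.add(name)
        return True


def simulate():
    master = ToyMaster()
    results = []
    for ip in ["10.0.0.0", "10.0.0.1"]:
        master.create_node("AGENT1", ip)
        # The client never learns about a rename, so it always uses "AGENT1".
        results.append(master.connect("AGENT1"))
    return master, results
```

In this model the first client connects, while the second is registered under a different name on the master yet still tries to connect as AGENT1 and is rejected.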

If you encounter Swarm agent disconnects, one place to look is the names of the Swarm agents in a generated support bundle. In the nodes/slaves folder you will see entries such as:

  • AGENT1-10.0.0.1 (hudson.plugins.swarm.SwarmSlave)

  • AGENT1-10.0.0.2 (hudson.plugins.swarm.SwarmSlave)

  • AGENT1-10.0.0.3 (hudson.plugins.swarm.SwarmSlave)

while the agent

  • AGENT1 (hudson.plugins.swarm.SwarmSlave)

is already connected and is the one that holds the actual connection. This can confuse the Jenkins instance because the new agent name is never returned to the Swarm client.
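A check of this kind can be automated once the node names have been read from the support bundle. The following is a minimal sketch: the function name `find_renamed_duplicates` is hypothetical, and the suffix pattern (base name, a hyphen, then an IPv4 address) is an assumption taken from the listing above.

```python
import re
from collections import defaultdict

# Assumed suffix shape: "<base>-<IPv4 address>", as in the listing above.
SUFFIX = re.compile(r"^(?P<base>.+)-\d{1,3}(?:\.\d{1,3}){3}$")


def find_renamed_duplicates(node_names):
    """Map each base name to the renamed variants found alongside it."""
    names = set(node_names)
    groups = defaultdict(list)
    for name in sorted(names):
        match = SUFFIX.match(name)
        # Only report a variant when the plain base name is also present.
        if match and match.group("base") in names:
            groups[match.group("base")].append(name)
    return dict(groups)
```

A non-empty result suggests that several clients asked for the same name and were renamed on the Jenkins side.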

To prevent this, each agent machine needs a unique name every time it is created; otherwise the Swarm client cannot tell the agent names apart.
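One way to obtain such a name is to generate it when the machine is provisioned. The helper below is a hypothetical sketch, not part of the Swarm plugin: it assumes that the hostname plus a short random suffix is unique enough for your environment.

```python
import socket
import uuid


def unique_agent_name(prefix=None):
    """Return a name that differs on every call (hypothetical helper)."""
    # Fall back to the machine's hostname when no prefix is given.
    base = prefix or socket.gethostname()
    # Eight hex characters from a random UUID keep the name short.
    return f"{base}-{uuid.uuid4().hex[:8]}"
```

The generated value could then be passed to the Swarm client through its `-name` option when the agent is started.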