Windows JNLP Agents try to reconnect periodically

Issue

The Jenkins logs show many messages like the following, about every 10 seconds:

2017-08-29 21:09:27.724+0000 [id=...]	INFO   	h.TcpSlaveAgentListener$ConnectionHandler#run: Accepted connection #<connectionId> from <remoteSocketAddress>

The agent logs show many messages like the following, about every 10 seconds:

Aug 29, 2017 9:09:27 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Server didn't accept the handshake: <agentName> is already connected to this controller. Rejecting this connection.

Environment

CloudBees CI (CloudBees Core) on modern cloud platforms - Managed controller
CloudBees CI (CloudBees Core) on modern cloud platforms - Operations Center
CloudBees CI (CloudBees Core) on traditional platforms - Client controller
CloudBees CI (CloudBees Core) on traditional platforms - Operations Center
CloudBees Jenkins Enterprise
CloudBees Jenkins Enterprise - Managed controller
CloudBees Jenkins Enterprise - Operations center
Jenkins LTS
WinSW

Resolution

When a Windows JNLP agent has a runaway process or is outdated, the agent process may try to reconnect every 10 seconds even though the agent is already connected. In this situation, the controller logs fill up with connection messages like the following:

2017-08-29 21:09:27.724+0000 [id=...]	INFO   	h.TcpSlaveAgentListener$ConnectionHandler#run: Accepted connection #<connectionId> from <remoteSocketAddress>
2017-08-29 21:09:27.738-0000 [id=...]	WARNING	j.slaves.JnlpSlaveHandshake#error: TCP slave agent connection handler #<connectionId> with <remoteSocketAddress> is aborted: <agentName> is already connected to this controller. Rejecting this connection.

Or since Jenkins 2.60.1 LTS:

2017-08-29 21:09:27.724+0000 [id=...]	INFO    h.TcpSlaveAgentListener$ConnectionHandler#run: Accepted <protocol> connection #<connectionId> from <remoteSocketAddress>
2017-08-29 21:09:27.738-0000 [id=...]	WARNING	j.slaves.JnlpSlaveHandshake#error: TCP slave agent connection handler #<connectionId> with <remoteSocketAddress> is aborted: <agentName> is already connected to this controller. Rejecting this connection.

If the connection fails instead of being rejected, you may see the following:

2017-08-29 21:09:27.724+0000 [id=...]	INFO   	h.TcpSlaveAgentListener$ConnectionHandler#run: Accepted connection #<connectionId> from <remoteSocketAddress>
2017-08-29 21:09:27.724+0000 [id=...]	WARNING	h.TcpSlaveAgentListener$ConnectionHandler#run: Connection #<connectionId> failed
java.io.EOFException

These messages include details, such as the agentName or the remoteSocketAddress, that help track down which agents are at fault.
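As a rough illustration of how those details can be used, the following Python sketch tallies rejected connections per agent name and remote host from a controller log. The regular expression is an assumption based on the rejection message shown above and may need adjusting for other Jenkins versions; the function name and the log file argument are hypothetical.

```python
import re
import sys
from collections import Counter

# Assumed pattern, derived from the rejection warning shown in the excerpts above.
REJECTED = re.compile(
    r"connection handler #\d+ with (?P<remote>\S+) is aborted: "
    r"(?P<agent>.+?) is already connected to this controller"
)


def count_rejections(lines):
    """Return a Counter keyed by (agent name, remote host) for rejected connections."""
    counts = Counter()
    for line in lines:
        match = REJECTED.search(line)
        if match:
            # Drop the ephemeral source port so attempts from one host group together.
            host = match.group("remote").rsplit(":", 1)[0]
            counts[(match.group("agent"), host)] += 1
    return counts


if __name__ == "__main__":
    # Usage (hypothetical file name): python count_rejections.py jenkins.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for (agent, host), total in count_rejections(log).most_common():
            print(f"{total:6d}  {agent}  {host}")
```

Agents that appear at the top of the output with a high count are the first candidates to check for runaway or outdated processes.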

Fix the Agents

1. Ensure that the agent(s) impacted do not have a runaway process for jenkins-agent.exe

Look out for jenkins-agent.exe processes. Killing the runaway process(es) or restarting the host should resolve this.
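One way to list these processes is the built-in Windows tasklist command. The Python sketch below wraps it and parses its CSV output. It assumes Python is available on the agent host, and the function names are hypothetical. Which of the listed processes is the runaway one still has to be judged by hand, for example by comparing against the number of agent services installed on the host.

```python
import csv
import io
import subprocess


def parse_tasklist_csv(output):
    """Return (image name, PID) pairs from `tasklist /FO CSV /NH` output."""
    processes = []
    for row in csv.reader(io.StringIO(output)):
        # Skip blank lines and the "INFO: No tasks are running..." message.
        if len(row) >= 2 and row[1].isdigit():
            processes.append((row[0], int(row[1])))
    return processes


def find_agent_processes(image_name="jenkins-agent.exe"):
    """Run tasklist on the Windows agent host and list matching processes."""
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_tasklist_csv(result.stdout)


if __name__ == "__main__":
    for name, pid in find_agent_processes():
        print(f"{name} (PID {pid})")
```

A process identified as a runaway can then be stopped with `taskkill /PID <pid> /F`, or the host can be restarted as described above.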

2. Ensure that the agent(s) impacted are up to date

You may encounter the issue after a Jenkins upgrade as Windows JNLP agents (and the service wrapper) are not upgraded automatically. Upgrading Windows agents and/or the Windows Service Wrapper should resolve this.

Since Jenkins 2.60.1 LTS, use the Windows Installer Jenkins Module as a reference.

3. Configure the Windows Agent to prevent overly frequent reconnections

In the agent configuration file jenkins-agent.xml, add the option -noReconnect to the startup command to prevent the agent from reconnecting automatically, and add an onfailure entry so that the service restarts after a specific delay. For example:

<service>
  <id>agent</id>
  <name>JNLP Agent</name>
  <description>This service runs an agent for Jenkins continuous integration system.</description>
  <executable>C:\Program Files\Java\jre1.8.0_141\bin\java.exe</executable>
  <arguments>-Xrs  -jar "%BASE%\agent.jar" -noReconnect -jnlpUrl https://cjpcm.example.com/computer/windowsAgentJNLP/jenkins-agent.jnlp -secret XXXXXXXXXXXXXXXXXXXXXX</arguments>
  <logmode>rotate</logmode>
  <onfailure action="restart" delay="120 sec"/>
</service>
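To verify that a given jenkins-agent.xml contains both settings, a small check along the following lines can help. This is a minimal sketch rather than an official tool: the function name is hypothetical, and it only looks for the -noReconnect argument and an onfailure restart entry with a delay, as in the example above.

```python
import xml.etree.ElementTree as ET


def check_agent_config(xml_text):
    """Return a list of problems found in a jenkins-agent.xml document."""
    root = ET.fromstring(xml_text)
    problems = []

    # The startup command must carry the -noReconnect option.
    arguments = root.findtext("arguments", default="")
    if "-noReconnect" not in arguments.split():
        problems.append("-noReconnect is missing from <arguments>")

    # The service wrapper needs a restart action with an explicit delay.
    restarts = [e for e in root.findall("onfailure") if e.get("action") == "restart"]
    if not restarts:
        problems.append('no <onfailure action="restart"> entry')
    elif not any(e.get("delay") for e in restarts):
        problems.append("restart entry has no delay")

    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_agent_config.py jenkins-agent.xml
    with open(sys.argv[1], encoding="utf-8") as config:
        for problem in check_agent_config(config.read()) or ["configuration looks fine"]:
            print(problem)
```

An empty result means both settings were found. The check does not validate the delay value itself or any other part of the service definition.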

Since Jenkins 2.60.1 LTS, the Windows Service Wrapper offers extensions to kill runaway processes and automatically upgrade agents. See Upgrading Windows controllers and agents for 2.60.1.
