Symptoms
- I am not able to connect a JNLP agent to a Jenkins instance
- The build has failed because the connection got broken
- The build is stalled in the queue waiting for the agent
- The agent is disconnected and cannot connect again
- "Channel is broken" warning in the logs
- Any of the exceptions listed below
Diagnosis/Treatment
Before starting to diagnose the issue, the following data needs to be provided:
- The last successful (agent was connected) and failed (agent was disconnected) build folders, as explained in Required Data: An issue with a Build of a Job
1. Requirements
1.A Ensure that the Java version is on the same major line on both controller and agent
A good practice is to run exactly the same Java version on both the Jenkins controller and the agent. When this is not possible, both must at least run the same baseline (major version). Check Supported JDK for CloudBees Core.
Run java -version on both the Jenkins controller box and the agent to check which Java version each one is running.
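The major-version comparison can be scripted. The following is a minimal sketch, not an official tool: the function names java_major, local_java_version and same_java_line are hypothetical, and it assumes version strings in the usual "1.8.0_292" (legacy) or "11.0.2" form as printed by java -version.

```shell
# Print the major Java version for a version string.
# Legacy scheme "1.8.0_292" -> 8; current scheme "11.0.2" -> 11.
java_major() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;
        *)   echo "$1" | cut -d. -f1 | cut -d_ -f1 | cut -d- -f1 ;;
    esac
}

# Print the quoted version string from `java -version`,
# which writes its output to stderr.
local_java_version() {
    java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed when both version strings share the same major version.
same_java_line() {
    [ "$(java_major "$1")" = "$(java_major "$2")" ]
}
```

For example, run local_java_version on the controller, note the value, and on the agent call same_java_line "$(local_java_version)" "<controller version>" to see whether both sides are on the same major line.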
1.B Ensure that the version of agent.jar matches the one served by the controller
The main problem of running JNLP as an agent launcher is that when you upgrade Jenkins, agent.jar is not automatically upgraded on the agent, as happens out of the box with the SSH Launcher. On Windows this can be solved by using JNLP + winsw and adding a download entry for the Remoting executable to the winsw configuration: <download from="${JENKINS_URL}/jnlpJars/agent.jar" to="%BASE%\agent.jar"/>.
Check that agent.jar is the same on both sides, for example by comparing the output of md5sum agent.jar. agent.jar can be downloaded from the Jenkins controller at the URL below:
https://<JENKINS_URL>/jnlpJars/agent.jar
Please refer to Remoting Best Practices — Agent Daemonization
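The checksum comparison described above can be sketched as follows. The function names file_md5 and same_jar are hypothetical, and the sketch assumes that md5sum is available on the agent box.

```shell
# Print only the MD5 digest of a file
# (md5sum prints "<digest>  <file name>").
file_md5() {
    md5sum "$1" | cut -d' ' -f1
}

# Succeed when the two files have identical MD5 digests.
same_jar() {
    [ "$(file_md5 "$1")" = "$(file_md5 "$2")" ]
}

# Possible usage on the agent (not executed here):
#   curl -sSfL -o /tmp/controller-agent.jar "$JENKINS_URL/jnlpJars/agent.jar"
#   same_jar agent.jar /tmp/controller-agent.jar || echo "agent.jar is out of date"
```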
Partial solutions:
- Using the Versions Node Monitors Plugin
- Share agent.jar via NFS
1.C Connectivity checks
Use jenkins-cli to check the connection
On the agent box, download the CLI and run a help command in your preferred mode. For example, using http mode:
java -jar jenkins-cli.jar [-s $JENKINS_URL] -auth <user>:<token> help
Check that the agent is able to see the Jenkins headers
# curl -IvL <JENKINS_URL>
curl -IvL https://jenkins:8443
The curl command can be installed from your OS package manager on Linux, or on Windows you can download it from https://curl.se/.
Check that the inbound TCP port of the controller is accessible from the agent
nc -z CONTROLLER_HOSTNAME TCP_PORT
# or using telnet
# telnet CONTROLLER_HOSTNAME TCP_PORT
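The header check and the port check can be combined. The sketch below assumes that the controller advertises its inbound agent port in an X-Jenkins-JNLP-Port response header; if your instance does not send that header, pass the port by hand as in the nc command above. The function names inbound_port_from_headers and check_inbound_port are hypothetical.

```shell
# Read HTTP response headers on stdin and print the value of the
# X-Jenkins-JNLP-Port header (matched case-insensitively), if present.
inbound_port_from_headers() {
    tr -d '\r' |
    awk -F': *' 'tolower($1) == "x-jenkins-jnlp-port" { print $2; exit }'
}

# $1 = Jenkins URL, $2 = controller hostname.
# Succeed when the advertised inbound port accepts TCP connections.
check_inbound_port() {
    port=$(curl -sIL "$1" | inbound_port_from_headers)
    [ -n "$port" ] && nc -z "$2" "$port"
}
```

For example, check_inbound_port https://jenkins:8443 CONTROLLER_HOSTNAME returns a non-zero status when the header is missing or the port is unreachable from the agent.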
2. Use a different Launch mechanism
For Jenkins >= 2.204.1 LTS, switch to a different Launch mechanism: Connect directly to TCP port.
3. Known issues
3.A. Unable to load class once the loading was interrupted
JENKINS-36991 (Unable to load class once the loading was interrupted) is resolved and released in Remoting 2.61.
To confirm which Remoting version your agent.jar (formerly slave.jar) file is tied to, run the following commands in the same directory as your .jar file and check the REMOTING_VERSION parameter in the output:
jar xf agent.jar META-INF/MANIFEST.MF
more META-INF/MANIFEST.MF
Jenkins log / Build console output log
java.lang.NoClassDefFoundError: Could not initialize class jenkins.model.Jenkins
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:191)
    at Script1.class$(Script1.groovy)
    at Script1.$get$$class$jenkins$model$Jenkins(Script1.groovy)
    at Script1.run(Script1.groovy:1)
    at groovy.lang.GroovyShell.evaluate(GroovyShell.java:580)
    at groovy.lang.GroovyShell.evaluate(GroovyShell.java:618)
    at groovy.lang.GroovyShell.evaluate(GroovyShell.java:589)
    at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:142)
    at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:114)
    at hudson.remoting.UserRequest.perform(UserRequest.java:121)
    at hudson.remoting.UserRequest.perform(UserRequest.java:49)
    at hudson.remoting.Request$2.run(Request.java:326)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Agent log
Slave.jar version: 2.52
This is a Unix slave
Evacuated stdout
Slave successfully connected and online
Jul 27, 2016 8:36:57 AM jenkins.model.Jenkins <clinit>
SEVERE: Failed to load Jenkins.class
hudson.remoting.RemotingSystemException: java.lang.InterruptedException
    at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:266)
    at com.sun.proxy.$Proxy5.fetch3(Unknown Source)
    at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:171)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at com.thoughtworks.xstream.XStream.buildMapper(XStream.java:590)
    at com.thoughtworks.xstream.XStream.<init>(XStream.java:568)
    at com.thoughtworks.xstream.XStream.<init>(XStream.java:496)
    at com.thoughtworks.xstream.XStream.<init>(XStream.java:465)
    at com.thoughtworks.xstream.XStream.<init>(XStream.java:411)
    at com.thoughtworks.xstream.XStream.<init>(XStream.java:350)
    at hudson.util.XStream2.<init>(XStream2.java:88)
    at jenkins.model.Jenkins.<clinit>(Jenkins.java:4217)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:191)
    at Script1.class$(Script1.groovy)
    at Script1.$get$$class$jenkins$model$Jenkins(Script1.groovy)
    at Script1.run(Script1.groovy:1)
    at groovy.lang.GroovyShell.evaluate(GroovyShell.java:580)
    at groovy.lang.GroovyShell.evaluate(GroovyShell.java:618)
    at groovy.lang.GroovyShell.evaluate(GroovyShell.java:589)
    at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:142)
    at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:114)
    at hudson.remoting.UserRequest.perform(UserRequest.java:121)
    at hudson.remoting.UserRequest.perform(UserRequest.java:49)
    at hudson.remoting.Request$2.run(Request.java:326)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at hudson.remoting.Request.call(Request.java:147)
    at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:253)
    ... 30 more
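The agent log above reports version 2.52, which predates the fix in Remoting 2.61. A minimal sketch for checking a version against that threshold follows; version_ge is a hypothetical helper and it only understands purely numeric dotted versions such as 2.52 or 3.10.2.

```shell
# Succeed when dotted numeric version $1 is greater than or equal to $2.
# Components are compared numerically from left to right;
# missing components count as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        len = (n > m) ? n : m
        for (i = 1; i <= len; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}
```

For example, version_ge 2.52 2.61 fails, which indicates that the agent.jar from the log above still needs to be upgraded.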
3.B. Intermittent Invalid Object ID in remoting module
JENKINS-23271 Intermittent Invalid Object ID in remoting module
It is fixed and released in Jenkins core versions higher than 2.32.
It happens frequently on Java 8 due to its object management logic, and it causes issues in task execution (build failures, agent disconnects).
3.C. Ping Thread
Check the Ping Thread Documentation here.
PingThread checks that the agent is able to execute a command from the controller (a NOOP request).
The ping command may fail to execute because of:
- An overloaded queue where all agent workers are busy. On big boxes you can increase the number of remoting TaskPool workers.
- An overloaded network
In some cases disabling the PingThread can help.
So, if the stack trace below is what you are seeing all the time, you should disable the PingThread. The side effect is that the agent is expected to hang when communication between the controller and the agent fails. The good side is that you will be able to get a thread dump on both sides, controller and agent.
Jenkins log / Build console output log
Caused by: java.io.IOException
    at hudson.remoting.Channel.close(Channel.java:1163)
    at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:118)
    at hudson.remoting.PingThread.ping(PingThread.java:126)
    at hudson.remoting.PingThread.run(PingThread.java:85)
Caused by: java.util.concurrent.TimeoutException: Ping started at 1474633728617 hasn't completed by 1474633968617
    ... 2 more
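The two numbers in the TimeoutException message are millisecond timestamps, so their difference shows how long the ping waited before the channel was closed. A small sketch that extracts it from a log line (ping_elapsed_ms is a hypothetical helper name):

```shell
# Read log lines on stdin and, for each "Ping started at ... hasn't
# completed by ..." message, print the elapsed time in milliseconds.
ping_elapsed_ms() {
    sed -n "s/.*Ping started at \([0-9]*\) hasn't completed by \([0-9]*\).*/\1 \2/p" |
    while read -r start end; do
        echo $((end - start))
    done
}
```

Feeding the line from the trace above into ping_elapsed_ms prints 240000, that is, the ping had been pending for 240 seconds (4 minutes) when the channel was closed.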
3.D. JNLP Cloud Agents are disconnected on start process
It affects Jenkins core versions higher than 2.28.
The fix relaxes the requirements of the JNLP connection receiver, which was rejecting connections from agents not using JNLPComputerLauncher (e.g. from Agent Setup, vSphere Cloud and other plugins). Now the connection is accepted from launchers implementing other proxying and filtering Launcher implementations. Particular plugins may require setting the -Djenkins.slaves.DefaultJnlpSlaveReceiver.disableStrictVerification=true system property in the controller JVM to allow agents to connect. See JENKINS-39232 (regression in 2.28).
4. HA / LB / Reverse proxy bypass
- It is highly recommended to add the -Dhudson.TcpSlaveAgentListener.hostName=$MASTER_IP Java property on the controller. In that case, the connection goes directly to the instance without passing through the HAProxy/load balancer/reverse proxy. See JNLP connectivity Best Practices.
5. Clear the Java Web Start Cache
If, when starting the JNLP file, you see an error like the one below, run the command javaws -clearcache to clear the cache of the Java Web Start program.
java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at sun.security.ssl.InputRecord.readFully(Unknown Source)
    at sun.security.ssl.InputRecord.read(Unknown Source)
    at sun.security.ssl.SSLSocketImpl.readRecord(Unknown Source)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(Unknown Source)
    at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
    at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
    at sun.net.www.protocol.https.HttpsClient.afterConnect(Unknown Source)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.access$200(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection$9.run(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection$9.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.AccessController.doPrivilegedWithCombiner(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown Source)
    at com.sun.deploy.net.HttpUtils.followRedirects(Unknown Source)
    at com.sun.deploy.net.BasicHttpRequest.doRequest(Unknown Source)
    at com.sun.deploy.net.BasicHttpRequest.doGetRequestEX(Unknown Source)
    at com.sun.deploy.cache.ResourceProviderImpl.checkUpdateAvailable(Unknown Source)
    at com.sun.deploy.cache.ResourceProviderImpl.isUpdateAvailable(Unknown Source)
    at com.sun.deploy.cache.ResourceProviderImpl.getResource(Unknown Source)
    at com.sun.deploy.cache.ResourceProviderImpl.getResource(Unknown Source)
    at com.sun.javaws.LaunchDownload$DownloadTask.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
7. TCP retransmission timeout at the OS level: consider tuning it
7.A Linux
sysctl -w net.ipv4.tcp_keepalive_time=120
sysctl -w net.ipv4.tcp_keepalive_intvl=30
sysctl -w net.ipv4.tcp_keepalive_probes=8
sysctl -w net.ipv4.tcp_fin_timeout=30
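To see what the keepalive values mean in practice: on Linux an idle connection to a dead peer is dropped after roughly tcp_keepalive_time + tcp_keepalive_intvl * tcp_keepalive_probes seconds. The sketch below computes that figure; the function names are hypothetical.

```shell
# $1 = tcp_keepalive_time, $2 = tcp_keepalive_intvl, $3 = tcp_keepalive_probes.
# Print the approximate number of seconds before an idle, dead
# connection is detected.
keepalive_detect_seconds() {
    echo $(( $1 + $2 * $3 ))
}

# Same calculation using the values currently active on this Linux box.
current_keepalive_detect_seconds() {
    keepalive_detect_seconds \
        "$(sysctl -n net.ipv4.tcp_keepalive_time)" \
        "$(sysctl -n net.ipv4.tcp_keepalive_intvl)" \
        "$(sysctl -n net.ipv4.tcp_keepalive_probes)"
}
```

With the values above, keepalive_detect_seconds 120 30 8 prints 360 (6 minutes), compared with 7875 seconds for the usual Linux defaults of 7200, 75 and 9.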
7.B Windows
KeepAliveInterval = 30000
KeepAliveTime = 120000
TcpMaxDataRetransmissions = 8
TcpTimedWaitDelay = 30
8. When all else fails
- Try adding this Java property on the controller: -Djenkins.slaves.NioChannelSelector.disabled=true
- Without NIO, standard I/O is still available; NIO adds complexity in exchange for better performance
- Try adding this Java property on the controller: -Djenkins.slaves.JnlpSlaveAgentProtocol3.enabled=false
9. When no secret is included in connection string
When the agent launch command is missing -secret, or you see the stack trace below during agent connection, it is normally a result of the permissions set for the system user anonymous.
Failing to obtain $JENKINS_URL/computer/$AGENT_NAME/jenkins-agent.jnlp
java.io.IOException: Failed to load $JENKINS_URL/computer/$AGENT_NAME/jenkins-agent.jnlp: 403 Forbidden
    at hudson.remoting.Launcher.parseJnlpArguments(Launcher.java:521)
    at hudson.remoting.Launcher.run(Launcher.java:347)
    at hudson.remoting.Launcher.main(Launcher.java:298)
Waiting 10 seconds before retry
Removing all agent permissions under Manage Jenkins > Manage Roles for the system user anonymous should resolve the issue.