Issue
Your inbound (formerly known as "JNLP") build agent is failing to connect to your Jenkins controller.
Resolution
Since CloudBees Core 2.222.1.1, you can connect inbound agents using WebSocket transport instead of TCP. This simplifies connectivity at the infrastructure level and achieves the same result with less complexity. WebSocket is the recommended approach, especially in Kubernetes environments, where it removes the need to open and manage nodePorts in your Kubernetes cluster.
[Recommended approach] Use WebSocket transport agents
WebSocket support was added in Jenkins core 2.217. It provides WebSocket transport for connecting inbound agents and for running the CLI. The WebSocket protocol allows bidirectional, streaming communication over an HTTP(S) port.
Since CloudBees Core 2.222.1.1 you can use WebSocket transport to connect inbound agents, and this works as well for shared agents / clouds. Just select the WebSocket checkbox in agent / cloud configuration. No special network configuration is needed, since the regular HTTP(S) port proxied by the CloudBees CI ingress is used for all communications.
The main benefit of WebSocket is simpler connectivity: you do not need to configure the inbound TCP port on the networking elements in front of Jenkins. WebSocket is compatible with HTTP/HTTPS, so it uses the Jenkins URL for communication.
Troubleshooting WebSocket transport agents
If your agent uses WebSocket transport and you are encountering unexpected agent disconnections, you can add a custom logger for the class jenkins.agents.WebSocketAgents by following Configure Loggers for Jenkins.
WebSocket agent connectivity can also be traced with the third-party tool websocat (https://github.com/vi/websocat), if your company approves its use. Run the following command, piped to the ts command (which comes from the Linux moreutils package) to prefix each line of output with a timestamp:
websocat -v --basic-auth "${USER}:${API_TOKEN}" wss://${JENKINS_URL}/wsecho/ 2>&1 | ts '[%Y-%m-%d %H:%M:%S]'
The ${USER} must be a Jenkins administrator to use this command, and for the URL, take the current URL you use to access Jenkins (for example https://JENKINS_URL/ ) and change it to wss://JENKINS_URL/wsecho/ .
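The URL rewrite described above can be sketched as a small shell function. This is only an illustration: the name to_wsecho_url is hypothetical, and the http:// to ws:// branch is an assumption added for completeness (the article itself only shows the https:// case).

```shell
# Hypothetical helper (not part of Jenkins or websocat):
# derive the wsecho WebSocket URL from the URL used to access Jenkins.
to_wsecho_url() {
  url="${1%/}"   # drop one trailing slash, if present
  case "$url" in
    https://*) printf 'wss://%s/wsecho/\n' "${url#https://}" ;;
    http://*)  printf 'ws://%s/wsecho/\n'  "${url#http://}" ;;
    *) echo "unsupported URL: $1" >&2; return 1 ;;
  esac
}
```

For example, to_wsecho_url https://JENKINS_URL/ prints the wss:// form that can be passed to websocat.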
The websocat command establishes a WebSocket connection and lists the ping and pong messages for the socket, prefixed with timestamps so that issues can be traced:
[2023-10-04 15:07:56] [INFO websocat::lints] Auto-inserting the line mode
[2023-10-04 15:07:57] [INFO websocat::stdio_threaded_peer] get_stdio_peer (threaded)
[2023-10-04 15:07:57] [INFO websocat::ws_client_peer] get_ws_client_peer
[2023-10-04 15:07:58] [INFO websocat::ws_client_peer] Connected to ws
[2023-10-04 15:08:27] [INFO websocat::ws_peer] Received WebSocket ping
[2023-10-04 15:08:57] [INFO websocat::ws_peer] Received WebSocket ping
The output above is from websocat v1.12.0 (https://github.com/vi/websocat/releases/tag/v1.12.0) with CloudBees CI version 2.414.2.2.
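To spot missed pings in a long trace, the timestamps can be compared mechanically. The following is a minimal sketch, assuming the log format shown above (timestamp from ts in the first two fields, and the text "Received WebSocket ping" on ping lines). The function name ping_gaps is hypothetical.

```shell
# Hypothetical helper: read timestamped websocat output on stdin and print
# the number of seconds between consecutive "Received WebSocket ping" lines.
ping_gaps() {
  awk '/Received WebSocket ping/ {
         # Field 2 holds the time of day followed by a closing bracket
         t = $2; sub(/\]$/, "", t); split(t, p, ":")
         now = p[1] * 3600 + p[2] * 60 + p[3]
         if (seen) {
           gap = now - prev
           if (gap < 0) gap += 86400   # trace crossed midnight
           print gap
         }
         prev = now; seen = 1
       }'
}
```

A saved trace can be fed through it with ping_gaps < trace.log. Gaps much longer than the configured ping interval suggest that pings were delayed or dropped.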
If you encounter disconnections, you should also review the logs of your ingress controller (for example NGINX) to see whether the WebSocket connection issues appear there.
The default WebSocket ping interval is 30 seconds, as per this code: https://github.com/jenkinsci/jenkins/blob/5c9976617cd6512c0d265b0e8a0623307f8d40bb/core/src/main/java/jenkins/websocket/WebSocketSession.java#L47-L58
Most ingress controllers expect a ping every 60 seconds in order to consider the connection active. If your ingress controller uses a different timeout, or if your build agents become unresponsive under heavy build workload and cannot send the ping in time, intermittent disconnections can happen. If you are encountering intermittent disconnections, you can try pinging more frequently by lowering the ping interval from 30 seconds to 10 seconds with the startup option:
-Djenkins.websocket.pingInterval=10
The steps to add this startup option are:
If running in Kubernetes, and you suspect your ingress controller is the issue, you could also try to port-forward the controller using kubectl, and try to connect the agent using the port-forwarding URL:
kubectl port-forward pod/test-0 -n cloudbees 8090:8080
java -jar agent.jar -jnlpUrl http://localhost:8090/test/computer/test/jenkins-agent.jnlp -secret SECRET -workDir "/tmp/agent"
[Alternative approach if WebSocket does not work] Verify required settings for inbound agents
Using WebSocket transport makes it much easier to handle special network topologies including reverse proxies and Kubernetes ingress. But if your agents are on the same network as the controller, TCP inbound agents are an excellent choice and fully supported. In order to successfully connect an inbound agent with your Jenkins environment there are a few important pre-requisites:
- The CloudBees CI instance must be listening on the TCP port used mainly for non-WebSocket inbound agents
- The CloudBees CI instance must be reachable at HTTP level from the agent
- The CloudBees CI instance must be reachable at TCP level from the agent
The CloudBees CI instance must be listening on the inbound (formerly JNLP) port
Go to Manage Jenkins -> Configure Global Security and ensure that the inbound port is configured as either a fixed or a random port, and that at least the agent protocol Inbound TCP Agent Protocol/4 (TLS encryption) is enabled.
Take a thread dump of the instance by going to <JENKINS_URL>/threadDump from your web browser, and look for the TCP agent listener thread.
TCP agent listener port=31966
"TCP agent listener port=31966" Id=89 Group=main RUNNABLE (in native)
    at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
    - locked java.lang.Object@7b1b2a2
    at hudson.TcpSlaveAgentListener.run(TcpSlaveAgentListener.java:186)
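If the thread dump has been saved to a file, the listener port can be pulled out of it with a short filter. This is a sketch only: it assumes the thread name has the form "TCP agent listener port=NNNN" as in the excerpt above, and listener_port is a hypothetical name.

```shell
# Hypothetical helper: read thread dump text on stdin and print the port
# found in the first "TCP agent listener port=NNNN" occurrence.
listener_port() {
  sed -n 's/.*TCP agent listener port=\([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

For example, listener_port < threaddump.txt prints the port, or nothing if the listener thread is absent, which would indicate that the inbound TCP port is disabled.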
The CloudBees CI instance must be reachable at HTTP level from the agent
From the agent side, run the command curl -ILv <JENKINS_URL> and check that you are getting the Jenkins headers, such as:
...
< X-Hudson: 1.395
X-Hudson: 1.395
< X-Hudson-CLI-Port: 31966
X-Hudson-CLI-Port: 31966
< X-Jenkins: 2.204.1.3
X-Jenkins: 2.204.1.3
< X-Jenkins-CLI-Host: ec2-74-159-31-69.compute-1.amazonaws.com
X-Jenkins-CLI-Host: ec2-74-159-31-69.compute-1.amazonaws.com
...
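The prerequisites also call for TCP-level reachability of the inbound port from the agent. One possible way to test this is sketched below. It assumes bash (for its /dev/tcp pseudo-device) and the GNU coreutils timeout command are available on the agent machine. The function name check_tcp and the 5 second limit are illustrative choices, not taken from the product documentation.

```shell
# Hypothetical helper: succeed if a TCP connection to HOST:PORT can be opened.
# Assumes bash and GNU timeout are installed on the agent machine.
check_tcp() {
  host="$1"; port="$2"
  if [ -z "$host" ] || [ -z "$port" ]; then
    echo "usage: check_tcp HOST PORT" >&2
    return 2
  fi
  # In the inner bash, $0 is the host and $1 is the port.
  # Opening /dev/tcp/HOST/PORT makes bash attempt a TCP connection.
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

Called as check_tcp <JENKINS_HOST> <INBOUND_PORT>, it returns 0 when the port accepts connections and a non-zero status when the connection is refused or times out.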
Ensure that the Java version is at least on the same baseline on both controller and agent
A good practice is to run exactly the same Java version on both the Jenkins controller and the agent. When this is not possible, it is recommended to run at least the same baseline.
Run java -version on both the Jenkins controller machine and the agent to check the Java version you are running on each.
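Comparing baselines by eye is error-prone because older releases report versions such as 1.8.0_292 while newer ones report 11.0.2 or 17.0.9. A minimal sketch for extracting the major version from java -version output follows. The helper name java_major is hypothetical, and the parsing assumes the usual version "X.Y.Z" line.

```shell
# Hypothetical helper: read `java -version` output on stdin and print the
# major version (8 for "1.8.0_292", 11 for "11.0.2", 17 for "17.0.9").
java_major() {
  sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1 |
    awk -F'[._]' '{ if ($1 == 1) print $2; else print $1 }'
}
```

Because java -version writes to standard error, it would be invoked as java -version 2>&1 | java_major on both machines, and the two numbers compared.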
Ensure that the version of agent.jar matches with the one
The main problem with running an inbound agent as the agent launcher is that agent.jar is not automatically upgraded on the agent when you upgrade Jenkins, whereas the SSH Launcher handles this out of the box.
Check that agent.jar is the same, for example using md5sum agent.jar. agent.jar can be downloaded from the Jenkins controller at the URL below:
<JENKINS_URL>/jnlpJars/agent.jar
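The checksum comparison can be scripted once both copies are on disk. This is a sketch under the assumption that md5sum is installed. The function name same_md5 and the file names in the usage note are hypothetical.

```shell
# Hypothetical helper: succeed if the two files have the same MD5 checksum.
same_md5() {
  # Reading from stdin keeps the file name out of the md5sum output
  [ "$(md5sum < "$1")" = "$(md5sum < "$2")" ]
}
```

For example, after downloading the controller copy to controller-agent.jar, running same_md5 agent.jar controller-agent.jar && echo "agent.jar is up to date" reports whether the agent copy matches.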
Use jenkins-cli to check the connection
On the agent machine, download <JENKINS_URL>/jnlpJars/jenkins-cli.jar from the Jenkins controller and execute the command below:
java -jar jenkins-cli.jar -s https://<CJOC_URL>/ -auth <USERNAME>:<PASSWORD> help
Check that the inbound port and hostname are right
Launch the commands below and check that the port and hostname are the right ones:
curl -I <JENKINS_URL>/computer/<AGENT>/jenkins-agent.jnlp
curl -I <JENKINS_URL>/tcpSlaveAgentListener/
The curl command can be installed from your OS package manager on Linux, or on Windows you can download it from https://curl.se/.
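To read the advertised inbound port from the /tcpSlaveAgentListener/ response without scanning the headers by hand, a small filter can be used. This is a sketch that assumes the controller reports the port in an X-Jenkins-JNLP-Port response header. The function name jnlp_port_from_headers is hypothetical.

```shell
# Hypothetical helper: read HTTP response headers on stdin and print the
# value of the X-Jenkins-JNLP-Port header (matched case-insensitively).
jnlp_port_from_headers() {
  tr -d '\r' |
    awk -F': ' 'tolower($1) == "x-jenkins-jnlp-port" { print $2; exit }'
}
```

It would typically be combined with curl, for example curl -sI <JENKINS_URL>/tcpSlaveAgentListener/ | jnlp_port_from_headers, and the result compared with the port configured on the controller.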
Load balancer or ha-proxy
If you are using a load balancer or HAProxy and you are not running Jenkins in HA mode, you might want to bypass them through the agent's advanced option Tunnel connection through.
Clear the Java Web Start Cache
If, when starting the JNLP file, you see an error like the one below, run the command javaws -clearcache to clear the cache of the Java Web Start program.
java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at sun.security.ssl.InputRecord.readFully(Unknown Source)
    at sun.security.ssl.InputRecord.read(Unknown Source)
    at sun.security.ssl.SSLSocketImpl.readRecord(Unknown Source)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(Unknown Source)
    at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
    at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
    at sun.net.www.protocol.https.HttpsClient.afterConnect(Unknown Source)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.access$200(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection$9.run(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection$9.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.AccessController.doPrivilegedWithCombiner(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown Source)
    at com.sun.deploy.net.HttpUtils.followRedirects(Unknown Source)
    at com.sun.deploy.net.BasicHttpRequest.doRequest(Unknown Source)
    at com.sun.deploy.net.BasicHttpRequest.doGetRequestEX(Unknown Source)
    at com.sun.deploy.cache.ResourceProviderImpl.checkUpdateAvailable(Unknown Source)
    at com.sun.deploy.cache.ResourceProviderImpl.isUpdateAvailable(Unknown Source)
    at com.sun.deploy.cache.ResourceProviderImpl.getResource(Unknown Source)
    at com.sun.deploy.cache.ResourceProviderImpl.getResource(Unknown Source)
    at com.sun.javaws.LaunchDownload$DownloadTask.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Information to be attached in case you need to open a Support ticket at CloudBees Support
- An architecture diagram so we can understand what your environment looks like
- A support bundle from the controller
- md5sum of agent.jar on the agent
- Content of <JENKINS_URL>/computer/<AGENT>/config.xml
- The agent and controller logs that demonstrate that the connectivity is broken
- Output of the commands below, launched from the agent machine:
curl -I <JENKINS_URL>/tcpSlaveAgentListener/
curl -ILv <JENKINS_URL>