Websocket Inbound Agents disconnect intermittently due to "WebSocketTimeoutException: Connection Idle Timeout"


Issue

  • Since version 2.361.x, websocket agents are disconnected intermittently. The controller logs show exceptions like the following:

org.eclipse.jetty.websocket.api.exceptions.WebSocketTimeoutException: Connection Idle Timeout
        [...]
        at org.eclipse.jetty.io.AbstractEndPoint.onIdleExpired(AbstractEndPoint.java:407)
        at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:170)
        at org.eclipse.jetty.io.IdleTimeout.idleCheck(IdleTimeout.java:112)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.eclipse.jetty.websocket.core.exception.WebSocketTimeoutException: Connection Idle Timeout

Environment

Explanation

As of 2.361.1, Jenkins uses Jetty 10, which went through a significant rewrite of the websocket connection handling. In particular, websocket connections managed by Jetty are subject to an idle timeout of 30 seconds (as opposed to an idle timeout of 5 minutes in Jetty 9). The websocket connections of inbound agents are kept active by a server ping that is sent every 30 seconds by default. Because the ping interval equals the idle timeout, there is little room for delays caused by JVM performance or network latency, and it is very likely that agent websocket connections are closed over time. This is therefore a configuration issue introduced in 2.361.1: to guarantee that websocket connections are kept alive, the ping interval must be lower than the idle timeout.

In addition, several issues related to the handling of websocket channel closure and reconnection have been discovered since, most likely because websocket agents are disconnected more often due to JENKINS-69955 (WebSocketTimeoutException: Connection Idle Timeout).

Resolution

This problem is fixed in Jenkins weekly 2.395 and backported to Jenkins LTS 2.387.2.

The solution is to upgrade CloudBees CI to version 2.387.2.3 or later.

Workaround

The workaround is to set the ping interval to 15 seconds or lower by adding the system property -Djenkins.websocket.pingInterval=15 to the controller JVM.
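
As a minimal sketch of the workaround, assuming the controller JVM options are collected in a JAVA_OPTS shell variable and the controller is started directly from a WAR file (both the variable name and the jenkins.war file name are assumptions; your installation may use a service configuration file or another mechanism instead), the property could be appended like this:

```shell
# Assumption: JVM options for the controller are collected in JAVA_OPTS.
# Append the ping interval property while keeping any options already set.
JAVA_OPTS="${JAVA_OPTS:-} -Djenkins.websocket.pingInterval=15"

# Print the resulting options so they can be checked before restarting.
echo "JVM options: ${JAVA_OPTS}"

# Assumption: the controller is started from a WAR file named jenkins.war.
# Uncomment to start the controller with the new options.
# java ${JAVA_OPTS} -jar jenkins.war
```

Because this is a JVM system property, the controller has to be restarted for the new value to take effect.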

For more information on how to add startup arguments, refer to How to add Java arguments to Jenkins.