Kubernetes agents are failing with 'SocketTimeoutException: timeout'

Article ID:360038066251

Issue

  • My pods are created, but some builds fail or are disconnected with an error similar to the following in the console output or controller logs:

java.net.SocketException: Socket closed
    at java.net.SocketInputStream.read(SocketInputStream.java:204)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
    at sun.security.ssl.InputRecord.read(InputRecord.java:503)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:975)
    at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:933)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
    at okio.Okio$2.read(Okio.java:140)
    at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
Caused: java.net.SocketTimeoutException: timeout
    at okio.Okio$4.newTimeoutException(Okio.java:232)
    at okio.AsyncTimeout.exit(AsyncTimeout.java:285)
    at okio.AsyncTimeout$2.read(AsyncTimeout.java:241)
    at okio.RealBufferedSource.indexOf(RealBufferedSource.java:354)
    at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:226)
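
To gauge how often this error occurs, the console output or controller logs can be scanned for the exception line. The following is a minimal sketch only: the function name count_read_timeouts is hypothetical, and it assumes the log text has already been read into a string.

```python
import re

# Matches the exception line shown in the stack trace above.
TIMEOUT_PATTERN = re.compile(r"java\.net\.SocketTimeoutException: timeout")


def count_read_timeouts(log_text: str) -> int:
    """Count log lines that report the read timeout exception.

    Hypothetical helper for illustration; not part of Jenkins or the
    Kubernetes plugin.
    """
    return sum(1 for line in log_text.splitlines() if TIMEOUT_PATTERN.search(line))


if __name__ == "__main__":
    sample = (
        "java.net.SocketException: Socket closed\n"
        "Caused: java.net.SocketTimeoutException: timeout\n"
        "    at okio.Okio$4.newTimeoutException(Okio.java:232)\n"
    )
    print(count_read_timeouts(sample))
```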

Explanation

The exception java.net.SocketTimeoutException: timeout is thrown when the read (or request) timeout is exceeded on the connection between the Jenkins controller and a Kubernetes agent. This timeout applies only after the connection has been established. By default, the Kubernetes plugin sets it to 15s.

Before Kubernetes plugin version 1.22.3, a value of 0 results in a Read Timeout of 10s: no timeout is explicitly set on the Kubernetes client, so the default timeout of the okhttp client is used.

Since Kubernetes plugin version 1.22.3, the minimum value possible for the Read Timeout is 15s.
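
The version-dependent behavior described above can be summarized in a small sketch. This is an illustration, not the plugin's code. The function and constant names are hypothetical, version strings are assumed to be plain dotted numbers such as 1.22.3, and the 15s minimum is modeled as raising lower values to 15s, which is an interpretation of the statement above.

```python
OKHTTP_DEFAULT_READ_TIMEOUT_S = 10   # okhttp default, used when 0 is configured (before 1.22.3)
PLUGIN_MIN_READ_TIMEOUT_S = 15       # minimum Read Timeout since 1.22.3
MINIMUM_INTRODUCED_IN = (1, 22, 3)


def parse_version(version: str) -> tuple:
    """Turn '1.22.3' into (1, 22, 3). Assumes purely numeric, dot-separated parts."""
    return tuple(int(part) for part in version.split("."))


def effective_read_timeout(configured_seconds: int, plugin_version: str) -> int:
    """Estimate the Read Timeout in seconds that is actually in effect.

    Hypothetical helper; it ignores the separate persistence and
    synchronization issues of older plugin versions.
    """
    if parse_version(plugin_version) >= MINIMUM_INTRODUCED_IN:
        # Assumption: values below the minimum are raised to 15s.
        return max(configured_seconds, PLUGIN_MIN_READ_TIMEOUT_S)
    if configured_seconds == 0:
        # No explicit timeout is set, so the okhttp default applies.
        return OKHTTP_DEFAULT_READ_TIMEOUT_S
    return configured_seconds


if __name__ == "__main__":
    print(effective_read_timeout(0, "1.22.2"))
    print(effective_read_timeout(0, "1.22.3"))
    print(effective_read_timeout(60, "1.22.3"))
```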

Resolution

If an instance is impacted by this problem, consider increasing the Read Timeout in the Kubernetes Cloud configuration.

CloudBees CI on Modern Platform

Note: The Kubernetes plugin must first be upgraded to version 1.22.4 (CloudBees CI 2.204.2.2 or later). An issue in the Kubernetes plugin prior to version 1.22.3 prevents the configured Read Timeout of a kubernetes shared cloud from being passed to the connected controllers. As a result, the Kubernetes cloud configuration that is synchronized to the connected controllers has a value of 0 in the Read Timeout field, which results in a timeout of 10s that cannot be changed.

If using CloudBees CI on Modern Platform, this can be done from the Operations Center.

In the Operations Center, select the "All" view and configure the item "kubernetes shared cloud". Then adjust the value of the Read Timeout. Once saved, it may take a few seconds for the change to be applied to all managed controllers.

Any Controller

The Kubernetes plugin must first be upgraded to version 1.14.9 or later (CloudBees CI 2.164.3.2 or later). An issue in the Kubernetes plugin prior to version 1.14.9 prevents the configured Read Timeout value from being persisted.

Go to Manage Jenkins > Configure System > Cloud and adjust the Read Timeout of the Kubernetes cloud configuration.