Symptoms
- Client controller appears as disconnected in operations center
- Client controller shows "Connect to Operations Center" when it should already be connected
- Client controller license expires
- Shared agents/clouds are not leased to client controllers
Diagnostic/Treatment
- Pre-condition: client controller and operations center were previously connected correctly.
The following sections describe resolution paths for specific client controller connectivity issues with an operations center. Each one is tied to a specific stack trace, but in some cases the root cause may be hidden under a more general trace in the controller logs (such as "... Caused by: ... java.io.IOException: Remotely Closed"), which calls for a deeper investigation.
connectivity issue at HTTP level
operations center logs
- operations center connectivity logs URL: https://oc.jenkins.example.com:8888/job/exampleClientMaster/log
[Mon Oct 24 12:12:09 UTC 2016] Starting discovery on https://oc.jenkins.example.com:8888/
[Mon Oct 24 12:12:19 UTC 2016] Discovery on https://oc.jenkins.example.com:8888/ failed (will retry) - Could not connect to Jenkins server: https://oc.jenkins.example.com:8888/
java.net.ConnectException: Could not connect to Jenkins server: https://oc.jenkins.example.com:8888/
    at com.cloudbees.opscenter.agent.AgentProtocolEndpointLocator.locate(AgentProtocolEndpointLocator.java:556)
    at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.run(OperationsCenterConnectorSetTask.java:170)
    at java.lang.Thread.run(Thread.java:745)
    at com.cloudbees.opscenter.client.plugin.AgentThread.run(AgentThread.java:39)
Caused by: java.util.concurrent.ExecutionException: java.net.ConnectException: connection timed out: oc.jenkins.example.com/192.168.1.44:8888 to https://oc.jenkins.example.com:8888/instance-identity/
    at com.ning.http.client.providers.netty.NettyResponseFuture.abort(NettyResponseFuture.java:328)
    at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:108)
    at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:427)
    at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:418)
    at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:380)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:140)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: connection timed out: oc.jenkins.example.com/192.168.1.44:8888 to https://oc.jenkins.example.com:8888/instance-identity/
    at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:104)
    ... 12 more
Caused by: org.jboss.netty.channel.ConnectTimeoutException: connection timed out: oc.jenkins.example.com/192.168.1.44:8888
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:137)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    ... 3 more
controller logs
- controller connectivity log URL: https://cje.jenkins.example.com:8080/operations-center/log
[Mon Oct 24 14:50:28 UTC 2016] Starting discovery on https://oc.jenkins.example.com:8888/
[Mon Oct 24 14:50:29 UTC 2016] Discovery on https://oc.jenkins.example.com:8888/ failed (will retry) - Could not connect to Jenkins server: https://oc.jenkins.example.com:8888/
java.net.ConnectException: Could not connect to Jenkins server: https://oc.jenkins.example.com:8888/
    at com.cloudbees.opscenter.agent.AgentProtocolEndpointLocator.locate(AgentProtocolEndpointLocator.java:556)
    at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.run(OperationsCenterConnectorSetTask.java:170)
    at java.lang.Thread.run(Thread.java:745)
    at com.cloudbees.opscenter.client.plugin.AgentThread.run(AgentThread.java:39)
Caused by: java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused: oc.jenkins.example.com/192.168.1.44:8888 to https://oc.jenkins.example.com:8888/instance-identity/
    at com.ning.http.client.providers.netty.NettyResponseFuture.abort(NettyResponseFuture.java:328)
    at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:108)
    at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:427)
    at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:418)
    at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:380)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:109)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: oc.jenkins.example.com/192.168.1.44:8888 to https://oc.jenkins.example.com:8888/instance-identity/
    at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:104)
    ... 12 more
Caused by: java.net.ConnectException: Connection refused: oc.jenkins.example.com/192.168.1.44:8888
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:150)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    ... 3 more
- A proxy might be configured in the controller without operations center being listed as a No Proxy Host
If a proxy is configured under Manage Jenkins → Manage Plugins → Advanced tab, ensure that the operations center hostname is added to the No Proxy Host section, e.g. oc.jenkins.example.com
- operations center is not reachable from controller at HTTP level
From the controller host, use curl to request the operations center URL and inspect the response headers.
The X-Jenkins header should appear in the output, for example: X-Jenkins: 2.7.19.0.1 (CloudBees Jenkins Operations Center 2.7.19.0.1-fixed)
If this header does not appear, operations center is not reachable from the controller, and you need to work with your network administrator to resolve the issue.
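The HTTP check described above can be sketched as a small shell helper. This is only an illustrative sketch: the function names has_x_jenkins_header and check_oc_http are hypothetical, the URL in the usage line is the example operations center address used in this article, and it assumes curl and grep are available on the controller host and that operations center answers a HEAD request with the same headers.

```shell
# Succeeds (exit 0) and prints the header line if the HTTP response
# headers read from stdin include an X-Jenkins header.
has_x_jenkins_header() {
  grep -i '^X-Jenkins:'
}

# Fetches only the response headers of the URL given as $1
# (curl -I sends a HEAD request) and looks for X-Jenkins.
check_oc_http() {
  curl -sS -I --max-time 10 "$1" | has_x_jenkins_header
}
```

For example, running check_oc_http https://oc.jenkins.example.com:8888/ from the controller host should print the X-Jenkins line when operations center is reachable, and return a non-zero exit status otherwise.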
Related KB articles: How to troubleshoot client controller connections
connectivity issue at TCP level
operations center logs
- operations center connectivity logs URL: https://oc.jenkins.example.com:8888/job/exampleClientMaster/log
[Mon Oct 24 12:15:57 UTC 2016] Starting discovery on https://oc.jenkins.example.com:8888/
[Mon Oct 24 12:15:57 UTC 2016] Discovery on https://oc.jenkins.example.com:8888/ completed
  Agent address: oc.jenkins.example.com/192.168.1.44
  Agent port: 50000
  Identity: 99:e1:56:84:ad:62:80:7e:b1:b8:33:37:72:59:37:49
[Mon Oct 24 12:15:57 UTC 2016] Trying protocol: OperationsCenter2
[Mon Oct 24 12:15:57 UTC 2016] Opening TCP socket connection to oc.jenkins.example.com/192.168.1.44 on port 50000
[Mon Oct 24 12:16:07 UTC 2016] Error trying to establish connection to AgentProtocolEndpoint{address=oc.jenkins.example.com/192.168.1.44:50000, publicKey=99:e1:56:84:ad:62:80:7e:b1:b8:33:37:72:59:37:49}
java.net.SocketTimeoutException
    at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
    at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.connect(OperationsCenterConnectorSetTask.java:117)
    at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.connectOnce(OperationsCenterConnectorSetTask.java:140)
    at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.run(OperationsCenterConnectorSetTask.java:194)
    at java.lang.Thread.run(Thread.java:745)
    at com.cloudbees.opscenter.client.plugin.AgentThread.run(AgentThread.java:39)
[Mon Oct 24 12:16:07 UTC 2016] Sleeping for 10s before retrying
controller logs
- controller connectivity log URL: https://cje.jenkins.example.com:8080/operations-center/log
[Mon Oct 24 14:33:08 UTC 2016] Starting discovery on https://oc.jenkins.example.com:8888/
[Mon Oct 24 14:33:08 UTC 2016] Discovery on https://oc.jenkins.example.com:8888/ completed
  Agent address: oc.jenkins.example.com/192.168.1.44
  Agent port: 50000
  Identity: 99:e1:56:84:ad:62:80:7e:b1:b8:33:37:72:59:37:49
[Mon Oct 24 14:33:08 UTC 2016] Trying protocol: OperationsCenter2
[Mon Oct 24 14:33:08 UTC 2016] Opening TCP socket connection to oc.jenkins.example.com/192.168.1.44 on port 50000
[Mon Oct 24 14:33:18 UTC 2016] Error trying to establish connection to AgentProtocolEndpoint{address=oc.jenkins.example.com/192.168.1.44:50000, publicKey=99:e1:56:84:ad:62:80:7e:b1:b8:33:37:72:59:37:49}
java.net.SocketTimeoutException
    at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
    at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.connect(OperationsCenterConnectorSetTask.java:117)
    at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.connectOnce(OperationsCenterConnectorSetTask.java:140)
    at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.run(OperationsCenterConnectorSetTask.java:194)
    at java.lang.Thread.run(Thread.java:745)
    at com.cloudbees.opscenter.client.plugin.AgentThread.run(AgentThread.java:39)
This means there is a connectivity issue at TCP level.
This usually happens because an intermediate element such as HAProxy, a firewall or an ELB is blocking the connection.
CloudBees recommends setting the system property below in the operations center Java arguments to bypass those intermediate elements.
-Dhudson.TcpSlaveAgentListener.hostName=<MACHINE_HOSTNAME>
If you do not want to restart after adding the Java argument, you can test it by running TcpSlaveAgentListener.CLI_HOST_NAME="OC_HOSTNAME" in your Script Console.
[Tue Feb 14 09:31:36 AEST 2017] Trying protocol: OperationsCenter2
[Tue Feb 14 09:31:36 AEST 2017] Opening TCP socket connection to oc.jenkins.example.com/127.0.0.1 on port 50001
[Tue Feb 14 09:31:46 AEST 2017] Socket connection is closed
[Tue Feb 14 09:31:46 AEST 2017] Connection refused: Connection closed before acknowledgement sent
com.cloudbees.opscenter.agent.protocol.impl.ConnectionRefusalException: Connection closed before acknowledgement sent
    at com.cloudbees.opscenter.agent.protocol.impl.AckFilterLayer.onRecvClosed(AckFilterLayer.java:280)
    at com.cloudbees.opscenter.agent.protocol.FilterLayer.abort(FilterLayer.java:163)
    at com.cloudbees.opscenter.agent.protocol.impl.AckFilterLayer.access$000(AckFilterLayer.java:43)
    at com.cloudbees.opscenter.agent.protocol.impl.AckFilterLayer$1.run(AckFilterLayer.java:176)
    at com.cloudbees.opscenter.agent.protocol.IOHub$DelayedRunnable.run(IOHub.java:935)
    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
This issue happens if the JNLP_PORT advertised by operations center is incorrect, most likely because the system property -Dhudson.TcpSlaveAgentListener.port=<JNLP_PORT> is set in operations center but <JNLP_PORT> points to an application that is not Jenkins.
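To confirm from the controller host whether the agent port advertised by operations center accepts TCP connections at all, a check along the following lines may help. It is a minimal sketch, not a CloudBees tool: the function name can_open_tcp is hypothetical, and it assumes bash (for its /dev/tcp pseudo-device) and the coreutils timeout command are installed.

```shell
# Succeeds (exit 0) if a TCP connection to host $1 on port $2 can be
# opened within 10 seconds; fails on refusal, timeout or DNS error.
# Assumes bash and the coreutils "timeout" command are available.
can_open_tcp() {
  timeout 10 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}
```

For example, can_open_tcp oc.jenkins.example.com 50000 && echo open || echo "blocked or closed" uses the host and port shown in the logs above. A successful connection only shows that something is listening on that port; as the last trace illustrates, it does not prove that the listener is Jenkins.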
connectivity issue at TLS level
Log messages
- Exception in controller logs:
nov 15, 2016 2:52:03 PM com.cloudbees.opscenter.client.plugin.OperationsCenterRegistrar$PushRegistrationConfirmation <init>
WARNING: Pre-validation discovery on https://oc.jenkins.example.com:8888/ failed
javax.net.ssl.SSLHandshakeException: TLS Handshake exception establishing connection to Jenkins server: https://oc.jenkins.example.com:8888/. You might need to trust server's self-signed certificate on global security configuration.
...
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
    at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
    at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
    at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:382)
- Message in the controller user interface
The most common cause of this issue is that operations center is published with a self-signed SSL certificate, and the controller needs to have that certificate installed in its truststore.
Follow these steps to fix the problem:
- Download the operations center self-signed certificate. There are two ways to do it:
  - Using the openssl command (replace oc.jenkins.example.com:8888 with the host of your operations center instance and the configured SSL port, e.g. oc.example.com:443):
    openssl s_client -tls1 -showcerts \
      -connect oc.jenkins.example.com:8888 </dev/null 2>/dev/null | \
      openssl x509 -outform PEM > oc-certificate.pem
  - Downloading directly from your browser (e.g. Chrome):
    - In the address bar, click the little lock with the X. This brings up a small information screen. Click the button that says "Certificate Information."
    - Click and drag the image to your desktop, and the certificate will be saved to disk.
- Connect via SSH to your controller instance and copy the downloaded certificate to a temporary directory
- Install the certificate in the Java cacerts truststore:
  keytool -import -alias oc.jenkins.example.com -file oc-certificate.pem -keystore $JAVA_HOME/jre/lib/security/cacerts
connectivity issue at TLS Hostname verification
Log messages
- Exception in controller logs:
nov 15, 2016 2:52:03 PM com.cloudbees.opscenter.client.plugin.OperationsCenterRegistrar$PushRegistrationConfirmation <init>
WARNING: Pre-validation discovery on https://oc.local:8443/ failed
javax.net.ssl.SSLException: TLS hostname verification failure establishing connection to Jenkins server: https://oc.local:8443/
Certificate subject: CN=another.local issuer: CN=another.local
    at com.cloudbees.opscenter.agent.AgentProtocolEndpointLocator.locate(AgentProtocolEndpointLocator.java:415)
    at com.cloudbees.opscenter.client.plugin.OperationsCenterRegistrar$PushRegistrationConfirmation.<init>(OperationsCenterRegistrar.java:500)
    at com.cloudbees.opscenter.client.plugin.OperationsCenterRegistrar$DescriptorImpl.doPushRegistration(OperationsCenterRegistrar.java:316)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
- Message in the controller user interface
This message occurs when your operations center instance uses a self-signed certificate issued for another host. In the example above, the certificate was issued for another.local while operations center is running on oc.local.
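To see which host a certificate was issued for before regenerating it, the subject and issuer can be printed with openssl. The sketch below is illustrative only: the function name cert_subject_and_issuer is hypothetical, and it assumes the certificate has already been saved as a PEM file, for example the oc-certificate.pem file downloaded in the TLS level section.

```shell
# Prints the subject and issuer of the PEM certificate file given as $1.
# The exact output layout differs between openssl versions.
cert_subject_and_issuer() {
  openssl x509 -in "$1" -noout -subject -issuer
}
```

Running cert_subject_and_issuer oc-certificate.pem and comparing the CN in the subject with the hostname in the operations center URL shows whether the two match.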
The solution is to create a new self-signed certificate for oc.local and use it to run operations center:
- Create a new self-signed certificate
  keytool -genkey -keyalg RSA -alias oc.local -keystore oc.local.jks -storepass jenkins -dname "cn=oc.local"
- Run operations center using the new self-signed certificate
- Install the new certificate in the controller, see connectivity issue at TLS level
Notes
Operations Center Agent (2.32.0.1 is the latest version at the time of writing) does not support TLS SNI
- Exception in controller logs:
WARNING: Pre-validation discovery on https://oc.jenkins.example.com:8888/ failed
javax.net.ssl.SSLHandshakeException: TLS Handshake exception establishing connection to Jenkins server: https://oc.jenkins.example.com:8888/. You might need to trust server's self-signed certificate on global security configuration.
    at com.cloudbees.opscenter.agent.AgentProtocolEndpointLocator.locate(AgentProtocolEndpointLocator.java:530)
    at com.cloudbees.opscenter.client.plugin.OperationsCenterRegistrar$PushRegistrationConfirmation.<init>(OperationsCenterRegistrar.java:500)
    at com.cloudbees.opscenter.client.plugin.OperationsCenterRegistrar$DescriptorImpl.doPushRegistration(OperationsCenterRegistrar.java:316)
[...]
Caused by: java.util.concurrent.ExecutionException: java.net.ConnectException: General SSLEngine problem to https://oc.jenkins.example.com:8888/instance-identity/
- Message in the controller user interface
To work around the problem, you can configure the certificate that the controller expects as the default certificate on the reverse proxy in front of operations center. This should work as long as that certificate is the default one.