Operations Center-client controller connectivity issues

Symptoms

  • client controller appears as disconnected in operations center

  • client controller shows "Connect to Operations Center" when it should be already connected

  • client controller license expires

  • Shared agents/clouds are not leased to client controllers

Diagnostic/Treatment

  • Pre-condition: client controller and operations center were previously correctly connected.

The following sections describe resolution paths for specific connectivity issues between a client controller and an operations center. Each one is tied to a specific stack trace, but in some cases the root cause is hidden behind a more general trace (such as ... Caused by: ... java.io.IOException: Remotely Closed in the controller logs), which calls for a deeper investigation.

connectivity issue at HTTP level

operations center logs

  • operations center connectivity logs URL: https://oc.jenkins.example.com:8888/job/exampleClientMaster/log

[Mon Oct 24 12:12:09 UTC 2016] Starting discovery on https://oc.jenkins.example.com:8888/
[Mon Oct 24 12:12:19 UTC 2016] Discovery on https://oc.jenkins.example.com:8888/ failed (will retry) - Could not connect to Jenkins server: https://oc.jenkins.example.com:8888/
java.net.ConnectException: Could not connect to Jenkins server: https://oc.jenkins.example.com:8888/
    at com.cloudbees.opscenter.agent.AgentProtocolEndpointLocator.locate(AgentProtocolEndpointLocator.java:556)
    at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.run(OperationsCenterConnectorSetTask.java:170)
    at java.lang.Thread.run(Thread.java:745)
    at com.cloudbees.opscenter.client.plugin.AgentThread.run(AgentThread.java:39)
Caused by: java.util.concurrent.ExecutionException: java.net.ConnectException: connection timed out: oc.jenkins.example.com/192.168.1.44:8888 to https://oc.jenkins.example.com:8888/instance-identity/
    at com.ning.http.client.providers.netty.NettyResponseFuture.abort(NettyResponseFuture.java:328)
    at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:108)
    at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:427)
    at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:418)
    at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:380)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:140)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: connection timed out: oc.jenkins.example.com/192.168.1.44:8888 to https://oc.jenkins.example.com:8888/instance-identity/
    at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:104)
    ... 12 more
Caused by: org.jboss.netty.channel.ConnectTimeoutException: connection timed out: oc.jenkins.example.com/192.168.1.44:8888
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:137)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    ... 3 more

controller logs

  • controller connectivity log URL: https://cje.jenkins.example.com:8080/operations-center/log

[Mon Oct 24 14:50:28 UTC 2016] Starting discovery on https://oc.jenkins.example.com:8888/
[Mon Oct 24 14:50:29 UTC 2016] Discovery on https://oc.jenkins.example.com:8888/ failed (will retry) - Could not connect to Jenkins server: https://oc.jenkins.example.com:8888/
java.net.ConnectException: Could not connect to Jenkins server: https://oc.jenkins.example.com:8888/
    at com.cloudbees.opscenter.agent.AgentProtocolEndpointLocator.locate(AgentProtocolEndpointLocator.java:556)
    at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.run(OperationsCenterConnectorSetTask.java:170)
    at java.lang.Thread.run(Thread.java:745)
    at com.cloudbees.opscenter.client.plugin.AgentThread.run(AgentThread.java:39)
Caused by: java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused: oc.jenkins.example.com/192.168.1.44:8888 to https://oc.jenkins.example.com:8888/instance-identity/
    at com.ning.http.client.providers.netty.NettyResponseFuture.abort(NettyResponseFuture.java:328)
    at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:108)
    at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:427)
    at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:418)
    at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:380)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:109)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: oc.jenkins.example.com/192.168.1.44:8888 to https://oc.jenkins.example.com:8888/instance-identity/
    at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:104)
    ... 12 more
Caused by: java.net.ConnectException: Connection refused: oc.jenkins.example.com/192.168.1.44:8888
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:150)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    ... 3 more

  • A proxy might be configured on the controller without operations center being listed as a No Proxy Host

If a proxy is configured under Manage Jenkins → Manage Plugins → Advanced tab, ensure that the operations center hostname (e.g. oc.jenkins.example.com) is added to the No Proxy Host section.

  • operations center is not reachable from controller at HTTP level

From the controller host, use curl to request the operations center instance and inspect the response headers.

The X-Jenkins header should appear in the output, for example: X-Jenkins: 2.7.19.0.1 (CloudBees Jenkins Operations Center 2.7.19.0.1-fixed)

If this header does not appear, operations center is not reachable from the controller at HTTP level, and you need to work with your network administrator to resolve the issue.
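
The check described above could be scripted roughly as follows. This is a minimal sketch: the helper name x_jenkins_header is hypothetical, the URL is the example operations center address used in this article, and the curl options you need (for instance, to handle a self-signed certificate) may differ in your environment.

```shell
# Hypothetical helper: read HTTP response headers on stdin and print the
# X-Jenkins header. Exits non-zero when the header is missing.
x_jenkins_header() {
  tr -d '\r' | grep -i '^X-Jenkins:'
}

# Usage from the controller host (-I asks for the response headers only):
#   curl -sS -I https://oc.jenkins.example.com:8888/ | x_jenkins_header
```

A non-zero exit status from the helper means the header was not found in the response.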

connectivity issue at TCP level

operations center logs

  • operations center connectivity logs URL: https://oc.jenkins.example.com:8888/job/exampleClientMaster/log

[Mon Oct 24 12:15:57 UTC 2016] Starting discovery on https://oc.jenkins.example.com:8888/
[Mon Oct 24 12:15:57 UTC 2016] Discovery on https://oc.jenkins.example.com:8888/ completed
 Agent address: oc.jenkins.example.com/192.168.1.44
 Agent port:  50000
 Identity: 99:e1:56:84:ad:62:80:7e:b1:b8:33:37:72:59:37:49
[Mon Oct 24 12:15:57 UTC 2016] Trying protocol: OperationsCenter2
[Mon Oct 24 12:15:57 UTC 2016] Opening TCP socket connection to oc.jenkins.example.com/192.168.1.44 on port 50000

[Mon Oct 24 12:16:07 UTC 2016] Error trying to establish connection to AgentProtocolEndpoint{address=oc.jenkins.example.com/192.168.1.44:50000, publicKey=99:e1:56:84:ad:62:80:7e:b1:b8:33:37:72:59:37:49}
java.net.SocketTimeoutException
   at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
   at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.connect(OperationsCenterConnectorSetTask.java:117)
   at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.connectOnce(OperationsCenterConnectorSetTask.java:140)
   at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.run(OperationsCenterConnectorSetTask.java:194)
   at java.lang.Thread.run(Thread.java:745)
   at com.cloudbees.opscenter.client.plugin.AgentThread.run(AgentThread.java:39)
[Mon Oct 24 12:16:07 UTC 2016] Sleeping for 10s before retrying

controller logs

  • controller connectivity log URL: https://cje.jenkins.example.com:8080/operations-center/log

[Mon Oct 24 14:33:08 UTC 2016] Starting discovery on https://oc.jenkins.example.com:8888/
[Mon Oct 24 14:33:08 UTC 2016] Discovery on https://oc.jenkins.example.com:8888/ completed
Agent address: oc.jenkins.example.com/192.168.1.44
Agent port:  50000
Identity: 99:e1:56:84:ad:62:80:7e:b1:b8:33:37:72:59:37:49
[Mon Oct 24 14:33:08 UTC 2016] Trying protocol: OperationsCenter2
[Mon Oct 24 14:33:08 UTC 2016] Opening TCP socket connection to oc.jenkins.example.com/192.168.1.44 on port 50000
[Mon Oct 24 14:33:18 UTC 2016] Error trying to establish connection to AgentProtocolEndpoint{address=oc.jenkins.example.com/192.168.1.44:50000, publicKey=99:e1:56:84:ad:62:80:7e:b1:b8:33:37:72:59:37:49}
java.net.SocketTimeoutException
  at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
  at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.connect(OperationsCenterConnectorSetTask.java:117)
  at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.connectOnce(OperationsCenterConnectorSetTask.java:140)
  at com.cloudbees.opscenter.agent.OperationsCenterConnectorSetTask.run(OperationsCenterConnectorSetTask.java:194)
  at java.lang.Thread.run(Thread.java:745)
  at com.cloudbees.opscenter.client.plugin.AgentThread.run(AgentThread.java:39)

This means there is a connectivity issue at TCP level.

Usually this happens because an intermediate element such as HAProxy, a firewall, or an ELB is blocking the connection.
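
To confirm from the controller host whether the advertised TCP port can be reached at all, a quick probe along these lines can help. It is a sketch that assumes bash and the coreutils timeout command are available; tcp_port_open is a hypothetical helper name, and the host and port in the usage line are the example values from the logs above.

```shell
# Hypothetical helper: succeed if a TCP connection to HOST:PORT opens within
# 5 seconds. Uses bash's /dev/tcp pseudo-device, so no extra tools are needed.
tcp_port_open() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage from the controller host:
#   tcp_port_open oc.jenkins.example.com 50000 && echo reachable || echo blocked
```

A failure here while the HTTP check succeeds suggests that only the TCP port is being filtered.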

CloudBees recommends setting the system property below in the operations center Java arguments to bypass those intermediate elements.

-Dhudson.TcpSlaveAgentListener.hostName=<MACHINE_HOSTNAME>

If you don't want to perform a restart right after adding the Java argument, you can test the setting by running TcpSlaveAgentListener.CLI_HOST_NAME="OC_HOSTNAME" in the operations center Script Console.

[Tue Feb 14 09:31:36 AEST 2017] Trying protocol: OperationsCenter2
[Tue Feb 14 09:31:36 AEST 2017] Opening TCP socket connection to oc.jenkins.example.com/127.0.0.1 on port 50001
[Tue Feb 14 09:31:46 AEST 2017] Socket connection is closed
[Tue Feb 14 09:31:46 AEST 2017] Connection refused: Connection closed before acknowledgement sent
com.cloudbees.opscenter.agent.protocol.impl.ConnectionRefusalException: Connection closed before acknowledgement sent
	at com.cloudbees.opscenter.agent.protocol.impl.AckFilterLayer.onRecvClosed(AckFilterLayer.java:280)
	at com.cloudbees.opscenter.agent.protocol.FilterLayer.abort(FilterLayer.java:163)
	at com.cloudbees.opscenter.agent.protocol.impl.AckFilterLayer.access$000(AckFilterLayer.java:43)
	at com.cloudbees.opscenter.agent.protocol.impl.AckFilterLayer$1.run(AckFilterLayer.java:176)
	at com.cloudbees.opscenter.agent.protocol.IOHub$DelayedRunnable.run(IOHub.java:935)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

This issue happens if the JNLP_PORT advertised by operations center is incorrect, most likely because the system property -Dhudson.TcpSlaveAgentListener.port=<JNLP_PORT> is set on operations center but <JNLP_PORT> points to an application that is not Jenkins.
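
One way to check this is to read the advertised endpoint from the connectivity log and then see which process owns that port on the operations center host. The sketch below assumes the log line format shown above; advertised_endpoint is a hypothetical helper name, oc-connectivity.log is an assumed file name, and ss is only one of several tools that can list listening sockets.

```shell
# Hypothetical helper: read a connectivity log on stdin and print the
# "address port" pair from each "Opening TCP socket connection" line.
advertised_endpoint() {
  sed -n 's|.*Opening TCP socket connection to \([^ ]*\) on port \([0-9][0-9]*\).*|\1 \2|p'
}

# Usage, after saving the connectivity log to a file:
#   advertised_endpoint < oc-connectivity.log
# Then, on the operations center host, list what is listening on that port
# (showing the owning process usually requires root):
#   ss -ltnp | grep ':50001 '
```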

connectivity issue at TLS level

Log messages

  • Exception in controller logs:

 nov 15, 2016 2:52:03 PM com.cloudbees.opscenter.client.plugin.OperationsCenterRegistrar$PushRegistrationConfirmation <init>
 WARNING: Pre-validation discovery on https://oc.jenkins.example.com:8888/ failed
 javax.net.ssl.SSLHandshakeException: TLS Handshake exception establishing connection to Jenkins server: https://oc.jenkins.example.com:8888/. You might need to trust server's self-signed certificate on global security configuration.
 ...
 Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
    at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
    at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
    at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:382)
  • Message in the controller user interface

TLSHandshakeException.png

The most common cause of this issue is that operations center is served with a self-signed SSL certificate, which the controller must have installed in its truststore.

Follow these steps to fix the problem:

  1. Download the operations center self-signed certificate. There are two ways to do it:

    • Using the openssl command (replace oc.jenkins.example.com:8888 with the host and SSL port of your operations center instance, e.g. oc.example.com:443). The -tls1 option forces TLS 1.0; omit it if your server does not accept that protocol version.

      $ openssl s_client -tls1 -showcerts \
          -connect oc.jenkins.example.com:8888 </dev/null 2>/dev/null | \
          openssl x509 -outform PEM > oc-certificate.pem
    • Downloading directly from your browser (e.g. Chrome):

      • In the address bar, click the little lock with the X. This will bring up a small information screen. Click the button that says "Certificate Information."

      • Click and drag the image to your desktop and the certificate will be saved on the disk.

  2. Connect to your controller instance via SSH and copy the downloaded certificate to a temporary directory

  3. Install the certificate in the Java cacerts truststore

    keytool -import -alias oc.jenkins.example.com -file oc-certificate.pem -keystore $JAVA_HOME/jre/lib/security/cacerts
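
Before importing, it can be worth confirming that the downloaded file really is the operations center certificate. A minimal sketch, assuming openssl is installed; cert_summary is a hypothetical helper name, and the file name is the one used in the steps above.

```shell
# Hypothetical helper: print subject, issuer and expiry date of a PEM certificate.
cert_summary() {
  openssl x509 -in "$1" -noout -subject -issuer -enddate
}

# Usage:
#   cert_summary oc-certificate.pem
# After the import, the alias can be looked up in the truststore with:
#   keytool -list -alias oc.jenkins.example.com \
#     -keystore "$JAVA_HOME/jre/lib/security/cacerts"
```

For a self-signed certificate the subject and issuer lines are expected to show the same name.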

connectivity issue at TLS Hostname verification

Log messages

  • Exception in controller logs:

nov 15, 2016 2:52:03 PM com.cloudbees.opscenter.client.plugin.OperationsCenterRegistrar$PushRegistrationConfirmation <init>
WARNING: Pre-validation discovery on https://oc.local:8443/ failed
javax.net.ssl.SSLException: TLS hostname verification failure establishing connection to Jenkins server: https://oc.local:8443/ Certificate subject: CN=another.local issuer: CN=another.local
	at com.cloudbees.opscenter.agent.AgentProtocolEndpointLocator.locate(AgentProtocolEndpointLocator.java:415)
	at com.cloudbees.opscenter.client.plugin.OperationsCenterRegistrar$PushRegistrationConfirmation.<init>(OperationsCenterRegistrar.java:500)
	at com.cloudbees.opscenter.client.plugin.OperationsCenterRegistrar$DescriptorImpl.doPushRegistration(OperationsCenterRegistrar.java:316)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  • Message in the controller user interface

TLSHostnameVerification.png

This message occurs when your operations center instance uses a self-signed certificate issued for a different host. In the example above, the certificate was issued for another.local while operations center is running on oc.local.

The solution is to create a new self-signed certificate for oc.local and use it to run operations center:

  1. Create a new self-signed certificate

    keytool -genkey -keyalg RSA -alias oc.local -keystore oc.local.jks -storepass jenkins -dname "cn=oc.local"

  2. Run operations center using the new self-signed certificate

  3. Install the new certificate on the controller (see connectivity issue at TLS level)
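
To see which name the certificate currently served by operations center was issued for, the subject can be read with openssl and compared with the hostname in the URL. This is a sketch under the assumption that the name is carried in the subject CN, as in the log message above (certificates may also list names as subject alternative names, which this does not inspect); subject_cn is a hypothetical helper name.

```shell
# Hypothetical helper: read an `openssl x509 -noout -subject` line on stdin
# and print the value of its CN attribute.
subject_cn() {
  sed -n 's|.*CN *= *\([^,/]*\).*|\1|p'
}

# Usage, with the host and port from the example above:
#   openssl s_client -connect oc.local:8443 </dev/null 2>/dev/null \
#     | openssl x509 -noout -subject | subject_cn
```

If the printed name differs from the host in the operations center URL, hostname verification on the controller is expected to fail.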

Notes

Operations Center Agent (2.32.0.1, the latest version at the time of writing) does not support TLS SNI

  • Exception in controller logs:

WARNING: Pre-validation discovery on https://oc.jenkins.example.com:8888/ failed
javax.net.ssl.SSLHandshakeException: TLS Handshake exception establishing connection to Jenkins server: https://oc.jenkins.example.com:8888/. You might need to trust server's self-signed certificate on global security configuration.
	at com.cloudbees.opscenter.agent.AgentProtocolEndpointLocator.locate(AgentProtocolEndpointLocator.java:530)
	at com.cloudbees.opscenter.client.plugin.OperationsCenterRegistrar$PushRegistrationConfirmation.<init>(OperationsCenterRegistrar.java:500)
	at com.cloudbees.opscenter.client.plugin.OperationsCenterRegistrar$DescriptorImpl.doPushRegistration(OperationsCenterRegistrar.java:316)
	[...]
Caused by: java.util.concurrent.ExecutionException: java.net.ConnectException: General SSLEngine problem to https://oc.jenkins.example.com:8888/instance-identity/
  • Message in the controller user interface

cjoc-ssl-connectivity-problem.png

To work around the problem, you can configure the certificate that the controller expects as the default certificate on the reverse proxy in front of operations center. This works only as long as that certificate remains the default one.