My controller will not get new Shared Agents / Shared Cloud Agents

Article ID:220250088

Issue

  • My controller will no longer provision shared agents or shared cloud agents from the operations center

  • The operations center continues to lease out shared cloud agents / shared agents

  • When you try to force release a shared agent, another one will be leased

  • The controller does not show any executor available

  • You cannot delete the shared cloud / shared agent and add a new one because it is "In Use"

  • The operations center logs show an exception similar to the following:

2020-02-05 06:53:24.224+0000 [id=18029912]  WARNING c.c.o.s.p.SlaveLeaseTable#registerRequest: Failed to register request for owner: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx with leaseId: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
org.h2.jdbc.JdbcSQLException: [...]

Another instance of an error that can cause shared agents to fail is:

2023-02-28 12:16:39.881+0000 [id=27]    WARNING c.c.o.s.p.SlaveLeaseTable#getLeases: Failed to get leases in state: AVAILABLE
org.h2.jdbc.JdbcSQLNonTransientConnectionException: File corrupted while reading record: null. Possible solution: use the recovery tool [90030-210]
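As a rough aid for spotting these entries in a large log file, the following is a minimal Python sketch. It assumes the log lines look like the two excerpts above (a SlaveLeaseTable warning followed directly by an org.h2.jdbc exception line); the function name find_lease_db_errors and the patterns are illustrative and are not part of the product.

```python
import re

# Patterns derived from the two log excerpts above; real messages may differ.
LEASE_WARNING = re.compile(r"WARNING\s+c\.c\.o\.s\.p\.SlaveLeaseTable#(\w+)")
H2_EXCEPTION = re.compile(r"^org\.h2\.jdbc\.(JdbcSQL\w*Exception)")


def find_lease_db_errors(lines):
    """Return (method, exception) pairs where a SlaveLeaseTable warning
    is immediately followed by an H2 JDBC exception line."""
    hits = []
    pending = None  # method name from the most recent warning line, if any
    for line in lines:
        warning = LEASE_WARNING.search(line)
        if warning:
            pending = warning.group(1)
            continue
        exc = H2_EXCEPTION.match(line.strip())
        if pending and exc:
            hits.append((pending, exc.group(1)))
        pending = None
    return hits
```

The function accepts any iterable of lines, so an open log file object can be passed to it directly. An empty result does not rule out the problem, since it only reflects the two message shapes shown above.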

Explanation

The problem may be caused by a corrupted database in the operations center (the database that manages the shared agent leases). The stack traces shown above are evidence of this. In some cases, the corruption could be caused by an unresponsive file system (for example, when $JENKINS_HOME is mounted on a shared file system).

Resolution

Only perform these steps if no other means can be found to reconnect the shared cloud / shared agents.

  1. Stop the controller

  2. Stop the operations center

  3. Remove the file $JENKINS_HOME/run-time-state.(h2|mv).db from the Jenkins operations center

  4. Restart the operations center

  5. Observe that all the agents have disconnected completely from the controller

  6. Restart the controller
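Step 3 can be scripted. The sketch below is one possible approach in Python and makes an assumption the article does not: it moves the file to a backup directory instead of deleting it, so the original remains available for later inspection. The function name set_aside_lease_db and the backup directory are hypothetical. Run it only while the operations center is stopped.

```python
import re
import shutil
from pathlib import Path

# File names from step 3: run-time-state.h2.db or run-time-state.mv.db
STATE_DB = re.compile(r"^run-time-state\.(h2|mv)\.db$")


def set_aside_lease_db(jenkins_home, backup_dir):
    """Move the lease database file(s) out of jenkins_home; return the new paths."""
    home = Path(jenkins_home)
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    moved = []
    for entry in sorted(home.iterdir()):
        # Only exact matches are touched; every other file stays in place.
        if entry.is_file() and STATE_DB.match(entry.name):
            target = backup / entry.name
            shutil.move(str(entry), str(target))
            moved.append(target)
    return moved
```

An empty return value means no matching file was found in the given directory, which usually indicates that the path is not the $JENKINS_HOME of the operations center.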

If there were many items in the queue when the controller was stopped, and the controller has trouble coming back up, try clearing the controller’s queue:

  • Stop the controller

  • Move or remove the file $JENKINS_HOME/queue.xml

  • Start the controller
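The "move" variant of the queue step could look like the following Python sketch. The timestamped backup name is an assumption chosen for illustration, and set_aside_queue is a hypothetical helper. Run it only while the controller is stopped.

```python
import time
from pathlib import Path


def set_aside_queue(jenkins_home):
    """Rename queue.xml to a timestamped backup; return the new path, or None."""
    queue = Path(jenkins_home) / "queue.xml"
    if not queue.is_file():
        return None  # nothing to clear
    # Backup naming scheme is illustrative, not prescribed by the product.
    backup = queue.with_name("queue.xml.bak-" + time.strftime("%Y%m%d%H%M%S"))
    queue.rename(backup)
    return backup
```

Keeping the renamed file makes it possible to review which items were queued after the controller is back up.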

Disk Contention Scenario

If the JdbcSQLException message reports an IOException such as the following, the cause may well be I/O contention on disk:

org.h2.jdbc.JdbcSQLException: IO Exception: "java.io.IOException: Stream Closed"

When the $JENKINS_HOME of the operations center is mounted on a network file system such as NFS, make sure that the file system is responding and that it uses a supported NFS version, then restart the operations center.
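One simple way to check that the file system is responding is to time a small synchronous write in $JENKINS_HOME. The Python sketch below is an illustrative probe only, not a supported diagnostic: the function name probe_write_latency and the probe file name are hypothetical, and what counts as "too slow" depends on the environment. On a hung mount the call may block rather than return, which is itself a sign of trouble.

```python
import os
import time
import uuid
from pathlib import Path


def probe_write_latency(directory):
    """Write, fsync and delete a small file in directory; return seconds taken."""
    probe = Path(directory) / f".fs-probe-{uuid.uuid4().hex}"
    start = time.monotonic()
    try:
        with open(probe, "wb") as handle:
            handle.write(b"probe")
            handle.flush()
            os.fsync(handle.fileno())  # force the data out to the file system
    finally:
        # Clean up the probe file even if the write failed part-way.
        if probe.exists():
            probe.unlink()
    return time.monotonic() - start
```

Running the probe several times and comparing the results with a local disk gives a rough picture of whether the mount is contended.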