Issue
-
My controller will no longer provision shared agents or shared cloud agents from the operations center
-
The operations center continues to lease out shared cloud agents / shared agents
-
When you try to force release a shared agent, another one will be leased
-
The controller does not show any available executors
-
You cannot delete the shared cloud / shared agent and add a new one because it is reported as "In Use"
-
The operations center logs show an exception similar to the following:
2020-02-05 06:53:24.224+0000 [id=18029912] WARNING c.c.o.s.p.SlaveLeaseTable#registerRequest: Failed to register request for owner: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx with leaseId: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx org.h2.jdbc.JdbcSQLException: [...]
Another instance of an error that can cause shared agents to fail is:
2023-02-28 12:16:39.881+0000 [id=27] WARNING c.c.o.s.p.SlaveLeaseTable#getLeases: Failed to get leases in state: AVAILABLE org.h2.jdbc.JdbcSQLNonTransientConnectionException: File corrupted while reading record: null. Possible solution: use the recovery tool [90030-210]
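To check whether an operations center log contains entries like the two shown above, a small scan can help. The following is a minimal sketch, not an official tool: the function name `find_lease_db_errors` is hypothetical, and it assumes the H2 exception appears either on the same line as the `SlaveLeaseTable` warning (as in the excerpts above) or on the line immediately after it.

```python
import re
from typing import Iterable, List

# Signatures taken from the log excerpts above.
LEASE_TABLE = re.compile(r"SlaveLeaseTable#\w+")
H2_EXCEPTION = re.compile(r"org\.h2\.jdbc\.JdbcSQL\w*Exception")


def find_lease_db_errors(lines: Iterable[str]) -> List[str]:
    """Return SlaveLeaseTable log lines that are accompanied by an H2 JDBC exception.

    Assumption: the exception is on the same line or on the next line.
    """
    buffered = list(lines)
    hits = []
    for i, line in enumerate(buffered):
        if not LEASE_TABLE.search(line):
            continue
        following = buffered[i + 1] if i + 1 < len(buffered) else ""
        if H2_EXCEPTION.search(line + " " + following):
            hits.append(line.rstrip("\n"))
    return hits
```

The function accepts any iterable of lines, so an open log file can be passed directly. The location of the operations center log depends on the installation and is not assumed here.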
Explanation
The problem may be caused by a corrupted database in the operations center (the database that manages shared agent leases). The stack traces shown above are evidence of this. In some cases, the corruption could be caused by an unresponsive file system (for example, when $JENKINS_HOME is mounted on a shared file system).
Resolution
Use this procedure only if no other means can be found to reconnect the shared cloud / shared agents.
-
Stop the controller
-
Stop the operations center
-
Remove the file $JENKINS_HOME/run-time-state.(h2|mv).db from the operations center
-
Restart the operations center
-
Observe that all the agents have disconnected completely from the controller
-
Restart the controller
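The file removal step above can be sketched in Python. This is only an illustration under stated assumptions: the helper name `move_lease_db_aside` is hypothetical, the pattern `run-time-state.(h2|mv).db` is read as the two file names `run-time-state.h2.db` and `run-time-state.mv.db`, and the file is renamed rather than deleted (a choice made for this sketch so that it can still be inspected afterwards). Run it only while the operations center is stopped.

```python
import shutil
import time
from pathlib import Path
from typing import List

# The two file names covered by the pattern run-time-state.(h2|mv).db
DB_NAMES = ("run-time-state.h2.db", "run-time-state.mv.db")


def move_lease_db_aside(jenkins_home: str) -> List[Path]:
    """Rename the lease database file(s) found directly under jenkins_home.

    Returns the new paths of the files that were renamed (empty if none existed).
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    moved = []
    for name in DB_NAMES:
        src = Path(jenkins_home) / name
        if src.is_file():
            # Keep the original name as a prefix so the copy is easy to identify.
            dst = src.with_name(f"{name}.bak-{stamp}")
            shutil.move(str(src), str(dst))
            moved.append(dst)
    return moved
```

Pass the operations center's `$JENKINS_HOME` directory as the argument. Once the operations center and the controller are running normally again, the renamed file can be deleted.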
If there were many items in the queue when the controller was stopped and the controller has trouble coming back up, try clearing the controller's queue:
-
Stop the controller
-
Move or remove the file $JENKINS_HOME/queue.xml
-
Start the controller
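The queue step can be sketched the same way. The helper name `set_queue_aside` is hypothetical; the sketch takes the "move" option, renaming `queue.xml` inside the controller's `$JENKINS_HOME` and printing its size first, which gives a rough idea of how much was queued. Run it only while the controller is stopped.

```python
import os
import time
from pathlib import Path
from typing import Optional


def set_queue_aside(jenkins_home: str) -> Optional[Path]:
    """Rename queue.xml under jenkins_home; return the new path, or None if absent."""
    src = Path(jenkins_home) / "queue.xml"
    if not src.is_file():
        return None
    print(f"queue.xml is {src.stat().st_size / 1024:.1f} KiB")
    dst = src.with_name("queue.xml." + time.strftime("%Y%m%d-%H%M%S") + ".bak")
    os.replace(src, dst)  # rename within the same directory
    return dst
```

Items that were in the queue will not be restored when the controller starts without this file, so those builds have to be triggered again.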
Disk Contention Scenario
If the message of the JdbcSQLException reports an IOException such as the following, the cause may well be I/O contention on disk:
org.h2.jdbc.JdbcSQLException: IO Exception: "java.io.IOException: Stream Closed"
When the $JENKINS_HOME of the operations center is mounted on a network file system such as NFS, make sure that the file system is responding and that it uses a supported NFS version, then restart the operations center.
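One rough way to check that the file system is responding is to time a small synchronous write in the directory. The following is a minimal sketch, assuming write access to the directory; the function name `probe_write_latency` and the 4 KiB payload are arbitrary choices for illustration, and the result is only an indication, not a benchmark.

```python
import os
import time
import uuid
from pathlib import Path


def probe_write_latency(directory: str, payload_size: int = 4096) -> float:
    """Write and fsync a small temporary file in directory; return elapsed seconds."""
    probe = Path(directory) / f".io-probe-{uuid.uuid4().hex}"
    data = b"\0" * payload_size
    start = time.monotonic()
    try:
        with open(probe, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # force the data to the underlying storage
        elapsed = time.monotonic() - start
    finally:
        # Clean up the probe file even if the write failed part-way.
        if probe.exists():
            probe.unlink()
    return elapsed
```

Run it a few times against the operations center's `$JENKINS_HOME`. Consistently high values, or a call that never returns, suggest the mount is slow or unresponsive. On a hard-mounted NFS share that is not responding, the call may block indefinitely, which is itself a signal.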