After restoring a controller, the controller is unresponsive and the logs show continuous errors like the following:
[Warning][Pod][controller-0][Unhealthy] Readiness probe failed: Get https://controller_url/login: dial tcp XXXX : connect: connection refused
...
[Warning][Pod][controller-0][Unhealthy] Liveness probe failed: HTTP probe failed with statuscode: 503
After several attempts, you manage to get the controller up and running again.
When restoring a controller, keep in mind that startup will be slower than usual because the process includes restoring the backup. While the data is being restored, the controller is not accessible, which causes any liveness probe configured for the controller to fail and eventually forces a restart.
Because of this, you need to increase the liveness probe settings so that the selected values exceed the restoration time. The exact values depend on the size of the backup and the resources allocated to the controller.
Please find below some reference values:
Health Check Initial Delay: the "Number of seconds after the container has started before liveness probes are initiated." Increase it from 600 (10 minutes) to 1800 (30 minutes).
Readiness Initial Delay: the "Number of seconds after the container has started before readiness probes are initiated." Increase it from 30 (30 seconds) to 1800 (30 minutes).
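These delays ultimately map to the standard probe fields on the controller's container spec in Kubernetes. The sketch below is illustrative only: in CloudBees CI these values are normally set through the controller's configuration UI rather than by editing the workload directly, and the container name and port shown here are assumptions, not values from this article.

```yaml
# Illustrative Kubernetes probe settings for a controller that is
# restoring a backup at startup. Container name and port are assumptions.
containers:
  - name: jenkins
    livenessProbe:
      httpGet:
        path: /login        # the endpoint seen failing in the log messages
        port: 8080
      initialDelaySeconds: 1800   # Health Check Initial Delay: 30 minutes
      periodSeconds: 10
      timeoutSeconds: 5
    readinessProbe:
      httpGet:
        path: /login
        port: 8080
      initialDelaySeconds: 1800   # Readiness Initial Delay: 30 minutes
      periodSeconds: 10
```

The key point is that initialDelaySeconds on the liveness probe must comfortably exceed the time the restore takes, otherwise Kubernetes restarts the container mid-restore.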
You might also consider increasing the resource allocation (Memory and CPU) for the controller to make the restore operation faster.
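As a minimal sketch, the resource requests and limits on the controller's container could be raised along these lines. The numbers are examples only, not recommendations from this article; size them to your backup and workload.

```yaml
# Illustrative resource allocation for a controller container.
# The values shown are placeholders, not tested recommendations.
resources:
  requests:
    cpu: "2"
    memory: 8Gi
  limits:
    cpu: "4"
    memory: 16Gi
```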
CloudBees CI (CloudBees Core) 126.96.36.199
You can also review How to Troubleshoot and Address Liveness / Readiness probe failure for additional details on approaching this issue in a more holistic way.