KBEC-00241 - Jobs are stuck after the CloudBees CD (CloudBees Flow) server is brought back up

Article ID:360032829212

Summary

The CloudBees CD (CloudBees Flow) server went down (e.g., your database crashed) and stayed down for more than 24 hours. After the server is brought back up, you notice that jobs that were running before the crash, and that had no timeouts set, are now stuck.

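While the server is down, the agent on which the jobs were running keeps retrying its message to the server, pausing for successively longer intervals (up to 30 seconds) between attempts. After 24 hours of retrying, the agent presumes the server is dead and drops the message. Because the server never receives that message, it still treats the affected steps as running once it is back up.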
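The retry behavior described above can be pictured with a minimal Python sketch. It is not the agent's actual implementation: the 1-second initial pause and the doubling factor are assumptions made for illustration, and the function name `retry_pauses` is hypothetical. Only the 30-second cap and the 24-hour limit come from this article.

```python
from typing import Iterator


def retry_pauses(
    initial_pause: float = 1.0,          # assumption: not stated in the article
    growth_factor: float = 2.0,          # assumption: not stated in the article
    max_pause: float = 30.0,             # from the article: pauses grow up to 30 seconds
    give_up_after: float = 24 * 3600.0,  # from the article: retry for up to 24 hours
) -> Iterator[float]:
    """Yield the pause (in seconds) before each retry until the time budget is used up."""
    waited = 0.0
    pause = initial_pause
    while waited + pause <= give_up_after:
        yield pause
        waited += pause
        # Grow the pause, but never beyond the cap.
        pause = min(pause * growth_factor, max_pause)
    # When the generator is exhausted, the caller gives up and drops the message.
```

Under these assumed parameters the pauses reach the 30-second cap after a few attempts and stay there until the 24-hour budget runs out, at which point the message is dropped.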

Solution

You can do one of the following:

  • Manually abort the job

  • Restart the agent with the stuck jobs

If you restart the agent, the CloudBees CD (CloudBees Flow) server detects the restart the next time it tries to run a command on the agent or pings the agent, and then aborts the running steps. It can do so safely because an agent restart is conclusive evidence that steps started during the agent's previous lifetime are no longer running.

Additionally, you can change the 24-hour limit for a specific API call by passing the --retryTimeout global server option to ectool.

ectool --retryTimeout

Amount of time to continue retrying requests that fail due to communication errors. Defaults to --timeout value unless running in a job step, in which case the default is 24 hours.
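As a hedged illustration of where the option goes on the command line, the following Python sketch builds an ectool invocation with --retryTimeout placed before the API call. The helper name `ectool_command` is hypothetical, the value is assumed to be in seconds, and `getServerStatus` is used only as an example call; confirm the exact syntax with `ectool --help` on your installation.

```python
from typing import List


def ectool_command(retry_timeout_seconds: int, *api_call: str) -> List[str]:
    """Build an ectool argument list with --retryTimeout ahead of the API call.

    Assumption: the value is given in seconds. Global options are placed
    before the API call name.
    """
    if retry_timeout_seconds <= 0:
        raise ValueError("retry_timeout_seconds must be positive")
    if not api_call:
        raise ValueError("an API call name is required")
    return ["ectool", "--retryTimeout", str(retry_timeout_seconds), *api_call]
```

For example, `ectool_command(3600, "getServerStatus")` produces the argument list for a call that stops retrying after one hour rather than 24; the list could then be passed to `subprocess.run` from within a job step.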

This article is part of our Knowledge Base and is provided for guidance-based purposes only. The solutions or workarounds described here are not officially supported by CloudBees and may not be applicable in all environments. Use at your own discretion, and test changes in a safe environment before applying them to production systems.