Restarting aborted builds

CloudBees attempts to bring your instance back up quickly after a crash, power outage, or other system failure that resulted in a loss of environment. If any jobs are in the queue when the crash occurs, they are still present after a restart.

Restarting Freestyle builds

If a Freestyle build ends while the controller is still running normally, including build failures, manual aborts, and termination due to a scheduled shutdown, no special handling is needed.

However, after a hard crash, the controller may no longer retain the build record. The CloudBees Restart Aborted Builds plugin helps to manage exceptional halts like this. It ensures that at least a partial record of every build is saved immediately after it starts and after each configured build step. If the instance is halted suddenly, due to a crash or freeze, the list of all builds that are currently running is recorded. When the instance is restarted, all aborted builds are displayed on an administrative monitor page, where they can be inspected or restarted.

To restart aborted builds:

  1. Ensure you are signed in to the controller as a user with the Administer permission.

  2. When the instance is restarted after an abrupt termination, select Manage Jenkins. If builds were in progress when the instance halted, a warning is displayed.

    Figure 1. Administrative warning
  3. Select View details to see a list of all running builds that were interrupted.

    Figure 2. Builds list
  4. Select any build to view build details, such as the changelog or console log up to the break point. If the job was parameterized, the list displays the parameter values for that build.

  5. Select Restart build next to a build. A new build of the same job is scheduled, including any associated parameters. Restarting the build removes the item from the list.

Restarting builds after a restore

CloudBees Restart Aborted Builds plugin, version 1.17 or later, is required to restart builds after a restore.

When a controller is restored from a backup, the Restart Aborted Builds plugin helps to manage:

  • Aborted Freestyle builds that were interrupted due to a controller crash or sudden halt.

  • Aborted Pipeline builds that were interrupted due to an agent crash or sudden halt, and the Pipeline build could not be automatically recovered.

  • Active Pipeline builds that were running when the backup was taken if the builds are still running at the time the page is displayed.

If builds are started after a controller backup is taken and the controller is later restored from that backup, new builds may reuse build numbers that were already used. This is typically harmless because any deployed artifacts or reports should use unique identifiers based on commit, date, or similar. However, some projects may rely on the uniqueness of the build number. In this scenario, a restore script can set a RESTORED_FROM_BACKUP environment variable on the controller to any identifier. If this variable is defined when the controller starts, and it was either unset or set to a different value at the previous startup, a pluggable set of actions is launched to adapt to the restoration, and 1000 is added to the next build number of every job. This makes it unlikely that subsequent build numbers overlap with builds that may have started after the corresponding backup.
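The rule described above can be illustrated with a minimal Groovy sketch. This is a model for reasoning only, not the plugin's actual code. It assumes that a restore is detected when RESTORED_FROM_BACKUP is defined at startup and was either not set or set to a different value at the previous startup. The function names and the identifiers backup-A and backup-B are hypothetical.

```groovy
// Illustrative model only; NOT the Restart Aborted Builds plugin implementation.
// lastSeen: value of RESTORED_FROM_BACKUP at the previous startup (null if it was not set).
// current:  value of RESTORED_FROM_BACKUP at this startup (null if it is not defined).
boolean restoreDetected(String lastSeen, String current) {
    // A restore is assumed when the variable is defined now and differs from last time.
    return current != null && current != lastSeen
}

int adjustedNextBuildNumber(int nextBuildNumber, String lastSeen, String current) {
    // Per the text above, 1000 is added to the next build number after a detected restore.
    return restoreDetected(lastSeen, current) ? nextBuildNumber + 1000 : nextBuildNumber
}

// Hypothetical identifiers chosen by a restore script.
assert adjustedNextBuildNumber(42, null, 'backup-A') == 1042        // first restore
assert adjustedNextBuildNumber(42, 'backup-A', 'backup-A') == 42    // ordinary restart
assert adjustedNextBuildNumber(42, 'backup-A', 'backup-B') == 1042  // later restore
assert adjustedNextBuildNumber(42, 'backup-A', null) == 42          // variable not defined
```

Under this model, a restore script should choose a new identifier for each restore (for example, a timestamp), because restarting with an unchanged value is treated as an ordinary restart.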

To restart builds after a restore:

  1. Ensure you are signed in to the controller as a user with the Administer permission.

  2. After the controller is restored from a backup, select Manage Jenkins. If builds are in progress, a warning is displayed, indicating the controller is newly restored from backup.

    Figure 3. Administrative warning
  3. Select View details to see a list of all running builds that were interrupted.

    Figure 4. Builds list
  4. Select any build to view build details.

    • If a Freestyle build was interrupted due to a crash or sudden halt, you can restart it.

    • If a Pipeline build is paused for input, you may be able to resume the build where it left off without interruption.

Restarting Pipeline node blocks using the retry option

Retry is a mechanism that automatically restarts a node block within a Pipeline build. If a controller or agent fails during a Pipeline build, the retry option recovers data and restarts that part of the build without interrupting the flow of the build or requiring user intervention. Retries can occur when:

  • Agents running on virtual machines are abruptly preempted.

  • Builds are aborted due to loss of the agent infrastructure.

  • There is a network timeout.

Retries do not occur when sh scripts exit with a nonzero code or when a build is aborted or superseded.

You can configure an automatic retry for both Declarative and Scripted Pipelines.

Configuring retry for Declarative Pipelines

You can configure the retry option for Declarative Pipelines within the Pipeline Syntax.

To configure automatic retry for Declarative Pipelines:

  1. Select a controller from the Dashboard. The controller page displays.

  2. Select Pipeline Syntax from the left navigation menu. The Pipeline Syntax page displays.

    Figure 5. Configure retry count setting
  3. Select Declarative Directive Generator from the left navigation menu.

  4. Select agent:Agent from the Sample Directive list.

  5. Select label: Run on an agent matching a label from the Agent dropdown list.

  6. Enter a number in the Retry Count field. The number must be greater than 1.

  7. Select Generate Declarative Directive. The Pipeline code displays in the box below the button.

    Figure 6. Pipeline code with retry values

    You can copy that code directly into the pipeline block of your Jenkinsfile for top-level directives or into a stage block for stage directives.
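As a rough sketch of where the generated directive goes, the following Jenkinsfile places an agent directive with a retry count at the top level of the pipeline block. The label linux, the stage name, and the sh command are hypothetical placeholders, and the exact shape of the agent block is an assumption; prefer the code produced by the Declarative Directive Generator on your controller.

```groovy
pipeline {
    // Top-level agent directive; assumed to match what the generator emits
    // when a label and a Retry Count of 2 are entered.
    agent {
        label {
            label 'linux'   // hypothetical agent label
            retries 2       // value from the Retry Count field
        }
    }
    stages {
        stage('Build') {    // hypothetical stage
            steps {
                sh 'make'   // hypothetical build command
            }
        }
    }
}
```

To apply the retry to a single stage instead, the same agent block can be placed inside that stage block rather than at the top level.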

Configuring retry for Scripted Pipelines

To configure retry for a Scripted Pipeline, wrap any node block in a retry block and specify the conditions that trigger a retry. Include nonresumable (the controller restarted during a Pipeline step that cannot resume cleanly), plus either agent (agent connection errors) or kubernetesAgent (errors related to Kubernetes pod agent behavior). The following is an example of a Scripted Pipeline:

    podTemplate(/*…as usual…*/) {
        retry(count: 2, conditions: [kubernetesAgent(), nonresumable()]) {
            node(POD_LABEL) {
                // … as usual
            }
        }
    }
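For agents that are not Kubernetes pods, the same pattern can be sketched with the agent condition in place of kubernetesAgent. The label linux and the sh command below are hypothetical placeholders.

```groovy
// Retry the whole node block up to 2 times in total if the agent connection
// is lost or the controller restarts during a step that cannot resume.
retry(count: 2, conditions: [agent(), nonresumable()]) {
    node('linux') {          // hypothetical agent label
        sh 'make test'       // hypothetical build command
    }
}
```

Because the conditions list limits when a retry happens, a nonzero exit from the sh step still fails the build on the first attempt, which matches the behavior described earlier.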