Controlling builds

Restarting aborted builds

CloudBees attempts to bring your instance back up quickly after a crash, power outage, or other system failure that resulted in a loss of environment. If any jobs are in the queue when the crash occurs, they are still present after a restart.

In-progress Freestyle builds cannot be resumed where they left off. Recreating the context of the build is typically too complex. Long-running builds may be used when the context is simple.

Restarting Freestyle builds

If a Freestyle build completes while under the control of the controller (including build failures, manual aborts, and termination due to a scheduled shutdown), no special handling is needed.

However, after a hard crash, the controller may no longer retain the build record. The CloudBees Restart Aborted Builds Plugin helps to manage exceptional halts like this. It ensures that at least a partial record of every build is saved immediately after it starts and after each configured build step. If the instance is halted suddenly, due to a crash or freeze, the list of all builds that are currently running is recorded. When the instance is restarted, all aborted builds are displayed on an administrative monitor page, where they can be inspected or restarted.

To restart aborted builds:

  1. Ensure you are signed in to the controller as a user with the Administer permission.

  2. After the instance restarts following an abrupt termination, select Manage Jenkins. If builds were in progress when the instance halted, a warning is displayed.

    Figure 1. Administrative warning
  3. Select View details to see a list of all running builds that were interrupted.

    Figure 2. Builds list
  4. Select any build to view build details, such as the changelog or console log up to the break point. If the job was parameterized, the list displays the parameter values for that build.

  5. Select Restart build next to a build. A new build of the same job is scheduled, including any associated parameters. Restarting the build removes the item from the list.

Restarting builds after a restore

CloudBees Restart Aborted Builds Plugin, version 1.17 or later, is required to restart builds after a restore.

When a controller is restored from a backup, the Restart Aborted Builds plugin helps to manage:

  • Aborted Freestyle builds that were interrupted due to a controller crash or sudden halt.

  • Aborted Pipeline builds that were interrupted due to an agent crash or sudden halt, and the Pipeline build could not be automatically recovered.

  • Active Pipeline builds that were running when the backup was taken if the builds are still running at the time the page is displayed.

If a Pipeline build is started after a controller backup is taken and the controller is later restored from that backup, new builds may reuse existing build numbers. This is typically harmless because any deployed artifacts or reports should use unique identifiers based on commit, date, or similar. However, some projects may rely on the uniqueness of the build number.

In this scenario, a restore script can set a RESTORED_FROM_BACKUP environment variable on the controller to any identifier. If this variable is defined when the controller starts, and it was either reset or set to a different value during the last startup, a pluggable set of actions is launched to adapt to the restoration, and 1000 is added to the next build number of every job. This makes it unlikely that subsequent build numbers overlap with builds that may have started after the corresponding backup.
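The build-number adjustment described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text, not the plugin's implementation: the function names, the `previous` argument (the marker value remembered from the last startup), and the dictionary of next build numbers are all hypothetical, and the sketch assumes a restore is detected when the marker is defined and differs from the remembered value.

```python
from typing import Dict, Optional

# Offset stated in the text: 1000 is added to every job's next build number.
BUILD_NUMBER_OFFSET = 1000


def restore_detected(current: Optional[str], previous: Optional[str]) -> bool:
    """Assumed rule: the marker is defined now and differs from the value
    remembered from the last startup (None means no value was remembered)."""
    return bool(current) and current != previous


def adjust_next_build_numbers(
    next_numbers: Dict[str, int],
    current: Optional[str],
    previous: Optional[str],
) -> Dict[str, int]:
    """Return a copy of the job -> next build number map, shifted after a restore."""
    if not restore_detected(current, previous):
        return dict(next_numbers)
    return {job: n + BUILD_NUMBER_OFFSET for job, n in next_numbers.items()}
```

With a hypothetical job whose next build number is 42, a changed marker value yields 1042, while an unchanged or undefined marker leaves the number at 42.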

If you are using Velero, use the OSS inject-metadata-velero-plugin to automatically set the RESTORED_FROM_BACKUP environment variable during a restore operation.

For information on using Velero to back up and restore Kubernetes cluster resources, refer to Using Velero to back up and restore Kubernetes cluster resources.

To restart builds after a restore:

  1. Ensure you are signed in to the controller as a user with the Administer permission.

  2. After the controller is restored from a backup, select Manage Jenkins. If builds are in progress, a warning is displayed, indicating the controller is newly restored from backup.

    Figure 3. Administrative warning
  3. Select View details to see a list of all running builds that were interrupted.

    Figure 4. Builds list
  4. Select any build to view build details.

    • If a Freestyle build was interrupted due to a crash or sudden halt, you can restart it.

    • If a Pipeline build is paused for input, you may be able to resume the build where it left off without interruption.

Long-running builds

What happens to builds that were running when Jenkins crashes, or is restarted without using "safe" mode (which waits for running builds to complete)? Whether or not you are using High Availability to start another Jenkins controller, builds of regular projects that were already running are aborted. The CloudBees Restart Aborted Builds Plugin allows you to find and reschedule them, but for builds of projects that normally take a long time, perhaps hours or even days, this is not enough.

Builds that need to run for an extended period are best implemented with Pipeline, using checkpoints (refer to Inserting checkpoints). If such builds cannot be implemented with Pipeline and must continue running even if the controller restarts, they can be defined as a "long-running build" type.

To address the needs of people who have legacy builds that are too long to interrupt every time a Jenkins agent is reconnected or Jenkins is restarted for a plugin update, CloudBees CI includes the CloudBees Long-Running Build Plugin, which provides a "long-running project" type. The configuration is almost the same as for a standard Freestyle project, with one difference: the part of your build that you want to run apart from Jenkins must be configured as a shell step (Unix) or a batch step (Windows). This script can in turn run Maven, Make, or other tools.

If the agent is reconnected or Jenkins restarted during this "detached build" phase, your build keeps on running uninterrupted on the agent machine (so long as that machine is not rebooted of course). When Jenkins makes contact with the agent again, it will continue to show log messages where it left off, and let the build continue. After the main phase is done, you can run the usual post-build steps, such as archiving artifacts or recording JUnit-style test results.

Create a new job, select Long-Running Project, and note the Detached Build section. Pick a kind of build step to run (Bourne shell for Unix agents, or batch script for Windows agents) and enter some commands to run in your build’s workspace.

Figure 5. Long-Running Build Configuration

When the project is built, initially the executor widget will look the same as it would for a freestyle project. During this initial phase, SCM checkouts/updates and similar pre-build steps may be performed. Soon you will see a task in the widget with the (detached) annotation. This means that your main build step is running, and should continue running even if the Jenkins server is halted or loses its connection to the agent. (So long as the connection is open, you should see any new output produced by the detached build step in your build log, with a delay of a few seconds.)

The task label will show post steps while any post-build actions are performed, such as archiving artifacts or recording JUnit test results. This phase does not survive a crash: it requires a constant connection from the Jenkins controller to the agent.

There are a number of limitations and restrictions on which Jenkins features work in long-running builds. Generally speaking, anything that works in a Freestyle project in the pre-build or post-build phase should also work. However, general build steps are not available except as part of the non-detached pre-build phase, and build wrappers generally do not work. Also, surviving a restart only works if Jenkins can reconnect to the exact same agent without that machine having rebooted, so this generally does not work on cloud-provisioned agents. Consult the release notes for more information.

Skip next build

The Skip Next Build plugin allows you to skip building a job for a short period of time. While you could achieve something similar by disabling the job from the job configure page, you would need to remember to re-enable the job afterwards.

There are two main use cases for this plugin:

  • If you are going to be taking some external resources that the build requires offline for maintenance and you don’t want to see all the build failure notices.

  • If you are merging a major feature branch and you want to prevent builds until after the merge is completed.

The plugin adds a Skip builds action to all jobs. When a skip is applied to the job, the action's icon is yellow and the main job page looks like this:

Figure 6. The main job screen when a skip has been applied

When no skip has been applied, the icon is green.

To apply a skip to a folder or job, select the Skip builds action. A screen similar to the following is displayed:

Figure 7. Applying a skip to a folder / job

When a skip is applied to a folder, all jobs within the folder will be skipped.

Select the duration of the skip to apply and select Apply skip. The main job screen should now show a notice that builds are skipped until the specified time (see Figure 6 for an example).

To remove a skip from a folder or job, select the Skip builds action.

This should display a screen similar to this:

Figure 8. Removing a skip from a job

Select Remove skip to remove the skip.

If the skip was not applied directly to the folder / job but instead is either inherited from a parent folder or originating from a skip group then the screen will look something like this:

Figure 9. Trying to remove a skip from a job where the skip is inherited from a parent folder.

The link(s) in the table of active skips can be used to navigate to the corresponding skip.

Skip groups

Depending on how the jobs in your Jenkins instance have been organized and the reasons for skipping builds, it may be necessary to select a disjoint set of jobs from across the instance for skipping. Skip groups can be used to combine jobs from different folders so that they can be skipped as a single group.

The Jenkins administrator configures skip groups in the global configuration at Jenkins > Manage Jenkins > Configure System > Skip groups.

Figure 10. Navigating to the Jenkins > Manage Jenkins > Configure System > Skip groups section using the breadcrumb bar’s context menu.

Each skip group must have a unique name. It can be helpful to provide a description so that users understand why the skip has been applied to their jobs.

Figure 11. Adding a skip group to the Jenkins global configuration

You can have multiple skip groups.

Figure 12. Multiple skip groups can be defined

Once skip groups have been defined, you can configure job and folder membership from the job / folder configuration screens.

Figure 13. Configuring a folder’s skip group membership

When there is at least one skip group defined in a Jenkins instance, the Jenkins > Skip groups page is enabled.

Figure 14. A Jenkins instance with Jenkins > Skip groups enabled
Figure 15. The Jenkins > Skip groups page

To manage the skip state of a skip group, navigate to that skip group’s details page.

Figure 16. The details page for a specific skip group

The details page displays the current status of the skip group and lists all the items that are direct members of this skip group.

Where folders are a member of a skip group, the skip group membership will be inherited by all items in the folder.
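The inheritance rule above can be illustrated with a short Python sketch. It is a simplified model rather than the plugin's code: the `membership` mapping (full item name to directly assigned skip groups), the function name, and the use of `/` as the separator in full item names are assumptions made for the example.

```python
from typing import Dict, Set


def effective_skip_groups(item: str, membership: Dict[str, Set[str]]) -> Set[str]:
    """Collect the skip groups assigned directly to the item and to each
    of its ancestor folders (hypothetical model of membership inheritance)."""
    groups: Set[str] = set()
    parts = item.split("/")
    # Walk from the top-level folder down to the item itself.
    for depth in range(1, len(parts) + 1):
        groups |= membership.get("/".join(parts[:depth]), set())
    return groups


# Hypothetical data: a folder and one job are direct members of different groups.
membership = {
    "team-a": {"maintenance"},
    "team-a/app/build": {"release-freeze"},
}
```

In this model, `effective_skip_groups("team-a/app/build", membership)` returns both groups, because the job inherits `maintenance` from the `team-a` folder in addition to its own direct membership.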

The Skip Next Build plugin adds two new permissions:

  • Skip: Apply - this permission is required in order to apply a skip to a job. It is implied by the Overall: Administer permission.

  • Skip: Remove - this permission is required in order to remove a skip from a job. It is implied by the Overall: Administer permission.

The Skip Next Build plugin adds two new CLI operations:

  • applySkip - this operation applies a skip to a job. It takes a single parameter which is the number of hours to skip the job for. If the parameter is outside the range 0 to 24 it will be brought to the nearest value within that range.

  • removeSkip - this operation removes any skip that may have been applied to a job.
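The range handling described for applySkip amounts to clamping the requested duration. A minimal Python sketch of that behavior follows; the function name is hypothetical, and treating the value as a whole number of hours is an assumption.

```python
def clamp_skip_hours(hours: int) -> int:
    """Bring the requested duration to the nearest value in the 0 to 24 range."""
    return max(0, min(24, hours))
```

Under this sketch, a request for 30 hours is treated as 24, a negative request is treated as 0, and values already inside the range are unchanged.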

The Skip Next Build plugin adds a number of Jenkins CLI commands for controlling skips:

apply-skip

Enables the skip setting on a job.

This command takes two parameters:

  1. The full name of the job.

  2. The number of hours the skip should be active for.

apply-folder-skip

Enables the skip setting on a folder.

This command takes two parameters:

  1. The full name of the folder.

  2. The number of hours the skip should be active for.

skip-group-on

Enables the skip setting on a skip group.

This command takes two parameters:

  1. The name of the skip group.

  2. The number of hours the skip should be active for.

remove-skip

Removes the currently active skip setting from a job.

This command takes only one parameter: the full name of the job.

remove-folder-skip

Removes the currently active skip setting from a folder.

This command takes only one parameter: the full name of the folder.

skip-group-off

Removes the currently active skip setting from a skip group.

This command takes only one parameter: the name of the skip group.
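As a rough illustration of how the commands above might be assembled for scripting, the following Python sketch builds the argument list for a Jenkins CLI invocation. The command names and parameter counts come from this section; the controller URL, the location of `jenkins-cli.jar`, the helper name, and the example job name are placeholders, and authentication options are omitted.

```python
from typing import List, Optional

JENKINS_URL = "http://localhost:8080/"  # placeholder controller URL
CLI_JAR = "jenkins-cli.jar"             # placeholder path to the CLI client

# Commands that take a target plus a duration in hours.
APPLY_COMMANDS = {"apply-skip", "apply-folder-skip", "skip-group-on"}
# Commands that take only a target.
REMOVE_COMMANDS = {"remove-skip", "remove-folder-skip", "skip-group-off"}


def skip_command(command: str, target: str, hours: Optional[int] = None) -> List[str]:
    """Build the argv for one skip command; nothing is executed here."""
    if command in APPLY_COMMANDS:
        if hours is None:
            raise ValueError(f"{command} requires a number of hours")
        extra = [str(hours)]
    elif command in REMOVE_COMMANDS:
        if hours is not None:
            raise ValueError(f"{command} takes only the target name")
        extra = []
    else:
        raise ValueError(f"unknown skip command: {command}")
    return ["java", "-jar", CLI_JAR, "-s", JENKINS_URL, command, target, *extra]
```

For example, `skip_command("apply-skip", "team-a/app/build", 2)` produces the arguments for a two-hour skip on a hypothetical job, and `skip_command("skip-group-off", "maintenance")` produces the arguments for clearing a hypothetical skip group. The resulting list could be passed to `subprocess.run` once the appropriate credentials are added.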