Pipeline best practices

This guide describes best practices for developing Pipelines. It points Pipeline authors and maintainers towards practices that result in better Pipeline execution. This guide is not meant to be an exhaustive list of all possible Pipeline best practices; instead, it provides a number of specific, useful examples.

Best practices overview

The following list provides general guidance on the best practices for creating a Pipeline.

  • Keep it simple. Use the minimum amount of code to connect the Pipeline steps and integrate tools. Delegate most activity to agents and reduce the load on controllers.

  • Use external scripts and tools for most tasks, especially those involving complex processing or processing that uses a large amount of CPU.

  • Use command-line tools for operations such as:

    • Processing data

    • Communicating interactively with REST APIs

    • Parsing/templating larger XML or JSON files

    • Nontrivial integration with external APIs

    • Simulations and complex calculations

    • Business logic

  • Use command-line clients for APIs, such as clients written in Java or Python, and use sh or bat steps to integrate these tools.

  • Use steps from Jenkins plugins, especially for source control, artifact management, deployment systems and system automation.

  • Reduce the number of steps in the Pipeline. Most well-formed Pipelines contain fewer than 300 steps. The most effective way to reduce the number of steps in a Pipeline is to consolidate several sequential sh or bat steps into a single, external helper script that the Pipeline calls as a single step.

  • Manage log data. Consider writing log data to an external file on the agent, then compressing it and archiving it as a build artifact. This can improve Pipeline performance, although it means that you must download and uncompress artifacts to understand a failure.
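
    A minimal sketch of this approach, assuming a hypothetical build.sh script on a Linux agent:

    // Run the build, capturing its output in a log file on the agent
    // instead of streaming everything to the Jenkins console:
    sh './build.sh > build.log 2>&1'
    // Compress the log and archive it as a build artifact:
    sh 'gzip -f build.log'
    archiveArtifacts artifacts: 'build.log.gz'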

Use Groovy code to connect sets of actions

Use Groovy code to connect a set of actions rather than as the main functionality of your Pipeline. Instead of relying on Pipeline functionality (Groovy or Pipeline steps) to drive the build process forward, use single steps (such as sh) to accomplish multiple parts of the build. This helps keep Pipelines simple, which is desirable because Pipelines require more resources (CPU, memory, storage) on the controller as their complexity increases (the amount of Groovy code, the number of steps used, and so on). For example, a good approach is to use a single call to mvn in an sh step to drive the build through its build, test, and deploy process.

Example of using Maven to connect a set of actions

The withMaven step configures a Maven environment to use within a Pipeline job by calling sh "mvn …​" or bat "mvn …​". The selected Maven installation is configured and prepended to the PATH.
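
A minimal sketch, assuming the Pipeline Maven Integration plugin is installed and a Maven installation named 'M3' is configured in Jenkins (the installation name and goals are illustrative):

pipeline {
    agent { label "linux" }
    stages {
        stage('Build') {
            steps {
                // withMaven prepends the selected Maven installation to the PATH;
                // a single mvn call then drives the compile, test, and package phases.
                withMaven(maven: 'M3') {
                    sh 'mvn clean verify'
                }
            }
        }
    }
}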

Reduce the amount of Groovy code executed by Pipelines

For a Pipeline, Groovy code always executes on the controller, which means it uses controller resources (memory and CPU). Therefore, it is critically important to reduce the amount of Groovy code executed by Pipelines. This includes any methods called on classes imported in Pipelines.

Examples of Groovy methods to avoid

  • JsonSlurper: This class can be used to read from a file on disk, parse the data from that file into a JSON object, and inject that object into a Pipeline using a command like new JsonSlurper().parseText(readFile("$LOCAL_FILE")). This command loads the local file into memory on the controller twice, and if the file is very large or the command is executed frequently, it requires a lot of memory.

    Instead of using JsonSlurper, use a shell step and return the standard output. This uses agent resources to read the file, and the jq query in $PARSING_QUERY pares the file down to a smaller size on the agent.

    This shell step would look something like this:

    def json = sh returnStdout: true, script: 'jq "$PARSING_QUERY" "$LOCAL_FILE"'
  • HttpRequest: This method is used to grab data from an external source and store it in a variable. This practice is not ideal, because not only does the request come directly from the controller (which can give incorrect results for things like HTTPS requests if the controller does not have the required certificates loaded), but the response to that request is also stored twice.

    Instead of using HttpRequest, use a shell step to perform the HTTP request from the agent, for example with a tool such as curl or wget. Filter the result on the agent side as much as possible so that only the minimum required information is transmitted back to the Jenkins controller, as in the sketch below.
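
    A minimal sketch, assuming a hypothetical $API_URL endpoint and that curl and jq are installed on the agent:

    // Fetch and filter the data on the agent so that only the single
    // field the Pipeline needs travels back to the controller:
    def version = sh(
        returnStdout: true,
        script: 'curl -sf "$API_URL/status" | jq -r ".version"'
    ).trim()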

Combine Pipeline steps into single steps

Combine Pipeline steps into single steps as often as possible to reduce the amount of overhead caused by the Pipeline execution engine itself. For example, if you run three shell steps back-to-back, each of those steps has to be started and stopped, requiring connections and resources on the agent and controller to be created and cleaned up. If you put all of the commands into a single shell step, then only a single step needs to be started and stopped.

Instead of creating a series of echo or sh steps, combine them into a single step or script.
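
For example, a minimal sketch (the helper scripts are hypothetical):

// Instead of three separate steps...
//   sh 'echo Building'
//   sh './build.sh'
//   sh './test.sh'
// ...run the same commands in a single step, so the execution engine
// starts and stops only one step:
sh '''
    echo Building
    ./build.sh
    ./test.sh
'''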

Create small variable files instead of large, global variable declaration files

Having large variable declaration files can require large amounts of memory for little to no benefit, as the file is loaded for every Pipeline whether the variables are needed or not.

Create small variable files that contain only variables relevant to the current execution.

Build in distinct containers instead of working across multiple workspaces

  • Build in distinct containers, which create the needed resources from scratch. Cloud-type agents are recommended for this. Building in these containers ensures that the build process starts from a clean state every time and is easily repeatable.

    If building in containers is not possible, disable concurrency on the Pipeline or use the Lockable Resources plugin to lock the workspace while it is in use so that no other builds can use it, as in the sketch after this list.
  • Try not to share workspaces across multiple Pipeline executions or multiple distinct Pipelines. This practice can lead to unexpected file modifications within each Pipeline or to workspace renaming.

  • Mount shared volumes or disks in a separate location. Copy the files from that location to the current workspace. Once the build completes, these files can be copied back if they have been changed.

    Disabling concurrency or locking the workspace while it is in use can cause Pipelines to become blocked waiting on resources if those resources are arbitrarily locked.
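
A minimal sketch combining both options, assuming the Lockable Resources plugin is installed (the resource name and build script are hypothetical):

pipeline {
    agent { label "linux" }
    options {
        // Prevent two builds of this Pipeline from running at the same time:
        disableConcurrentBuilds()
    }
    stages {
        stage('Build') {
            steps {
                // Hold a named lock for as long as the workspace is in use:
                lock(resource: 'shared-workspace') {
                    sh './build.sh'
                }
            }
        }
    }
}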

Store Pipeline definitions in a Source Code Management (SCM) tool

It is a best practice to store the job definition in an SCM, such as Git. If you store the job definition within the job itself, a change to the job could have unintended side effects.

Avoid defining a Pipeline within a Pipeline job.

By storing the job definition in an SCM and enforcing a pull request development flow, you get an audit trail of all of your changes. If you are using a modern Git provider, you can also collaborate with other people on your team, using extended comments to discuss the changes before they are merged for use in production. Placing the Jenkinsfile within the same repository as your source code is beneficial because the maintainers of the code also have the ability to maintain the process that builds, delivers, and deploys the code.

When writing a Pipeline definition, use Declarative syntax

  • Scripted and Declarative syntax are meant to implement CI tasks, not to serve as a general-purpose programming language.

  • Many Jenkins controller performance issues can be traced back to the misuse of scripted syntax and shared libraries written in a way where all the work is being done within the Jenkins controller instead of the agents.

Declarative syntax includes features, such as matrix, that are available only in Declarative Pipelines. Using this syntax gives you access to those features.
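
For example, a minimal sketch of the matrix directive (the axis values and agent labels are illustrative):

pipeline {
    agent none
    stages {
        stage('Test') {
            matrix {
                axes {
                    axis {
                        name 'PLATFORM'
                        values 'linux', 'windows'
                    }
                }
                // Each matrix cell acquires its own agent:
                agent { label "${PLATFORM}" }
                stages {
                    stage('Run tests') {
                        steps {
                            echo "Testing on ${PLATFORM}"
                        }
                    }
                }
            }
        }
    }
}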

Use shared libraries

When you introduce a script block into a Declarative Pipeline, it is a warning sign that you are starting to head down the path to Scripted syntax. To avoid that path, create a custom step in a shared library and use that step within your Declarative Pipeline instead.
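
A minimal sketch, assuming a shared library configured in Jenkins under the name 'my-shared-library' and a hypothetical custom step named deployApp:

// vars/deployApp.groovy in the shared library:
def call(String environment) {
    // The actual work is delegated to a tool on the agent
    // (deploy.sh is a hypothetical script):
    sh "./deploy.sh ${environment}"
}

// Jenkinsfile using the custom step instead of a script block:
@Library('my-shared-library') _
pipeline {
    agent { label "linux" }
    stages {
        stage('Deploy') {
            steps {
                deployApp('staging')
            }
        }
    }
}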

Only use Scripted syntax when it doesn’t make sense to use Declarative syntax and a shared library

There are certain situations where Scripted syntax can be your friend. For example, suppose you have a job that can run in parallel across numerous machines, and it first needs to ascertain which of those machines are currently available as agents. A Declarative Pipeline does not allow you to do that, but it can be done using a Scripted Pipeline.

  • Always start with a Declarative Pipeline.

  • Instead of using a script block in a Declarative Pipeline, add a custom step to a shared library.

  • If you cannot achieve your desired output using a Declarative Pipeline with shared libraries, use a Scripted Pipeline.

Do all the work within an agent

  • Any processing within a Pipeline should occur within an agent.

  • By default, the Jenkinsfile script itself runs on the Jenkins controller, using a lightweight executor, which is a Java thread running on the controller node. Jenkins automatically creates these executors when needed, and they use very few resources.

  • Processes like cloning code from a Git server or compiling a Java application should leverage Jenkins' distributed builds capability and run on an agent.

The right way to do work within a Pipeline is sh 'bunch of work' or bat 'bunch of work'. The wrong way is to create loops and control structures, read directly from an external database, or write "code" to make business decisions.
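
A minimal sketch of the right shape, assuming a hypothetical process-data.sh helper script on the agent:

pipeline {
    agent { label "linux" }
    stages {
        stage('Process') {
            steps {
                // All of the real work happens in one tool invocation on the
                // agent; the Pipeline itself only connects the steps:
                sh './process-data.sh'
            }
        }
    }
}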

Create your inputs outside your agents

While you can put an input statement within a stage that has an agent, this is not a recommended practice. The input element pauses Pipeline execution to wait for an approval, and approvals can take some time. Meanwhile, the agent holds a lock on a workspace and a heavyweight Jenkins executor for the entire time the Pipeline is paused for input.

Example

pipeline {
    agent none
    stages {
        stage('Example Build') {
            agent { label "linux" }
            steps {
                sh 'echo Hello World'
            }
        }
        stage('Ready to Deploy') {
            steps {
                input(message: "Deploy to production?")
            }
        }
        stage('Example Deploy') {
            agent { label "linux" }
            steps {
                sh 'echo Deploying'
            }
        }
    }
}

Wrap your input in a timeout

Pipeline has an easy mechanism for timing out any given step of your Pipeline. As a best practice, you should always plan for timeouts around your inputs for healthy cleanup of the Pipeline. Wrapping your inputs in a timeout will allow them to be cleaned up if approvals don’t occur within a given window.

Example

pipeline {
    agent none
    stages {
        stage('Example Build') {
            agent { label "linux" }
            steps {
                sh 'echo Hello World'
            }
        }
        stage('Ready to Deploy') {
            options {
                timeout(time: 1, unit: 'MINUTES')
            }
            steps {
                input(message: "Deploy to production?")
            }
        }
        stage('Example Deploy') {
            agent { label "linux" }
            steps {
                sh 'echo Deploying'
            }
        }
    }
}

Acquire agents within parallel steps

One of the main benefits of parallelism in a Pipeline is to increase the processing capacity. You should aim to acquire an agent within the parallel branches of your Pipeline.
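
A minimal sketch: each parallel branch acquires its own agent, so the branches genuinely run concurrently (the labels and test scripts are hypothetical):

pipeline {
    agent none
    stages {
        stage('Tests') {
            parallel {
                stage('Unit') {
                    agent { label "linux" }
                    steps {
                        sh './run-unit-tests.sh'
                    }
                }
                stage('Integration') {
                    agent { label "linux" }
                    steps {
                        sh './run-integration-tests.sh'
                    }
                }
            }
        }
    }
}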

Avoid script security exceptions

Avoid writing Pipelines that require script approvals; unapproved scripts fail with security exceptions until an administrator approves them.

Under Manage Jenkins, the In-process Script Approval screen should always be empty. If you have entries for script approvals, signature approvals, or classpath entry approvals, you have jobs that are destabilizing the controller. If you administer a Jenkins controller and people ask you for approvals, ask them to rewrite what they are trying to do so that the approval is no longer needed.

Be careful when interacting with Jenkins APIs from a Pipeline

  • You need to be very careful when interacting with Jenkins APIs from a Pipeline to avoid severe security and performance issues.

  • If you must use Jenkins APIs in your build, the recommended approach is to create a minimal plugin in Java that implements a safe wrapper around the Jenkins API you want to access, using the Pipeline Step API. Using Jenkins APIs directly from a sandboxed Jenkinsfile usually means you have had to approve method signatures that let anyone who can modify a Pipeline bypass the sandbox protections, which is a significant security risk. It can also lead to the approved method running as the System user, which has overall admin permissions, thereby giving developers higher permissions than intended.

  • Using Jenkins.instance or its accessor methods in a Pipeline or shared library indicates a code misuse within that Pipeline or shared library.

  • Using Jenkins APIs from an unsandboxed shared library means that the shared library is both a shared library and a kind of Jenkins plugin.

  • Instead, implement a Jenkins plugin that gathers the needed data.

Use @NonCPS only if necessary

Asynchronous Pipeline steps (such as sh and sleep) are always CPS-transformed, and may not be used inside of a method annotated with @NonCPS.

Pipeline code is continuation-passing style (CPS)-transformed so that Pipelines are able to resume after a Jenkins restart. This allows you to shut down Jenkins or lose connectivity to an agent while the Pipeline is running your script. When Jenkins comes back, it remembers what it was doing and your Pipeline script resumes execution as if it were never interrupted. However, some Groovy expressions do not work correctly as a result of CPS transformation. See Pipeline CPS method mismatches for more details and some examples of things that may be problematic.

If required, use the @NonCPS annotation to disable the CPS transformation for a specific method. If you do this, the Groovy method is not transformed, so it cannot be resumed partway through; after an interruption it has to run again from the beginning.
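
A minimal sketch of an appropriate use: a short, synchronous helper that relies on closure-heavy Groovy idioms the CPS transform can mishandle (the method name and log format are hypothetical):

// @NonCPS methods run without the CPS transform, so they cannot be resumed
// after a restart and must not call Pipeline steps such as sh or sleep.
@NonCPS
def extractErrors(String text) {
    // Iterator- and closure-based Groovy like findAll/join behaves normally here:
    return text.readLines().findAll { it.contains('ERROR') }.join('\n')
}

// Call it from CPS-transformed Pipeline code, keeping the readFile step outside:
// def errors = extractErrors(readFile('build.log'))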