How to deal with large Git repositories in Jenkins pipelines


Issue

  • My pipeline project repository contains a lot of history metadata that I don't need in my builds, which slows down the repository checkout and increases the amount of physical memory required.

  • My pipeline project repository contains a lot of binaries that I don't need in my builds, which slows down the repository checkout and increases the amount of physical memory required.

Resolution

Although the remote Git server that Jenkins uses as its SCM remote can provide the full history and all large files in the repository, that doesn't mean it has to. The checkout can be configured so that only the requested history and files are transferred. There are several strategies that reduce the work done by the remote and alleviate the network load:

  • Narrow refspec

  • Shallow clone

  • Large File Storage (LFS)

  • Lightweight checkout

  • Repository cache

Narrow refspec

A refspec controls which remote refs (branches or tags) are retrieved and how they map to local refs. If left blank, it defaults to the normal behaviour of git fetch, which retrieves all the branch heads as remotes/REPOSITORYNAME/BRANCHNAME.

Using a custom (narrow) refspec reduces the amount of data retrieved, which lowers network traffic and minimizes the local repository storage needed.

Note that the Jenkins Git plugin uses the default refspec for its initial fetch (the clone) unless the advanced clone behaviours are set to honor the refspec.

Syntax of Refspec:

The format of a refspec is generally: +<src>:<dst>, where:

  • the optional + tells Git to update the reference even if the update isn't a fast-forward,

  • <src> specifies the remote ref (e.g., refs/heads/main for the main branch), and

  • <dst> specifies where that ref should be stored locally (e.g., refs/remotes/origin/main).

How can I configure a refspec for my pipeline

On the pipeline configuration page, go to Pipeline > Definition > SCM > Advanced > Refspec and enter your custom refspec.

To enable Honor refspec on initial clone, go to Pipeline > Definition > SCM > Additional Behaviours, add Advanced clone behaviours and check that option.
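If the checkout is defined in the Jenkinsfile instead of in the job configuration, the same two settings can be passed to the checkout step. The following is a minimal sketch, not a drop-in configuration: the repository URL https://git.example.com/org/repo.git, the branch main and the credentials ID git-credentials are placeholders. It sets a narrow refspec on the remote and uses the CloneOption extension with honorRefspec: true so that the initial clone also fetches only that branch. noTags: true is an extra, optional setting that also skips fetching tags.

```groovy
pipeline {
    agent any
    options {
        // Skip the implicit checkout so that only the narrow checkout below runs
        skipDefaultCheckout true
    }
    stages {
        stage('Narrow checkout') {
            steps {
                checkout([
                    $class: 'GitSCM',
                    branches: [[name: '*/main']],
                    userRemoteConfigs: [[
                        // Placeholder URL and credentials ID: replace with your own
                        url: 'https://git.example.com/org/repo.git',
                        credentialsId: 'git-credentials',
                        // Narrow refspec: fetch only the main branch
                        refspec: '+refs/heads/main:refs/remotes/origin/main'
                    ]],
                    extensions: [
                        // Apply the refspec above to the initial clone as well
                        [$class: 'CloneOption', honorRefspec: true, noTags: true, shallow: false, reference: '']
                    ]
                ])
            }
        }
    }
}
```

Without honorRefspec, the initial clone would still fetch every branch head, and only later fetches would use the narrow refspec.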


Shallow clone

Shallow clone is a Git feature that limits the depth of the history retrieved, so you don't need to pull the entire repository history. If your project has years of history but you only need the most recent commit, use a shallow clone with depth 1. A clone this shallow should be used only if your build works on the current code alone and does not interact with the repository history (for example, through git log, git describe or merges).

Additionally, note that you cannot merge in shallow clones, as they (potentially) don't have a complete representation of the history, and that shallow clones are not supported by JGit.

How can I configure a shallow clone for my pipeline

On the pipeline configuration page, go to Pipeline > Definition > SCM > Additional Behaviours, add Advanced clone behaviours, check Shallow clone and set the Shallow clone depth (for example, 1).
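In a Jenkinsfile, a shallow clone can be requested through the same CloneOption extension of the checkout step. This is a sketch assuming a placeholder repository URL and a branch named main. depth: 1 fetches only the most recent commit, and noTags: true (optional, not part of the UI steps above) also skips tags.

```groovy
pipeline {
    agent any
    options {
        // Skip the implicit full checkout; the stage below does a shallow one
        skipDefaultCheckout true
    }
    stages {
        stage('Shallow checkout') {
            steps {
                checkout([
                    $class: 'GitSCM',
                    branches: [[name: '*/main']],
                    // Placeholder URL: replace with your repository
                    userRemoteConfigs: [[url: 'https://git.example.com/org/repo.git']],
                    extensions: [
                        // Fetch only the most recent commit (depth 1) and no tags
                        [$class: 'CloneOption', shallow: true, depth: 1, noTags: true, reference: '']
                    ]
                ])
            }
        }
    }
}
```

The limitations described above still apply: any later step that inspects history only sees what the shallow clone fetched.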


Large File Storage

Most common Git hosting services support an extension that keeps large files outside the repository. It is called Git Large File Storage (LFS): the repository stores only small pointer files, while the actual file contents are kept on an LFS server and downloaded when needed. This is a good approach for transferring large files efficiently in an enterprise setting, and it reduces the local repository storage.

Using LFS in your Jenkins builds requires the Git LFS extension to be installed on the build agents and HTTP(S) authentication to the repository. Since version 2.249.2.3, cloudbees-core-agent includes Git LFS.

How can I configure LFS pulls in my pipeline

On the pipeline configuration page, go to Pipeline > Definition > SCM > Additional Behaviours and add Git LFS pull after checkout.
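The equivalent in a Jenkinsfile is the GitLFSPull extension of the checkout step. In this sketch the repository URL and the credentials ID git-https-credentials are placeholders. As noted above, the agent must have Git LFS installed and the repository must be accessed over HTTP(S).

```groovy
pipeline {
    agent any
    options {
        // Skip the implicit checkout so that the LFS-enabled checkout below is used
        skipDefaultCheckout true
    }
    stages {
        stage('Checkout with LFS') {
            steps {
                checkout([
                    $class: 'GitSCM',
                    branches: [[name: '*/main']],
                    userRemoteConfigs: [[
                        // Placeholder HTTPS URL and hypothetical credentials ID
                        url: 'https://git.example.com/org/repo.git',
                        credentialsId: 'git-https-credentials'
                    ]],
                    extensions: [
                        // Run a Git LFS pull after the checkout to download LFS-tracked files
                        [$class: 'GitLFSPull']
                    ]
                ])
            }
        }
    }
}
```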


Lightweight checkout

Lightweight checkout retrieves the pipeline Jenkinsfile from the SCM without performing a full checkout of the repository. This is more efficient, because Jenkins does not need to clone the repository just to read the pipeline definition.

With a lightweight checkout, initially you will not get any changelogs or polling based on the SCM. However, if you use checkout scm during the build, this will populate the changelog and initialize polling. Declarative pipelines, by default, include an implicit checkout step.
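Scripted pipelines have no implicit checkout, so with a lightweight checkout you call checkout scm yourself to get changelogs and polling. A minimal sketch, assuming the job is a Pipeline from SCM or multibranch job (the scm variable is only defined in those jobs); the stage names and the echo step are placeholders.

```groovy
// Scripted pipeline: unlike declarative, there is no implicit checkout step
node {
    stage('Checkout') {
        // Check out the repository the Jenkinsfile was loaded from.
        // This populates the build changelog and initializes SCM polling.
        checkout scm
    }
    stage('Build') {
        // Placeholder for your build steps
        echo "Building in ${env.WORKSPACE}"
    }
}
```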

How can I enable Lightweight checkout

In multibranch projects it is enabled by default. In regular pipeline jobs, on the pipeline configuration page, go to Pipeline > Definition > SCM and enable Lightweight checkout.


Additionally, in your declarative pipelines, you can skip the implicit initial checkout with the skipDefaultCheckout option, as in the following snippet:

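```groovy
pipeline {
    agent any
    options {
        // Do not run the implicit "Declarative: Checkout SCM" stage
        skipDefaultCheckout true
    }
    stages {
        stage('My custom checkout') {
            steps {
                // Define your custom checkout here; checkout scm is a placeholder
                checkout scm
            }
        }
    }
}
```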