Using a Git reference repository

Article ID: 115001728812

Issue

  • How do I create a Git reference repository?

  • How do I configure Git/GitHub SCM to use a reference repository?

Environment

  • CloudBees Jenkins Enterprise - Managed controller (CJEMM)

  • CloudBees Jenkins Platform - Client controller (CJPCM)

  • CloudBees Jenkins Team (CJT)

  • Jenkins LTS

  • Git plugin

Resolution

The Jenkins Git plugin can use a reference repository as a cache to reduce remote data transfer and local disk usage. Reference repositories are defined per project via the Additional Behavior named Advanced clone behaviours. They must be created and maintained manually: the Git plugin does not maintain them.

From a user perspective, it takes two steps to configure a reference repository:

  • Create the reference repository

  • Configure your job so that it points to the reference repository

A third, ongoing step is also recommended:

  • Periodically update the reference repository with the latest content from the original repository

What is a reference repository?

A reference repository is a local bare repository whose objects are reused during a clone instead of being copied from the remote repository. Cloning with a reference repository is much faster because the clone creates pointers to the reference repository and only fetches what is missing from it. This has multiple advantages:

  • it reduces the network I/O and reduces load on the remote Git server by not transferring content which is already in the reference repository

  • it saves local disk space by creating pointers from the project repository to the reference repository instead of creating a new copy
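The mechanism can be tried out locally. The following is a minimal sketch, assuming Git is installed; the throwaway "remote", the cache directory and the workspace are hypothetical names created under a temporary directory, and a `file://` URL stands in for the real Git server.

```shell
#!/bin/sh
# Sketch: show that a clone made with --reference points at a local mirror.
# All paths and repository names are hypothetical and live in a temp directory.
set -eu
demo=$(mktemp -d)

# 1. A throwaway "remote" repository with one commit (stands in for the real server).
git init -q "$demo/remote"
echo "hello" > "$demo/remote/README"
git -C "$demo/remote" add README
git -C "$demo/remote" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial commit"

# 2. The reference repository: a bare mirror of the remote.
mkdir -p "$demo/cache"
git clone -q --mirror "file://$demo/remote" "$demo/cache/remote.git"

# 3. A project clone that borrows objects from the mirror.
git clone -q --reference "$demo/cache/remote.git" \
    "file://$demo/remote" "$demo/workspace"

# The alternates file is the pointer to the reference repository's object store.
cat "$demo/workspace/.git/objects/info/alternates"
```

The last command prints the path of the mirror's objects directory, which is where the workspace clone looks up objects it did not copy.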

Where to create a reference repository?

The reference repository needs to be available on the machine that performs the clone. It is most often required on build agents, although it may also be useful on the controller in some cases. Pipeline users benefit from reference repositories on the controller when performing Branch Indexing (GitHub Branch Source plugin, Bitbucket Branch Source plugin and Gitea plugin) or when loading Pipeline Shared Libraries (though shared libraries should be kept small enough that a reference repository does not matter).

If the reference repository is not available when cloning, the process falls back to cloning from the remote repository.
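Outside Jenkins, a plain `git clone --reference` stops with an error when the reference path does not exist, so scripts that want the same fallback need to handle it themselves. The sketch below imitates the behaviour described above; it is not the Git plugin's own implementation, and the helper name `clone_with_optional_reference` and all paths are hypothetical.

```shell
#!/bin/sh
# Sketch: clone with a reference repository when it exists, otherwise clone normally.
# This imitates the fallback described above; it is not the Git plugin's code.
set -eu

# clone_with_optional_reference <url> <destination> <reference-path>
# (hypothetical helper name)
clone_with_optional_reference() {
    url=$1; dest=$2; ref=$3
    if [ -d "$ref" ]; then
        echo "Using reference repository $ref"
        git clone -q --reference "$ref" "$url" "$dest"
    else
        echo "Reference repository $ref not found, cloning from remote only"
        git clone -q "$url" "$dest"
    fi
}

# Demo with a throwaway remote and a reference path that does not exist.
demo=$(mktemp -d)
git init -q "$demo/remote"
echo "hello" > "$demo/remote/README"
git -C "$demo/remote" add README
git -C "$demo/remote" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial commit"

clone_with_optional_reference "file://$demo/remote" "$demo/workspace" \
    "$demo/cache/missing.git"
```

Recent Git versions also provide `git clone --reference-if-able`, which warns and continues instead of aborting when the reference repository is missing. The explicit directory check is used here only to avoid depending on the Git version.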

How to create a reference repository?

Create a bare git clone of your remote repository using the --mirror option:

git clone --mirror git@github.com:my-user/my-repository.git

This creates a bare repository that contains all refs (branches and tags) of the remote repository. Have a look at the documentation of git-clone for more details.

Advanced configuration for Git submodules

Some users have found that they can further improve performance by caching the history from multiple reference repositories into a single reference repository. For example, if a team uses a git repository that contains multiple submodules, they can create a reference repository which contains the combination of all those remote repositories in a single reference repository.

Such a multiple-repository reference can be created by adding each submodule repository as an additional remote:

git clone --bare git@github.com:my-user/my-repository.git
cd my-repository.git
git remote add submodule1 git@github.com:my-user/submodule1-repository.git
git remote add submodule2 git@github.com:my-user/submodule2-repository.git
# Download the history of the newly added remotes into the reference repository
git fetch --all

Note about --bare and --mirror

  • --mirror is a better argument choice when there is a single remote repository because it configures the refspec to copy as much from the single remote repository as it can.

  • --bare is the better argument choice where there are multiple remote repositories in the single reference repository because the repository is no longer a "mirror" of the remote repository.

Using --mirror in a multi-repository reference repository can cause some of its references to toggle between values. When new content is fetched from the original mirrored repository, the mirror refspec updates references across the whole repository, including those belonging to the other remotes. When new content is then fetched from the other remote repositories, those updates alter some of the references that had just been updated by the first fetch. As a result, some of the content inside the reference repository flips back and forth between the original mirrored repository and the other remotes.
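The difference between the two options is visible in the fetch refspecs that Git configures. The following sketch, which uses hypothetical throwaway repositories under a temporary directory, prints the refspec of a mirror clone and the refspec of a remote added to a bare clone.

```shell
#!/bin/sh
# Sketch: compare the fetch refspecs set up by --mirror and by `git remote add`.
# Repository names and paths are hypothetical.
set -eu
demo=$(mktemp -d)

# make_repo <dir> <text>: hypothetical helper creating a repository with one commit.
make_repo() {
    git init -q "$1"
    echo "$2" > "$1/README"
    git -C "$1" add README
    git -C "$1" -c user.name=demo -c user.email=demo@example.com \
        commit -q -m "initial commit"
}
make_repo "$demo/main" "main project"
make_repo "$demo/sub1" "submodule 1"

# Single remote: --mirror maps every remote ref onto the same local ref name.
git clone -q --mirror "file://$demo/main" "$demo/mirror.git"
mirror_refspec=$(git -C "$demo/mirror.git" config --get remote.origin.fetch)
echo "mirror refspec:     $mirror_refspec"

# Several remotes: each remote added to the bare clone gets its own namespace.
git clone -q --bare "file://$demo/main" "$demo/combined.git"
git -C "$demo/combined.git" remote add submodule1 "file://$demo/sub1"
git -C "$demo/combined.git" fetch -q submodule1
sub_refspec=$(git -C "$demo/combined.git" config --get remote.submodule1.fetch)
echo "submodule1 refspec: $sub_refspec"
```

The mirror refspec is `+refs/*:refs/*`, which covers the whole reference namespace, while the added remote uses `+refs/heads/*:refs/remotes/submodule1/*` and therefore only writes under its own prefix.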

How to configure a job to use the reference repository in Jenkins?

Configure the Git SCM and add the Additional Behavior of type Advanced clone behaviours. Then specify the location of the reference repository in the "Path of the reference repo to use during clone" field:

[Screenshot: Git reference repository configuration]

In Pipeline, the "Pipeline syntax" link on each project (job) page includes the checkout command and will generate the correct syntax for the checkout options you select. A sample subset of a checkout command might be:

checkout([$class: 'GitSCM',
    extensions: [[$class: 'CloneOption', reference: '/var/lib/gitcache/my-repository.git']],
    [...]
])

(Note: if you are configuring a job for which the repository has already been cloned, you will need to remove the workspace to force a new clone on the next build. The new clone will point to the reference repository. To check that the clone performed by the build points to the reference repository, check that the .git directory in the workspace has the file .git/objects/info/alternates which contains the location of the reference repository.)
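The check described in the note can be scripted. This is a minimal sketch; the helper name `check_reference` is hypothetical, and the workspace path passed to it is whatever directory the build cloned into.

```shell
#!/bin/sh
# check_reference <workspace-dir>  (hypothetical helper name)
# Reports whether the clone in the given workspace points to a reference repository.
check_reference() {
    alternates="$1/.git/objects/info/alternates"
    if [ -s "$alternates" ]; then
        echo "Workspace borrows objects from:"
        cat "$alternates"
    else
        echo "No alternates file in $1: the clone does not use a reference repository" >&2
        return 1
    fi
}
```

Calling `check_reference` with the workspace directory prints the reference location when the file exists and returns a non-zero status otherwise, which usually means the workspace predates the configuration change and needs to be removed.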

Maintain your reference repository

Update the reference repository periodically by running the following command inside it:

git fetch --all --prune
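When several reference repositories live in one cache directory, the update can be wrapped in a small script and run on a schedule. The sketch below assumes that every reference repository is a directory ending in `.git` directly under the cache directory; the helper name `update_reference_repos` is hypothetical.

```shell
#!/bin/sh
# update_reference_repos <cache-dir>  (hypothetical helper name)
# Runs `git fetch --all --prune` in every *.git directory under the cache directory.
update_reference_repos() {
    for repo in "$1"/*.git; do
        [ -d "$repo" ] || continue   # skip when the glob matches nothing
        echo "Updating $repo"
        # Keep going if one repository fails so the others are still refreshed.
        git -C "$repo" fetch -q --all --prune ||
            echo "WARNING: update of $repo failed" >&2
    done
}
```

For example, `update_reference_repos /var/lib/gitcache` would refresh the repository used in the Pipeline sample above. How often to run it (cron, a scheduled Jenkins job, or similar) depends on how quickly the remote repositories change.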

Resources