High Availability (HA) in CloudBees CI on modern cloud platforms requires a storage class that supports the ReadWriteMany (RWX) access mode.
The $JENKINS_HOME directory can contain a huge number of files (many of them small), and migrating it to a new storage class can take a long time. It is a very demanding I/O process that can degrade controller performance, and at some point you will need to stop and restart the managed controller with the new configuration.
To minimize the outage window during the migration, follow these steps:
- Create a volume using the new storage class.
- Import a snapshot of the existing volume into the new volume.
- Stop the controller.
- Sync the latest changes from the source volume to the new volume.
- Rename the existing volume claims. The binding between a controller and its volume claim is name based.
- Update the controller configuration to enable HA.
- Start the controller, backed by the new volume.
Required privileges
The following procedures require a number of privileges in your Kubernetes/OpenShift cluster:
- Job/{create,delete,get,list}
- PersistentVolumeClaim/{create,delete,get,list}
- PersistentVolume/{create,patch,delete,get,list}
Please ensure you have the corresponding privileges before proceeding to the next steps. Also ensure that the RWX storage class has been created and tested with a sample application.
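As a quick sanity check, you can verify the most important permissions with kubectl auth can-i, run in the namespace hosting the controller (adjust as needed); each command should answer yes:

kubectl auth can-i create jobs
kubectl auth can-i delete jobs
kubectl auth can-i create persistentvolumeclaims
kubectl auth can-i delete persistentvolumeclaims
kubectl auth can-i create persistentvolumes
kubectl auth can-i patch persistentvolumes
kubectl auth can-i delete persistentvolumes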
Cleaning up the source controller
Prior to starting the migration, consider the following steps to reduce the migration time.
- Discard fingerprints if you don't actively require them.
- Discard old builds or any build that is not required for migration (see the example after this list).
- …
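If you are unsure where the bulk of the data lives, one way to find the largest job directories is to run du inside the running controller pod. This is only a sketch: it assumes the controller pod is named <domain>-0, that JENKINS_HOME is set in the container environment, and that sort -h is available in the image; adjust to your setup.

# Show the 20 largest job directories in the controller's $JENKINS_HOME
kubectl exec -it pod/<domain>-0 -- sh -c 'du -sh "$JENKINS_HOME"/jobs/* | sort -h | tail -n 20'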
Take a volume snapshot of the source controller
Most cloud providers offer a way to take a snapshot of a live volume backed by block storage.
Refer to your cloud provider documentation for explicit details on how to do this.
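For example, on GCE (the provider used as a reference later in this document), a snapshot of the disk backing the source volume could be taken as follows; the disk name, snapshot name, and zone are placeholders to adapt to your environment:

# Snapshot the disk that backs the source persistent volume
gcloud compute disks snapshot <source-disk-name> \
  --snapshot-names=jenkins-home-snapshot \
  --zone=us-east1-b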
Create a volume from the latest snapshot
Using your cloud provider tools, you can create a new volume from an existing snapshot. It should be created in the same availability zone as the original volume to ensure it can be mounted from within the cluster. Note the volume ID, as you will need it in the following steps.
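Still using GCE as the reference, a new disk can be created from that snapshot in the same zone as the original disk. The disk name below matches the pdName used in the sample persistent volume later in this document; the snapshot name and zone are placeholders:

# Create a new disk from the snapshot, in the same zone as the original disk
gcloud compute disks create backup-volume \
  --source-snapshot=jenkins-home-snapshot \
  --zone=us-east1-b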
Write down the controller domain so that it can be used in the following scripts:

export DOMAIN=<domain>

Export the definition of the persistent volume currently bound to the controller's volume claim:

kubectl get pv $(kubectl get "pvc/jenkins-home-${DOMAIN}-0" -o go-template={{.spec.volumeName}}) -o yaml > pv-backup-jenkins-home-${DOMAIN}-0.yaml
Edit pv-backup-jenkins-home-${DOMAIN}-0.yaml as follows:
- Remove .metadata
- Add .metadata.name and set it, for example, to backup-jenkins-home-${DOMAIN}-0.
- Remove .spec.claimRef
- Remove .status
- Edit .spec to update the volume ID reference to the volume ID you noted earlier. The exact field varies depending on the cloud provider. For example, when using GCE persistent disks, it is .spec.gcePersistentDisk.pdName.
Here is a sample volume that applies to a GCE disk as reference.
apiVersion: v1
kind: PersistentVolume
metadata:
  labels:
    failure-domain.beta.kubernetes.io/region: us-east1
    failure-domain.beta.kubernetes.io/zone: us-east1-b
  name: backup-jenkins-home-$DOMAIN-0
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 100Gi
  gcePersistentDisk:
    fsType: ext4
    pdName: backup-volume
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-domain.beta.kubernetes.io/zone
          operator: In
          values:
          - us-east1-b
        - key: failure-domain.beta.kubernetes.io/region
          operator: In
          values:
          - us-east1
  persistentVolumeReclaimPolicy: Delete
  storageClassName: my-source-storage-class
  volumeMode: Filesystem
- Create the new persistent volume with kubectl create -f pv-backup-jenkins-home-${DOMAIN}-0.yaml
Create a new persistent volume claim referencing the new persistent volume
Start from the definition of the existing claim:

kubectl get "pvc/jenkins-home-${DOMAIN}-0" -o yaml > pvc-backup-jenkins-home-${DOMAIN}-0.yaml
Edit pvc-backup-jenkins-home-${DOMAIN}-0.yaml as follows:
- Remove .metadata
- Add .metadata.name and set it, for example, to backup-jenkins-home-${DOMAIN}-0.
- Edit .spec.volumeName to point to the persistent volume you created just above (backup-jenkins-home-${DOMAIN}-0 unless you changed it to something else).
- Remove .status
Here is a sample persistent volume claim applying to the previously referenced persistent volume.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: backup-jenkins-home-${DOMAIN}-0
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: <storage-class>
  volumeMode: Filesystem
  volumeName: backup-jenkins-home-${DOMAIN}-0
Replace <storage-class> with the name of the storage class; it must match the storageClassName of the persistent volume created above.
kubectl create -f pvc-backup-jenkins-home-${DOMAIN}-0.yaml
Most storage classes nowadays use volumeBindingMode: WaitForFirstConsumer, which means that a pod using the persistent volume claim must be created before the claim is bound.
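You can check the binding mode of a given storage class with the following command (replace <storage-class> with the storage class name):

kubectl get storageclass <storage-class> -o jsonpath='{.volumeBindingMode}'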
- Create a pod referencing the PVC you just created. Create allocate-backup-jenkins-home-${DOMAIN}-0.yaml as follows:
cat > allocate-backup-jenkins-home-${DOMAIN}-0.yaml <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: allocate-backup-jenkins-home-${DOMAIN}-0
spec:
  template:
    spec:
      volumes:
      - name: volume
        persistentVolumeClaim:
          claimName: backup-jenkins-home-${DOMAIN}-0
      containers:
      - name: busybox
        image: busybox
        command: ["true"]
        volumeMounts:
        - mountPath: /var/volume
          name: volume
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
          requests:
            cpu: 100m
            memory: 100Mi
      restartPolicy: Never
  backoffLimit: 4
EOF
Create the job using kubectl create -f allocate-backup-jenkins-home-${DOMAIN}-0.yaml
Then wait for the job to complete:

kubectl wait --for=condition=complete job/allocate-backup-jenkins-home-${DOMAIN}-0
You can then delete the job.
kubectl delete job/allocate-backup-jenkins-home-${DOMAIN}-0
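At this point the backup claim should be bound to the backup volume. As a quick check (the STATUS column should report Bound):

kubectl get pvc "backup-jenkins-home-${DOMAIN}-0"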
Create a volume using the new storage class
Create rwx-jenkins-home-${DOMAIN}-0.yaml as follows:
cat > rwx-jenkins-home-${DOMAIN}-0.yaml <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rwx-jenkins-home-${DOMAIN}-0
spec:
  storageClassName: <your-new-storage-class>
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 2560Gi # Change this to whatever your storage class requires, or to your needs
EOF

Then create the claim with kubectl create -f rwx-jenkins-home-${DOMAIN}-0.yaml.
We can tweak the previous job to allocate the volume.
Create allocate-rwx-jenkins-home-${DOMAIN}-0.yaml as follows:
cat > allocate-rwx-jenkins-home-${DOMAIN}-0.yaml <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: allocate-rwx-jenkins-home-${DOMAIN}-0
spec:
  template:
    spec:
      volumes:
      - name: volume
        persistentVolumeClaim:
          claimName: rwx-jenkins-home-${DOMAIN}-0
      containers:
      - name: busybox
        image: busybox
        command: ["true"]
        volumeMounts:
        - mountPath: /var/volume
          name: volume
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
          requests:
            cpu: 100m
            memory: 100Mi
      restartPolicy: Never
  backoffLimit: 4
EOF
Create the job using kubectl create -f allocate-rwx-jenkins-home-${DOMAIN}-0.yaml, then wait for it to complete:

kubectl wait --for=condition=complete job/allocate-rwx-jenkins-home-${DOMAIN}-0
Initial sync: synchronize snapshot to new volume
Use the migration script below to synchronize the backup volume and the new volume.
Create the script pvc-sync.sh as follows:
#!/bin/bash
set -euo pipefail

if [ $# -lt 2 ]; then
  echo "Usage: $0 <source_pvc> <dest_pvc>"
  echo "Example: $0 backup-source-pvc new-volume-rwx"
  exit 1
fi

source_pvc="${1:?}"
dest_pvc="${2:?}"

if ! kubectl get "pvc/$source_pvc" -o name > /dev/null 2>&1; then
  echo "PVC $source_pvc does not exist."
  exit 1
fi

if ! kubectl get "pvc/$dest_pvc" -o name > /dev/null 2>&1; then
  echo "PVC $dest_pvc does not exist."
  exit 1
fi

if [ "$source_pvc" == "$dest_pvc" ]; then
  echo "Source and destination PVC must be different."
  exit 1
fi

echo "1. Migration step"

kubectl apply -f - <<JOB
apiVersion: batch/v1
kind: Job
metadata:
  name: migration
spec:
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      volumes:
      - name: volume1
        persistentVolumeClaim:
          claimName: ${source_pvc}
      - name: volume2
        persistentVolumeClaim:
          claimName: ${dest_pvc}
      containers:
      - name: migration
        image: registry.access.redhat.com/ubi8/ubi
        command: [sh]
        args: [-c, "dnf install -y rsync; rsync -avvu --delete /var/volume1/ /var/volume2"]
        volumeMounts:
        - mountPath: /var/volume1
          name: volume1
        - mountPath: /var/volume2
          name: volume2
        resources:
          limits:
            cpu: "2"
            memory: 4G
          requests:
            cpu: "2"
            memory: 4G
      restartPolicy: Never
  backoffLimit: 4
JOB

echo "Waiting for migration to complete"
echo "You can inspect progress using kubectl logs -f job/migration"
kubectl wait --for=condition=complete --timeout=900m job/migration
echo "== Data from $source_pvc has been copied over to $dest_pvc"
kubectl delete job migration
Make it executable with chmod +x pvc-sync.sh.

Then run pvc-sync.sh backup-jenkins-home-${DOMAIN}-0 rwx-jenkins-home-${DOMAIN}-0 to start the initial synchronization.
Depending on the volume size, this can take a lot of time (multiple hours).
Once completed, you will need to ensure that the ownership of the root directory of the new volume matches the uid/gid your controller runs as (1000/1000 on Kubernetes; this differs on OpenShift, so check your actual setup before applying it).

After obtaining a shell on the new volume (see the example below), run:

chown 1000:1000 /var/volume
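One way to obtain such a shell is a short-lived pod mounting the new claim. This is a minimal sketch (the pod name and image are arbitrary); delete the pod once the ownership has been fixed:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: volume-shell
spec:
  volumes:
  - name: volume
    persistentVolumeClaim:
      claimName: rwx-jenkins-home-${DOMAIN}-0
  containers:
  - name: shell
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - mountPath: /var/volume
      name: volume
  restartPolicy: Never
EOF
# Wait for the pod to be ready, fix the ownership of the volume root, then clean up
kubectl wait --for=condition=Ready pod/volume-shell --timeout=5m
kubectl exec -it volume-shell -- chown 1000:1000 /var/volume
kubectl delete pod volume-shell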
Rename source volume
Before renaming the source volume, stop the managed controller from the operations center: the outage window starts here.

Create a script rename_pvc.sh with the following content:
#!/bin/bash
set -euo pipefail

if [ $# -ne 2 ]; then
  echo "Usage: $0 <source> <destination>"
  echo "Example: $0 jenkins-home-mc-0 old-jenkins-home-mc-0"
  exit 1
fi

source_pvc="$1"
dest_pvc="$2"

if ! kubectl get "pvc/$source_pvc" -o name > /dev/null 2>&1; then
  echo "PVC $source_pvc does not exist."
  exit 1
fi

if kubectl get "pvc/$dest_pvc" -o name > /dev/null 2>&1; then
  echo "PVC $dest_pvc already exists. It will be replaced by persistent volume of $source_pvc."
  read -p "Are you sure? " -n 1 -r
  # Delete PVC-1, keep PV as backup
  pv1_name=$(kubectl get "pvc/${dest_pvc}" -o go-template={{.spec.volumeName}})
  echo "$pv1_name" > old_pv
  echo "== ${dest_pvc} points to pv/${pv1_name}"
  kubectl patch pv ${pv1_name} -p '{"spec": {"persistentVolumeReclaimPolicy": "Retain"}}'
  echo "Deleting ${dest_pvc}, we keep PV ${pv1_name} around"
  kubectl delete pvc/${dest_pvc}
  #kubectl patch pv ${pv1_name} -p '{"spec":{"claimRef": null}}'
fi

mkdir generated

# Rename pvc-2 to pvc-1
# Change PV RetainPolicy to "Retain"
pv_name=$(kubectl get "pvc/${source_pvc}" -o go-template={{.spec.volumeName}})
echo "== ${source_pvc} points to pv/${pv_name}"
kubectl get "pvc/${source_pvc}" -o yaml > generated/source_pvc.yaml
kubectl patch pv ${pv_name} -p '{"spec": {"persistentVolumeReclaimPolicy": "Retain"}}'
kubectl delete "pvc/${source_pvc}"
kubectl patch pv ${pv_name} -p '{"spec":{"claimRef": null}}'

# We ideally want to retain any user annotation
cat <<EOF >generated/patch.yaml
- op: replace
  path: /metadata/name
  value: ${dest_pvc}
- op: replace
  path: /spec/volumeName
  value: ${pv_name}
- op: remove
  path: /metadata/finalizers
- op: remove
  path: /metadata/creationTimestamp
- op: remove
  path: /metadata/namespace
- op: remove
  path: /metadata/resourceVersion
- op: remove
  path: /metadata/uid
- op: remove
  path: /status
EOF

cat <<EOF >generated/kustomization.yaml
patches:
- target:
    version: v1
    kind: PersistentVolumeClaim
    name: ${source_pvc}
  path: patch.yaml
resources:
- source_pvc.yaml
EOF

trap "rm -rf generated" EXIT

kubectl apply -k generated/

pv_name=$(kubectl get pvc/${dest_pvc} -o go-template={{.spec.volumeName}})
echo "== ${dest_pvc} points to pv/${pv_name}"
echo "== Resetting ${pv_name} retain policy to Delete"
kubectl patch pv ${pv_name} -p '{"spec": {"persistentVolumeReclaimPolicy": "Delete"}}'
echo "== You can rename ${dest_pvc} to ${source_pvc} using $0 ${dest_pvc} ${source_pvc}"
Make it executable with chmod +x rename_pvc.sh.

Then run rename_pvc.sh jenkins-home-${DOMAIN}-0 old-jenkins-home-${DOMAIN}-0.
Delta sync: synchronize source volume to new volume
This is the same as the initial sync, except that the source PVC is now not the backup but the PVC with the live data, as it contains all the latest changes.
This delta sync will take a fraction of the time of the initial sync; however, it can still be lengthy depending on the number of files in your filesystem, as rsync will still scan them to determine whether they have changed.
Using the pvc-sync.sh script created previously, run pvc-sync.sh old-jenkins-home-${DOMAIN}-0 rwx-jenkins-home-${DOMAIN}-0.
Edit the controller configuration
This can be done while the delta sync migration is ongoing.
- Enable High Availability. Instead of a StatefulSet, a Deployment will now be used to manage the controller pods. If you have existing YAML customizations, you will need to adjust them to replace StatefulSet with Deployment.
- Set the number of replicas to 1 (it can be increased later).
- Set the Storage Class Name to the new storage class name.
- Edit or set the YAML field to:
apiVersion: "apps/v1" kind: Deployment spec: template: spec: securityContext: fsGroupChangePolicy: OnRootMismatch
Rename new volume to the previous name
Rename the new volume to the previous name, so that the controller will be able to mount it.
Run rename_pvc.sh rwx-jenkins-home-${DOMAIN}-0 jenkins-home-${DOMAIN}-0.
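Before starting the controller again, you may want to confirm that the renamed claim is bound to the new RWX persistent volume:

kubectl get pvc "jenkins-home-${DOMAIN}-0"

The STATUS column should report Bound, and the VOLUME column should show the persistent volume created by the new RWX storage class.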
OUTAGE ENDS HERE - Start the controller on the new volume
In the operations center, start the managed controller.
Remove the previous volume
Once you’ve confirmed that everything is running fine on the new volume, the previous volume as well as its backup volume can be removed.
kubectl delete pvc backup-jenkins-home-$DOMAIN-0 old-jenkins-home-$DOMAIN-0
In case the persistent volume's persistentVolumeReclaimPolicy is not set to Delete, you may also need to clean up the backup volume afterward:

kubectl delete pv backup-jenkins-home-${DOMAIN}-0