Maintain CloudBees Analytics server data on Kubernetes


CloudBees CD/RO supports automatically creating snapshot backups of CloudBees Analytics OpenSearch indices according to settings in your values file. Snapshots are configured via the analytics.backup fields.

Starting in CloudBees CD/RO v2024.06.0, CloudBees Analytics uses the analytics Kubernetes service with OpenSearch as its search engine. When upgrading from v2024.03.0 or earlier, ensure your values file is updated with the analytics chart. For more information, refer to CloudBees Analytics server values.

For previous CloudBees CD/RO versions using the dois service with Elasticsearch, refer to the documentation version corresponding to your CloudBees CD/RO release.

When backups are enabled, CloudBees CD/RO creates an extra PersistentVolumeClaim (PVC) for storing your snapshots and a cron job to generate them. Snapshots are created every 12 hours and retained for 15 days by default, but you can customize the schedule and retention policy with the following fields:

  • analytics.backup.schedule_cron

  • analytics.backup.retentionDays
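For example, suppose you want nightly snapshots at 02:00 retained for 30 days (values chosen purely for illustration). The sketch below prints the corresponding settings and checks that the cron expression has the five fields a Kubernetes CronJob schedule expects:

```shell
# Illustrative values, not defaults: nightly at 02:00, 30-day retention.
SCHEDULE="0 2 * * *"
RETENTION_DAYS=30

# Count the five cron fields (minute hour day-of-month month day-of-week).
set -f                # disable globbing so the * fields are not expanded
set -- $SCHEDULE
echo "cron fields: $#"
set +f

echo "analytics.backup.schedule_cron=\"$SCHEDULE\""
echo "analytics.backup.retentionDays=$RETENTION_DAYS"
```

These values go under analytics.backup in your values file, as shown in the example configuration later in this section.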

As described above, a PVC is automatically created to store your snapshots, but a remote repository can also be used to store them if your project uses AWS S3 or GCS buckets. For more information, refer to Create snapshot backups with an external repository.

You can restore snapshots using the instructions in Restoring a snapshot backup. If you need to disable snapshots, refer to the instructions in Disable snapshot restoring.

Limitations

When running more than one OpenSearch backup, all backups must have access to the same directory. Therefore, CloudBees recommends using a shared file system when running CloudBees Analytics in clustered mode.

Configure snapshot backups

To configure a snapshot backup, enable CloudBees Analytics backups in your values file, such as:

analytics:
  ## Enable or disable creating a backup of cbflow-analytics data.
  backup:
    ## NOTE: If you change `analytics.backup.enabled` for an existing installation,
    ## you must delete the statefulset for the installation prior to upgrading.
    ## To do so, use `kubectl delete statefulset flow-analytics`.
    enabled: true
    schedule_cron: "0 */12 * * *"
    retentionDays: 15
    location: "/os-backups"
    ## The `imageRepository` in the `images.registry` to pull the component image from.
    imageRepository: cbflow-tools

These updates result in the following behavior:

  • analytics.backup.enabled=true updates CloudBees Analytics statefulsets with backup configurations, including creating an extra PVC with a filesystem as the backend.

  • A Kubernetes CronJob is created. This CronJob:

    • Uses the cbflow-tools Docker image to execute backups.

    • Runs periodically, based on the frequency set in analytics.backup.schedule_cron.

    • Purges old snapshots according to the retentionDays parameter.
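The purge step is essentially date arithmetic over snapshot timestamps. The sketch below illustrates the idea with assumed snapshot names of the form snapshot-YYYYMMDD (the actual CronJob works against the OpenSearch snapshot API, not local file names): anything older than retentionDays is dropped.

```shell
# Hypothetical snapshot names; the real CronJob queries OpenSearch for its snapshots.
RETENTION_DAYS=15
CUTOFF=$(date -u -d "-${RETENTION_DAYS} days" +%Y%m%d)   # GNU date
TODAY=$(date -u +%Y%m%d)

for SNAP in snapshot-20200101 "snapshot-${TODAY}"; do
  STAMP=${SNAP##*-}                  # extract the YYYYMMDD suffix
  if [ "$STAMP" -lt "$CUTOFF" ]; then
    echo "purge $SNAP"               # older than the retention window
  else
    echo "keep $SNAP"
  fi
done
```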

Create snapshot backups with an external repository

CloudBees Analytics can also store snapshots in an off-cluster storage location within a snapshot repository. To use off-cluster storage, you must first register a snapshot repository for the cluster. CloudBees CD/RO supports AWS S3 and Google Cloud Storage (GCS) external repositories for snapshot backups.

To back up snapshots using an external repository, in your values file:

  1. Set the analytics.backup.externalRepo.enabled parameter to true.

  2. Set the analytics.backup.externalRepo.type parameter to gcs or s3.

  3. Set the analytics.backup.externalRepo.bucketName parameter.

  4. Create a secret key using the AWS access key, AWS secret key, or the GCS service account key:

    • For AWS: Create a secret for AWS S3 with read/write bucket policy permissions, using the keys AWS_ACCESS_KEY and AWS_SECRET_KEY. For example:

      kubectl create secret generic s3awssecret --from-literal=AWS_ACCESS_KEY="XXXXX" --from-literal=AWS_SECRET_KEY="XXXXX"
    • For GCS: Create a secret for GCS with read/write bucket policy permissions and a service account key file under the key GCS_SA_KEY. For example:

      kubectl create secret generic gcssasecret --from-file=GCS_SA_KEY=/tmp/gke-credentials.json
  5. Specify the created secret in one of the following ways:

    1. Set analytics.backup.externalRepo.existingSecret to the name of the secret you created, or:

    2. Specify the key values directly in:

      • For AWS S3: analytics.backup.externalRepo.secret.awsAccessKey and analytics.backup.externalRepo.secret.awsSecretKey

      • For GCS: analytics.backup.externalRepo.secret.gcsSaKey

  6. Specify the region where the bucket should be created in analytics.backup.externalRepo.region.
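The same settings can also be supplied as --set flags on an upgrade rather than editing the values file. This sketch only prints an example helm upgrade invocation; the release name cd-ro, the bucket name, and the region are invented placeholders, and s3awssecret is the example secret name from the step above:

```shell
# Placeholders for illustration; substitute your release, bucket, secret, and region.
BUCKET="my-analytics-backups"
REGION="us-east-1"

echo helm upgrade cd-ro cloudbees/cloudbees-flow \
  --reuse-values \
  --set analytics.backup.externalRepo.enabled=true \
  --set analytics.backup.externalRepo.type=s3 \
  --set analytics.backup.externalRepo.bucketName="$BUCKET" \
  --set analytics.backup.externalRepo.existingSecret=s3awssecret \
  --set analytics.backup.externalRepo.region="$REGION"
```

Remove the leading echo to actually run the command against your installation.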

The following is an example backup configuration:

analytics:
  # Enable or disable creating backup volumes for analytics
  backup:
    # Enables creating the backup volume and deploying the backup cron job
    enabled: true
    # Cron schedule for creating snapshots of OpenSearch indices; defaults to every 12 hours
    schedule_cron: "0 */12 * * *"
    # Number of days to retain snapshots
    retentionDays: 15
    # Location where the OpenSearch backup volume mounts. CloudBees recommends keeping the default.
    location: "/os-backups"
    # Image repository for backup/restore cron jobs
    imageRepository: cbflow-tools
    # Enable this option to restore the latest snapshot from the snapshot repository
    restoreSnapshot: false
    # Name of the snapshot to restore, if restoring an older snapshot
    restoreSnapshotName:
    # Enable external repositories, such as AWS S3 or GCS
    externalRepo:
      enabled: true
      # Type can be s3 or gcs
      type: s3
      bucketName: <Name of AWS S3 Bucket>
      # Base path folder for backups in the bucket
      basePath: "os-backups"
      # Either reference a secret where the AWS or GCS credentials are stored under
      # the keys below, or provide the values directly under `secret`.
      # Create a secret for AWS S3 with read/write bucket policy permissions using
      # the keys AWS_ACCESS_KEY and AWS_SECRET_KEY, for example:
      #   kubectl create secret generic s3awssecret --from-literal=AWS_ACCESS_KEY="XXXXX" --from-literal=AWS_SECRET_KEY="XXXXX"
      # Create a secret for GCS with read/write bucket policy permissions and a
      # service account key file under the key GCS_SA_KEY, for example:
      #   kubectl create secret generic gcssasecret --from-file=GCS_SA_KEY=/tmp/gke-credentials.json
      existingSecret: <Existing-Secret-Name as per above description>
      secret:
        ## Provide *only* if type is s3.
        awsAccessKey:
        awsSecretKey:
        ## Provide *only* if type is gcs.
        gcsSaKey:
      ## Region of the AWS S3 or GCS bucket. Example: us-east-1
      region:

Configure locally hosted PyPI repositories

Starting in CloudBees CD/RO v2024.06.0, you can configure a locally hosted PyPI (Python Package Index) repository in the CloudBees Analytics Helm chart. For systems not connected to the internet, this repository can be used to deploy Python packages for custom backup and monitoring tools.

To use this feature, you must be using CloudBees CD/RO v2024.06.0 or later.

To configure a PyPI repository in your CloudBees Analytics Helm chart:

  1. Open your v2024.06.0 or later values file, and navigate to the Analytics server configuration section.

  2. Navigate to analytics.backup.pipConfig.

  3. Uncomment the following lines, maintaining valid indentation:

    pipConfig: {}
    ## pip.conf: |
    ##   [global]
    ##   index-url = http://<private-pypi-repo-host-port>
    ##   trusted-host = <private-pypi-repo-host>
  4. Provide values for:

    • index-url: Specify the URL of your PyPI repository.

    • trusted-host: Specify the domain of the PyPI repository as a trusted host.

      Using the [global] header applies this configuration to all pip commands executed on the CloudBees Analytics server. Setting your PyPI repository as a trusted host prevents pip from issuing SSL warnings about insecure connections when installing packages, especially if you are using HTTP.
  5. Ensure your values file is valid YAML and save.

  6. Update your CloudBees CD/RO deployment to enable the changes.
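For reference, once uncommented and filled in, the configuration pip reads would look like the following; the host and port are invented placeholders, and the sketch simply prints such a pip.conf:

```shell
# Invented example host; replace with your private PyPI repository.
cat <<'EOF'
[global]
index-url = http://pypi.internal.example:8081/simple
trusted-host = pypi.internal.example
EOF
```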

Configure CloudBees Analytics backups using GKE Workload Identity

Using GKE Workload Identity allows your Kubernetes service account to impersonate your GCP service account. This enables you to pass snapshot backups of CloudBees Analytics to GCP buckets without including GCP service account credentials in CloudBees Analytics configuration files.

Prerequisites

The following prerequisites must be met to enable GKE Workload Identities:

  • You must be using CloudBees CD/RO v2023.08.0 or later, which includes Elasticsearch 7.17.10.

  • Your CloudBees CD/RO cluster must be running on GKE.

  • Several commands in the following steps use the gcloud CLI. If you do not have it installed, refer to Google’s Install the gcloud CLI documentation, or perform these steps using the Google Cloud Console.

  • Several commands in the following steps use the gsutil CLI. If you do not have it installed, refer to Google’s Install gsutil documentation, or perform these steps using the Google Cloud Console.

Configure CloudBees Analytics backups

To enable CloudBees Analytics to send snapshot backups to your GCP buckets:

  1. Update the cluster where you are running CloudBees Analytics on GKE to use Workload Identity, as described in the GKE Use Workload Identity documentation.

  2. In your existing CloudBees CD/RO namespace, create a Kubernetes service account:

    # Create a Kubernetes service account in the
    # K8s namespace running CloudBees Analytics.
    # Replace <K8s-service-account> with a K8s service account name.
    # Replace <cbflow-namespace> with the namespace where you have cbflow installed.
    kubectl create serviceaccount <K8s-service-account> -n <cbflow-namespace>
  3. Create a GCP bucket using either the Google Cloud Console or with the following gsutil command:

    # Create a GCP bucket.
    # Replace <CloudBees-Analytics-bucket-name> with
    # a name for the CloudBees Analytics bucket.
    gsutil mb gs://<CloudBees-Analytics-bucket-name>
  4. Create a Google Cloud service account using the Google Cloud Console or with the following gcloud command:

    # Grant the Kubernetes service account permission to
    # impersonate the GCP service account.
    # Replace <GCP_SERVICE_ACCOUNT> with your GCP service account name.
    # Replace <PROJECT_ID> with your GCP project ID.
    # Replace <K8s-service-account> with a K8s service account name.
    # Replace <cbflow-namespace> with the namespace where you have cbflow installed.
    gcloud iam service-accounts add-iam-policy-binding <GCP_SERVICE_ACCOUNT>@<PROJECT_ID>.iam.gserviceaccount.com \
      --role roles/iam.workloadIdentityUser \
      --member "serviceAccount:<PROJECT_ID>.svc.id.goog[<cbflow-namespace>/<K8s-service-account>]"
    The <GCP_SERVICE_ACCOUNT> and <PROJECT_ID> values must be the same as those used in Update existing cluster to use Workload Identity.
  5. Add the iam.gke.io/gcp-service-account annotation to your Kubernetes service account:

    # Replace <GCP_SERVICE_ACCOUNT> with your GCP account name.
    # Replace <PROJECT_ID> with your GCP project ID.
    # Replace <K8s-service-account> with a K8s service account name.
    # Replace <cbflow-namespace> with the namespace where you have cbflow installed.
    kubectl annotate serviceaccount <K8s-service-account> \
      --namespace <cbflow-namespace> \
      iam.gke.io/gcp-service-account=<GCP_SERVICE_ACCOUNT>@<PROJECT_ID>.iam.gserviceaccount.com
  6. In your cb-flow myvalues.yaml, add the following GCP information:

    analytics:
      nodeSelector:
        iam.gke.io/gke-metadata-server-enabled: "true"
      backup:
        enabled: true
        externalRepo:
          enabled: true
          serviceAccountsIdentity: true
          type: gcs
          # Replace <CloudBees-Analytics-bucket-name>
          # with the bucket name you created.
          bucketName: <CloudBees-Analytics-bucket-name>
          # Replace <your-GCP-region> with the GCP region of your cluster.
          region: <your-GCP-region>
  7. Update the CloudBees CD/RO installation to apply the changes from your myvalues.yaml. For example:

    # Replace <cbflow-release-name> with the
    # release name where you have cbflow installed.
    # Replace <cbflow-namespace> with the namespace
    # where you have cbflow installed.
    # Replace <myvalues.yaml> with the path to
    # the cb-flow values file where you added GCP information.
    helm upgrade <cbflow-release-name> cloudbees/cloudbees-flow \
      -n <cbflow-namespace> \
      -f <myvalues.yaml> \
      --timeout 10000s
    This is only an example helm upgrade command. Your installation may require additional options to install correctly.
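After the upgrade, you can check the wiring by confirming that the annotation on the Kubernetes service account resolves to your GCP service account. The sketch below only prints the verification command and the expected annotation value; all names are invented placeholders:

```shell
# Placeholders; substitute your own service account, namespace, and project names.
KSA="analytics-backup-ksa"
NS="cbflow"
GCP_SA="analytics-backup"
PROJECT_ID="my-project"

echo kubectl get serviceaccount "$KSA" -n "$NS" -o yaml
echo "expected annotation: iam.gke.io/gcp-service-account=${GCP_SA}@${PROJECT_ID}.iam.gserviceaccount.com"
```

Remove the leading echo on the kubectl line to run it against your cluster, and look for the annotation in the output.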

Restoring a snapshot backup

If your deployment has stored snapshots, you can update your CloudBees Analytics deployment using one. However, the following limitations apply:

  • Snapshot: OpenSearch snapshots cannot be restored into a cluster running an earlier OpenSearch version. For example, a snapshot taken on 2.12.0 cannot be used to restore a 2.11.0 cluster.

  • Indices: Indices cannot be restored into a cluster that is more than one major version newer than the OpenSearch version used to create the snapshot.

To restore a snapshot backup:

  1. Set the analytics.backup.restoreSnapshot parameter to true, and if applicable, provide a restoreSnapshotName:

    analytics:
      # Enable or disable creating backup volumes for analytics
      backup:
        ....
        # Enable this option to restore the latest snapshot from
        # the snapshot repository.
        restoreSnapshot: true
        # Name of the snapshot to restore, if restoring an older snapshot
        restoreSnapshotName:
        ....
  2. Run your helm install/upgrade command to update the deployment with the snapshot.
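As with the backup settings, the restore toggle can be supplied as --set flags on an upgrade instead of editing the values file. This sketch only prints such a command; the release name cd-ro and the snapshot name are invented placeholders:

```shell
# Invented placeholders; substitute your release name and snapshot name.
SNAPSHOT="snapshot-20240601"

echo helm upgrade cd-ro cloudbees/cloudbees-flow \
  --reuse-values \
  --set analytics.backup.restoreSnapshot=true \
  --set analytics.backup.restoreSnapshotName="$SNAPSHOT"
```

Remove the leading echo to actually trigger the restore on your installation.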

After you have restored your snapshot, CloudBees suggests setting analytics.backup.restoreSnapshot back to false to avoid overwriting any data on your next upgrade. For more information, refer to Disable snapshot restoring.

Disable snapshot restoring

To disable a snapshot restore:

  1. Set the analytics.backup.restoreSnapshot parameter to false:

    analytics:
      # Enable or disable creating backup volumes for analytics
      backup:
        ....
        # Enable this option to restore the latest snapshot from
        # the snapshot repository.
        restoreSnapshot: false
        # Name of the snapshot to restore, if restoring an older snapshot
        restoreSnapshotName:
        ....
  2. The next time you run your helm install/upgrade command, a restore will not be triggered.