Maintain CloudBees Analytics server data on Kubernetes


CloudBees CD/RO supports automatically creating snapshot backups of CloudBees Analytics OpenSearch indices according to settings in your values file. Snapshots are configured via the values in the analytics.backup fields.

Starting in CloudBees CD/RO v2024.06.0, CloudBees Analytics uses the analytics Kubernetes service with OpenSearch as its search engine. When upgrading from v2024.03.0 or earlier, ensure your values file is updated for the analytics chart. For more information, refer to CloudBees Analytics server values.

For previous CloudBees CD/RO versions using dois with Elasticsearch, refer to the corresponding documentation version for your CloudBees CD/RO.

When backups are enabled, CloudBees CD/RO creates an extra PersistentVolumeClaim (PVC) for storing your snapshots and a cron job to generate them. By default, snapshots are created every 12 hours and retained for 15 days, but the schedule and retention policy can be customized using the following fields:

  • analytics.backup.schedule_cron

  • analytics.backup.retentionDays

As described above, a PVC is automatically created to store your snapshots. However, if your project uses AWS S3 or GCS buckets, a remote repository can also be used to store them. For more information, refer to Create snapshot backups with an external repository.

You can restore snapshots using the instructions in Restoring a snapshot backup. If you need to disable snapshots, refer to the instructions in Disable snapshot restoring.

Limitations

When running more than one OpenSearch backup, all backups must have access to the same directory. Therefore, CloudBees recommends using a shared file system when running CloudBees Analytics in clustered mode.

Configure snapshot backups

To configure a snapshot backup, enable CloudBees Analytics backups in your values file, such as:

analytics:
  ## Enable or disable creating a backup of cbflow-analytics data.
  backup:
    ## NOTE: If you change `analytics.backup.enabled` for an existing installation,
    ## you must delete the statefulset for the installation prior to upgrading.
    ## To do so, use `kubectl delete statefulset flow-analytics`.
    enabled: true
    schedule_cron: "0 */12 * * *"
    retentionDays: 15
    location: "/os-backups"
    ## The `imageRepository` in the `images.registry` to pull the component image from.
    imageRepository: cbflow-tools

These updates result in the following behavior:

  • analytics.backup.enabled=true updates CloudBees Analytics statefulsets with backup configurations, including creating an extra PVC with a filesystem as the backend.

  • A Kubernetes CronJob is created. This CronJob:

    • Uses the cbflow-tools docker image to execute backups.

    • Runs periodically, based on the frequency set in analytics.backup.schedule_cron.

    • Purges old snapshots according to the retentionDays parameter.
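The purge behavior described above amounts to a simple retention check: snapshots older than retentionDays are deleted each time the CronJob runs. The following is a minimal sketch of that logic, not the actual cbflow-tools implementation; the snapshot names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def snapshots_to_purge(snapshots, retention_days=15, now=None):
    """Return snapshot names older than the retention window.

    `snapshots` maps snapshot name -> creation time (timezone-aware).
    This mirrors the purge the CronJob performs based on
    `analytics.backup.retentionDays`.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return sorted(name for name, created in snapshots.items() if created < cutoff)

# Hypothetical snapshot names and creation times:
now = datetime(2024, 7, 1, tzinfo=timezone.utc)
snapshots = {
    "snapshot-2024-06-10": datetime(2024, 6, 10, tzinfo=timezone.utc),
    "snapshot-2024-06-25": datetime(2024, 6, 25, tzinfo=timezone.utc),
}
print(snapshots_to_purge(snapshots, now=now))  # -> ['snapshot-2024-06-10']
```

With retentionDays set to 15 and a current date of 2024-07-01, only the snapshot from 2024-06-10 falls outside the retention window.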

Create snapshot backups with an external repository

CloudBees Analytics can also store snapshots in an off-cluster storage location within a snapshot repository. To use off-cluster storage, you must first register a snapshot repository for the cluster. CloudBees CD/RO supports AWS S3 and Google Cloud Storage (GCS) external repositories for snapshot backups.

To back up snapshots using an external repository, in your values file:

  1. Set the analytics.backup.externalRepo.enabled parameter to true.

  2. Set the analytics.backup.externalRepo.type parameter to gcs or s3.

  3. Set the analytics.backup.externalRepo.bucketName parameter.

  4. Create a secret using the AWS access key and secret key, or the GCS service account key:

    • For AWS: Create a secret for AWS S3, with read/write bucket policy permissions, using the keys AWS_ACCESS_KEY and AWS_SECRET_KEY. For example:

      kubectl create secret generic s3awssecret --from-literal=AWS_ACCESS_KEY="XXXXX" --from-literal=AWS_SECRET_KEY="XXXXX"
    • For GCS: Create a secret for GCS, with read/write bucket policy permissions, using a service account key file with the key GCS_SA_KEY. For example:

      kubectl create secret generic gcssasecret --from-file=GCS_SA_KEY=/tmp/gke-credentials.json
  5. Specify the created secret in one of the following ways:

    1. Set analytics.backup.externalRepo.existingSecret to the name of the secret you created, or:

    2. Specify the key values directly in:

      • For AWS S3: analytics.backup.externalRepo.secret.awsAccessKey and analytics.backup.externalRepo.secret.awsSecretKey

      • For GCS: analytics.backup.externalRepo.secret.gcsSaKey

  6. Specify the region where the bucket resides in analytics.backup.externalRepo.region.
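The constraints in the steps above can be summarized as a small validation check over the externalRepo values. This is an illustrative sketch; the validate_external_repo helper and its messages are hypothetical and not part of the chart:

```python
def validate_external_repo(cfg):
    """Check an analytics.backup.externalRepo dict against the rules above.

    Returns a list of problems; an empty list means the settings look valid.
    """
    problems = []
    if not cfg.get("enabled"):
        return problems  # nothing to check when the external repo is disabled
    if cfg.get("type") not in ("s3", "gcs"):
        problems.append("type must be 's3' or 'gcs'")
    if not cfg.get("bucketName"):
        problems.append("bucketName is required")
    secret = cfg.get("secret") or {}
    if not cfg.get("existingSecret"):
        # Without existingSecret, the credentials must be given inline.
        if cfg.get("type") == "s3" and not (secret.get("awsAccessKey") and secret.get("awsSecretKey")):
            problems.append("set existingSecret or secret.awsAccessKey/awsSecretKey")
        if cfg.get("type") == "gcs" and not secret.get("gcsSaKey"):
            problems.append("set existingSecret or secret.gcsSaKey")
    return problems

print(validate_external_repo({
    "enabled": True, "type": "s3",
    "bucketName": "my-bucket", "existingSecret": "s3awssecret",
}))  # -> []
```

Either existingSecret or the inline secret keys must be provided; the check above encodes that either/or rule for both repository types.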

The following is an example backup configuration:

analytics:
  # Enable or disable creating backup volumes for analytics.
  backup:
    # Enables creating the backup volume and deploying the backup cron job.
    enabled: true
    # Cron schedule for creating snapshots of OpenSearch indices; defaults to every 12 hours.
    schedule_cron: "0 */12 * * *"
    # Number of days to retain snapshots.
    retentionDays: 15
    # Location where the OpenSearch backup volume mounts on the / path. CloudBees recommends keeping the default.
    location: "/os-backups"
    # Image repository for backup/restore cron jobs.
    imageRepository: cbflow-tools
    # Enable this option to restore the latest snapshot from the snapshot repository.
    restoreSnapshot: false
    # Name of the snapshot to restore, if restoring an older snapshot.
    restoreSnapshotName:
    # Enable external repositories, such as AWS S3 or GCS.
    externalRepo:
      enabled: true
      # The type can be s3 or gcs.
      type: s3
      bucketName: <Name of AWS S3 Bucket>
      # Base path folder for backups in the bucket.
      basePath: "os-backups"
      # Either specify the secret where the AWS or GCS credentials are stored with the keys
      # below, or provide the values in this file under `secret`.
      # Create a secret for AWS S3, with read/write bucket policy permissions, using the keys
      # AWS_ACCESS_KEY and AWS_SECRET_KEY. For example:
      #   kubectl create secret generic s3awssecret --from-literal=AWS_ACCESS_KEY="XXXXX" --from-literal=AWS_SECRET_KEY="XXXXX"
      # Create a secret for GCS, with read/write bucket policy permissions, using a service
      # account key file with the key GCS_SA_KEY. For example:
      #   kubectl create secret generic gcssasecret --from-file=GCS_SA_KEY=/tmp/gke-credentials.json
      existingSecret: <Existing-Secret-Name>
      secret:
        # Provide only if type is s3 and existingSecret is not provided.
        awsAccessKey:
        # Provide only if type is s3 and existingSecret is not provided.
        awsSecretKey:
        # Provide only if type is gcs and existingSecret is not provided.
        gcsSaKey:
      region: us-east-1

Configure locally hosted PyPI repositories

Starting in CloudBees CD/RO v2024.06.0, you can configure a locally hosted PyPI (Python Package Index) repository in the CloudBees Analytics Helm chart. For systems not connected to the internet, this repository can be used to deploy Python packages for custom backup and monitoring tools.

To use this feature, you must be using CloudBees CD/RO v2024.06.0 or later.

To configure a PyPI repository in your CloudBees Analytics Helm chart:

  1. Open your v2024.06.0 or later values file, and navigate to the Analytics server configuration section.

  2. Navigate to analytics.backup.pipConfig.

  3. Uncomment the following lines, ensuring to maintain valid indentation:

    pipConfig: {}
    ## pip.conf: |
    ##   [global]
    ##   index-url = http://<private-pypi-repo-host-port>
    ##   trusted-host = <private-pypi-repo-host>
  4. Provide values for:

    • index-url: Specify the URL of your PyPI repository.

    • trusted-host: Specify the domain of the PyPI repository as a trusted host.

      Using the [global] header applies this configuration to all pip commands executed on the CloudBees Analytics server. Setting your PyPI repository as a trusted host prevents pip from issuing SSL warnings about insecure connections when installing packages, especially if you are using HTTP.
  5. Ensure your values file is valid YAML and save.

  6. Update your CloudBees CD/RO deployment to enable the changes.
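The pip.conf body described in the steps above can be generated or sanity-checked with a short script. A minimal sketch; the render_pip_conf helper and the mirror URL are hypothetical:

```python
import configparser
from io import StringIO

def render_pip_conf(index_url, trusted_host):
    """Build a pip.conf body like the one shown in the steps above.

    The [global] section applies these settings to every pip command
    run on the CloudBees Analytics server.
    """
    conf = configparser.ConfigParser()
    conf["global"] = {"index-url": index_url, "trusted-host": trusted_host}
    buf = StringIO()
    conf.write(buf)  # render as an INI-formatted string
    return buf.getvalue()

# Hypothetical internal PyPI mirror:
print(render_pip_conf("http://pypi.internal.example:8080/simple", "pypi.internal.example"))
```

The rendered text is what goes under the pip.conf key in analytics.backup.pipConfig, indented as valid YAML.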

Restoring a snapshot backup

If your deployment has stored snapshots, you can update your CloudBees Analytics deployment using one. However, the following limitations apply:

  • Snapshot: OpenSearch backups cannot be restored into a cluster running an earlier version. For example, a 2.12.0 snapshot cannot be used to restore a 2.11.0 cluster.

  • Indices: Indices cannot be restored into a cluster that is more than one major version newer than the OpenSearch version used to create the snapshot.
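The two limitations above can be expressed as a version-compatibility check. A minimal sketch; the can_restore helper is hypothetical and encodes only the rules stated here:

```python
def can_restore(snapshot_version, cluster_version):
    """Apply the restore limitations described above.

    Versions are (major, minor, patch) tuples. A snapshot cannot be
    restored into a cluster running an earlier OpenSearch version, and
    indices cannot be restored into a cluster more than one major
    version newer than the snapshot's version.
    """
    if cluster_version < snapshot_version:
        return False  # cluster is older than the snapshot
    if cluster_version[0] - snapshot_version[0] > 1:
        return False  # cluster is more than one major version newer
    return True

print(can_restore((2, 12, 0), (2, 11, 0)))  # -> False: 2.12.0 snapshot, 2.11.0 cluster
print(can_restore((2, 11, 0), (2, 12, 0)))  # -> True
```

The first call reproduces the example from the limitation above: a 2.12.0 snapshot cannot be used to restore a 2.11.0 cluster.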

To restore a snapshot backup:

  1. Set the analytics.backup.restoreSnapshot parameter to true, and if applicable, provide a restoreSnapshotName:

    analytics:
      # Enable or disable creating backup volumes for analytics.
      backup:
        ....
        # Enable this option to restore the latest snapshot from the snapshot repository.
        restoreSnapshot: true
        # Name of the snapshot to restore, if restoring an older snapshot.
        restoreSnapshotName:
        ....
  2. Run your helm install/upgrade command to update the deployment with the snapshot.

After you have restored your snapshot, CloudBees suggests setting analytics.backup.restoreSnapshot back to false to avoid overwriting any data on your next upgrade. For more information, refer to Disable snapshot restoring.

Disable snapshot restoring

To disable a snapshot restore:

  1. Set the analytics.backup.restoreSnapshot parameter to false:

    analytics:
      # Enable or disable creating backup volumes for analytics.
      backup:
        ....
        # Enable this option to restore the latest snapshot from the snapshot repository.
        restoreSnapshot: false
        # Name of the snapshot to restore, if restoring an older snapshot.
        restoreSnapshotName:
        ....
  2. The next time you run your helm install/upgrade command, a restore will not be triggered.