Migrate CloudBees Analytics data from Elasticsearch to OpenSearch


In CloudBees CD/RO v2024.06.0, CloudBees Analytics was upgraded from using Elasticsearch to OpenSearch. The data formats of these search engines are not fully compatible, and to preserve data from your legacy CloudBees Analytics servers, you must migrate it to an updated server. In the following content, these servers are referenced as:

  • Source server: A legacy CloudBees Analytics server using Elasticsearch.

  • Destination server: An updated CloudBees Analytics server using OpenSearch.

Before starting, review Kubernetes migrations overview.

Kubernetes migrations overview

The following information provides an overview for Kubernetes migrations:

  • Before migrating your data, back up the legacy CloudBees Analytics data. For more information on this process, refer to Maintain CloudBees Analytics server data on Kubernetes.

    Failing to back up the legacy CloudBees Analytics data could result in permanent data loss if issues arise during data migration.

  • Starting in v2024.06.0, separate CloudBees Analytics services are included: flow-devopsinsight, which uses Elasticsearch, and flow-analytics, which uses OpenSearch.

    Although both CloudBees Analytics services are included, only flow-analytics can communicate with other CloudBees CD/RO services.
  • Before performing the Migrate data with CloudBees CD/RO procedure, ensure you have copied any custom settings from the dois chart to the analytics chart and deployed them to your environment.

    Ensure an authentication method is configured for analytics. For more information, refer to Update CloudBees Analytics authentication methods.
  • If you have not already done so, in your updated values file, change analytics.autoRegister: true to analytics.autoRegister: false.

    Setting analytics.autoRegister: false in your values file prevents the CloudBees Analytics server configuration from being created on the CloudBees Software Delivery Automation server. This is critical to prevent unexpected issues while migrating CloudBees Analytics data from Elasticsearch to OpenSearch.

    After you have migrated your data from flow-devopsinsight to flow-analytics, and completed Disable legacy server after migration, follow the instructions in Enable the CloudBees Analytics configuration after migration to configure the CloudBees Analytics server for CloudBees Software Delivery Automation.

  • The following explains how to use the Migrate data with CloudBees CD/RO procedure for Kubernetes migrations:

    • This procedure transfers your data between the two services using their URL endpoints. By default, the URL endpoints are:

      • Source URL: https://flow-devopsinsight.<namespace>:9200

      • Destination URL: https://flow-analytics.<namespace>:9201

    • For both CloudBees Analytics instances, select Runtime credential.

      • For the Runtime credential, use reportuser as the username for both CloudBees Analytics instances.

      • To retrieve the reportuser password:

        To get the <secret-name> values used in the following commands, list the secrets in your namespace:

        kubectl get secrets --namespace <namespace>
        • For the source server (flow-devopsinsight) password, run:

          kubectl get secret --namespace <namespace> <dois-secret-name> -o jsonpath="{.data.CBF_DOIS_PASSWORD}" | base64 --decode; echo
        • For the destination server (flow-analytics) password, run:

          kubectl get secret --namespace <namespace> <analytics-secret-name> -o jsonpath="{.data.CBF_ANALYTICS_PASSWORD}" | base64 --decode; echo
  • (OPTIONAL) To avoid resource overhead, CloudBees recommends disabling the legacy dois service after you complete migration and confirm your updated CloudBees Analytics server is operating as expected. For more information, refer to Disable legacy server after migration.
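
As a minimal sketch of the endpoint and credential details above, the following shell helpers build the default source and destination URLs for a namespace and decode the reportuser password from a secret. The function names and the example namespace cd-ro are hypothetical and are not part of any CloudBees tooling. The secret names must still be taken from the kubectl get secrets output.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative only.

# Print the default source (Elasticsearch) URL for a namespace.
source_url() {
  printf 'https://flow-devopsinsight.%s:9200\n' "$1"
}

# Print the default destination (OpenSearch) URL for a namespace.
destination_url() {
  printf 'https://flow-analytics.%s:9201\n' "$1"
}

# Decode a password stored under a given key of a Kubernetes secret.
# Arguments: <namespace> <secret-name> <data-key>
secret_password() {
  kubectl get secret --namespace "$1" "$2" -o jsonpath="{.data.$3}" | base64 --decode
  echo
}
```

For example, source_url cd-ro prints https://flow-devopsinsight.cd-ro:9200, and secret_password cd-ro <dois-secret-name> CBF_DOIS_PASSWORD wraps the source server command shown above.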

Migrate data with CloudBees CD/RO procedure

The EC-Utilities project comes with a Reindex Analytics Data procedure that can be used to migrate your data from Elasticsearch to OpenSearch. This procedure copies data from the source server to the destination server.

Before running the Reindex Analytics Data procedure, ensure the CloudBees Analytics flow-devopsinsight and flow-analytics services are both running in your deployment.

To migrate your CloudBees Analytics data from Elasticsearch to OpenSearch using the Reindex Analytics Data procedure:

  1. In CloudBees CD/RO, navigate to DevOps essentials > Procedures.

  2. For the project, in the filtering options, change from All projects to EC-Utilities.

  3. Select the Run icon for the Reindex Analytics Data procedure, and then select New run.

  4. Provide the following data:

    The section Kubernetes migrations overview provides specific information on the values required for the Reindex Analytics Data procedure parameters.
    Table 1. Reindex Analytics Data procedure description
    Procedure Parameter Description

    Source URL

    Specify the URL for the data source in the format:

    <protocol>://<hostname>:<portnumber>

    Source Credential

    Specify the username and password on the source server for reindexing.

    Destination URL

    Specify the URL for the destination server in the format:

    <protocol>://<hostname>:<portnumber>

    Destination Credential

    Specify the username and password on the destination server for reindexing.

    Allow Mismatched Indices

    This setting controls the behavior if indices on the source and destination have the same name:

    • If selected, indices with the same name are automatically handled as described in Handling of indexes with the same name.

    • If unselected, and indices with the same name are encountered during migration, the procedure terminates with an error.

    Debug

    Specify the verbosity level of debug messages.

  5. Select OK to start the migration.

After you start the procedure, it checks the provided parameters and then begins copying the data. The job step log shows the progress and result of the copy. For more information, refer to Data migration log example.
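
Because both URL parameters must follow the <protocol>://<hostname>:<portnumber> format, it can help to check the values before starting a run. The following is a minimal sketch of such a check. The function name is hypothetical, and it assumes the protocol is http or https and that the hostname contains only letters, digits, dots, and hyphens.

```shell
#!/bin/sh
# Hypothetical pre-check; not part of the Reindex Analytics Data procedure.

# Succeed only if the argument looks like <protocol>://<hostname>:<portnumber>.
is_endpoint_url() {
  printf '%s\n' "$1" | grep -Eq '^https?://[A-Za-z0-9.-]+:[0-9]+$'
}

# Example: the second value is rejected because the port number is missing.
for url in "https://flow-devopsinsight.cd-ro:9200" "https://flow-analytics.cd-ro"; do
  if is_endpoint_url "$url"; then
    echo "ok: $url"
  else
    echo "invalid: $url"
  fi
done
```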

Data migration log example

When you run the CloudBees CD/RO procedure, the log is similar to:

CloudBees Analytics example data migration log
Checking available indices from the source server...
[ 1/ 22] Checking the index 'ef-build-2020' ...
[ 2/ 22] Checking the index 'ef-build-2021' ...
...
[ 21/ 22] Checking the index 'ef-pipelinerun-2023' ...
[ 22/ 22] Checking the index 'ef-release' ...
The source server contains 22 indices with 260,000 documents.
[ 1/ 22] Transferring the index 'ef-build-2020' with 2 documents...
Done 2 documents in 882 msecs.
Created: 2; Updated: 0; Deleted: 0; Batches: 1; Conflicts: 0; Noops: 0
Verifying the index 'ef-build-2020' in the destination server...
The resulting index 'ef-build-2020' on the destination server contains 2 documents.
[ 2/ 22] Transferring the index 'ef-build-2021' with 21 documents...
Done 21 documents in 549 msecs.
Created: 21; Updated: 0; Deleted: 0; Batches: 1; Conflicts: 0; Noops: 0
Verifying the index 'ef-build-2021' in the destination server...
The resulting index 'ef-build-2021' on the destination server contains 21 documents.
.....
[ 21/ 22] Transferring the index 'ef-pipelinerun-2023' with 50,061 documents...
Done 50,061 documents in 13 secs 609 msecs.
Created: 50,061; Updated: 0; Deleted: 0; Batches: 51; Conflicts: 0; Noops: 0
Verifying the index 'ef-pipelinerun-2023' in the destination server...
The resulting index 'ef-pipelinerun-2023' on the destination server contains 50,061 documents.
[ 22/ 22] Transferring the index 'ef-release' with 66 documents...
Done 66 documents in 340 msecs.
Created: 66; Updated: 0; Deleted: 0; Batches: 1; Conflicts: 0; Noops: 0
Verifying the index 'ef-release' in the destination server...
The resulting index 'ef-release' on the destination server contains 66 documents.
Reindexing has been successfully completed. Processed 22 indices and 260,000 documents in 1 min 33 secs.

The data migration log is organized in three stages.

In the first stage, the Reindex Wizard checks the available indices on the source server and displays basic statistics:

Checking available indices from the source server...
[ 1/ 22] Checking the index 'ef-build-2020' ...
[ 2/ 22] Checking the index 'ef-build-2021' ...
...
[ 21/ 22] Checking the index 'ef-pipelinerun-2023' ...
[ 22/ 22] Checking the index 'ef-release' ...

In the second stage, each index is copied individually from the source server to the destination server. The progress shown is similar to:

[ 1/ 22] Transferring the index 'ef-build-2020' with 2 documents...
Done 2 documents in 882 msecs.
Created: 2; Updated: 0; Deleted: 0; Batches: 1; Conflicts: 0; Noops: 0
Verifying the index 'ef-build-2020' in the destination server...
The resulting index 'ef-build-2020' on the destination server contains 2 documents.

The third stage outputs the result of the migration, which includes the total number of transferred indices and documents. This is similar to:

Reindexing has been successfully completed. Processed 22 indices and 260,000 documents in 1 min 33 secs.

In this example, 22 indices with 260,000 documents were detected on the source server and migrated to the destination server.
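
The transfer and verification messages can also be compared mechanically. The following is a minimal sketch that assumes the job step log was saved to a file with one log message per line, in the wording shown in the example above. The function name and the REVIEW output format are hypothetical. A difference in counts is not necessarily an error (for example, when documents are updated rather than created), so treat flagged indices as prompts for review.

```shell
#!/bin/sh
# Hypothetical helper; assumes one log message per line in the saved log file.

# Compare, per index, the document count announced at transfer time with the
# count reported by the verification message. Exits non-zero if any differ.
check_reindex_log() {
  awk -v q="'" '
    BEGIN { bad = 0 }
    /Transferring the index/ {
      # Splitting on the quote character: field 2 is the index name,
      # field 3 holds " with N documents..."
      split($0, p, q)
      n = p[3]; sub(/^ with /, "", n); sub(/ documents.*$/, "", n)
      expected[p[2]] = n
    }
    /The resulting index/ {
      # Field 3 holds " on the destination server contains N documents."
      split($0, p, q)
      n = p[3]; sub(/^.* contains /, "", n); sub(/ documents.*$/, "", n)
      if (expected[p[2]] != n) {
        printf "REVIEW %s: transferred %s, destination has %s\n", p[2], expected[p[2]], n
        bad = 1
      }
    }
    END { exit bad }
  ' "$1"
}
```

To use it, pass the path of the saved log, for example check_reindex_log reindex.log, where reindex.log is a placeholder file name.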

Disable legacy server after migration

Both the flow-devopsinsight and flow-analytics services must be running to complete the Migrate data with CloudBees CD/RO procedure. However, after you have completed the procedure and confirmed the updated flow-analytics service is operating as expected, flow-devopsinsight is no longer required.

If you have not yet created a backup of your dois data, CloudBees strongly recommends doing so before completing the following actions. Failing to do so could result in permanent data loss. For more information on creating a backup, refer to Maintain CloudBees Analytics server data on Kubernetes.

Although it is optional, to avoid resource overhead, CloudBees recommends disabling dois in your deployment to stop the legacy flow-devopsinsight service. To do so, either:

  • From the command line, rerun your helm upgrade command and include --set dois.enabled=false.

    This setting is overwritten on your next helm upgrade unless you also set dois.enabled: false in your values file.
  • In your values file, set dois.enabled: false, and rerun your helm upgrade command.

You have now disabled the legacy flow-devopsinsight service. Next, follow the instructions in Enable the CloudBees Analytics configuration after migration.

Enable the CloudBees Analytics configuration after migration

After migrating your data from flow-devopsinsight to flow-analytics, and completing Disable legacy server after migration, you must update your deployment with the CloudBees Analytics server configuration for the CloudBees Software Delivery Automation server. To do so, either:

  • From the command line, rerun your helm upgrade command and include --set analytics.autoRegister=true.

    This setting is overwritten on your next helm upgrade unless you also set analytics.autoRegister: true in your values file.
  • In your values file, set analytics.autoRegister: true, and rerun your helm upgrade command.

Your CloudBees Software Delivery Automation server is now configured with the CloudBees Analytics server configuration.
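
The command-line option for this step, and for the earlier step that disables the legacy service, adds a single --set override to your existing helm upgrade command. The following is a minimal sketch of a wrapper that prints the resulting command by default and only executes it when RUN=1 is set. The function name, release name, chart reference, namespace, and values file are all hypothetical placeholders that must match your own deployment.

```shell
#!/bin/sh
# Hypothetical wrapper; all argument values below are placeholders.

# Print (or, with RUN=1, execute) a helm upgrade with one extra --set override.
# Arguments: <release> <chart> <namespace> <values-file> <key=value>
helm_upgrade_with() {
  set -- helm upgrade "$1" "$2" --namespace "$3" --values "$4" --set "$5"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# After migration: first stop the legacy service, then register the new one.
helm_upgrade_with my-release my-chart cd-ro values.yaml dois.enabled=false
helm_upgrade_with my-release my-chart cd-ro values.yaml analytics.autoRegister=true
```

Printing the command first makes it easy to confirm the override before applying it. As noted above, an override passed with --set does not persist unless the same value is also in your values file.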

Known issues for data migration

This section provides information about known issues you may encounter while migrating data from Elasticsearch to OpenSearch.

Increased disk space requirements for data migration

During the migration from Elasticsearch to OpenSearch, you may need to increase the available disk space. This is because indexes for both the legacy and updated CloudBees Analytics instances exist at the same time.

This issue typically only applies to:

  • Traditional migrations where the migration occurs on the same machine.

  • Kubernetes migrations

CloudBees provides a utility to roughly calculate the space needed during migration. To use this utility:

  1. Navigate to the CloudBees examples repository.

  2. Download the reporting-data-reindex.pl utility.

  3. Follow the instructions in the README.md.

  4. Take the Indices Size value totaled for all nodes, and provision double that amount of disk space for the duration of the migration.

    • Example: If the total returned for all nodes is 20GB, an additional 20GB is required for the migration only.

      After the migration is completed, you can return the disk space to the desired level.
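
The doubling rule above can be expressed as a short calculation. This is a minimal sketch with a hypothetical function name. It takes the Indices Size total as a whole number of GB and does not query any server.

```shell
#!/bin/sh
# Hypothetical helper; input is the "Indices Size" total for all nodes, in whole GB.

migration_disk_gb() {
  indices_gb=$1
  additional_gb=$indices_gb        # extra space needed only during migration
  total_gb=$((indices_gb * 2))     # legacy and migrated indexes side by side
  echo "additional=${additional_gb}GB total=${total_gb}GB"
}

# The 20GB example from the text.
migration_disk_gb 20
```

For the 20GB example, this prints additional=20GB total=40GB.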

Timeouts reached when migrating large indexes

The migration options provided by CloudBees have a timeout of 180 minutes per index to avoid unexpected hangs. If an index contains a very large amount of data and its migration does not complete within the timeout, the migration process fails.

If this occurs, you may need to split such indexes into multiple smaller indexes. If you encounter multiple timeout issues, contact CloudBees support.

CloudBees provides a utility to calculate the size of indexes. To use this utility:

  1. Navigate to the CloudBees examples repository.

  2. Download the reporting-data-reindex.pl utility.

  3. Follow the instructions in the README.md.

Handling of indexes with the same name

During reindexing, there are several scenarios that can occur:

  1. An index is copied from the source server, and no index on the destination server has the same name. A new index is then created on the destination server with the same settings as the source server, and the data is copied to it.

  2. An index is copied from the source server, and an index on the destination server has the same name with the same settings. The index on the destination server is then updated to include any new data from the source server.

  3. An index is copied from the source server, and an index on the destination server has the same name, but with different settings. When this occurs:

    1. The existing index on the destination server is backed up as a new index with a new name using the scheme ef-reindex_backup-<timestamp>-<index name>.

    2. A new index is created with the name and settings from the source server.

    3. An entry for each such event appears in the job log, similar to:

      [ 6/ 22] Transferring the index 'ef-defect-2021' with 175 documents...
      Properties with mismatched types were found in the destination index. This index will be saved under a different name.
      Renaming the existing index 'ef-defect-2021' on the destination server to the new name 'ef-reindex_backup-20240510130805-defect-2021'...
      Done 175 documents in 186 msecs.
      Created: 138; Updated: 37; Deleted: 0; Batches: 1; Conflicts: 0; Noops: 0
      Verifying the index 'ef-defect-2021' in the destination server...
      The resulting index 'ef-defect-2021' on the destination server contains 138 documents.

In this case:

  1. Both the source and destination server have an index named ef-defect-2021.

  2. Different settings are detected for the index on each server.

  3. The existing ef-defect-2021 index on the destination server is backed up as:

    ef-reindex_backup-20240510130805-defect-2021

  4. A new ef-defect-2021 index is created on the destination server with the data and settings from the source server.
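
After a migration with Allow Mismatched Indices selected, it can be useful to see which backup indices were created. The following is a minimal sketch that filters a list of index names for the backup naming scheme. The function name is hypothetical, and it assumes the timestamp consists of digits only, as in the log example. How you obtain the list of index names from the destination server is left open.

```shell
#!/bin/sh
# Hypothetical helper; reads index names on standard input, one per line.

# Keep only names that follow the ef-reindex_backup-<timestamp>-<index name> scheme.
list_backup_indices() {
  grep -E '^ef-reindex_backup-[0-9]+-.+$'
}

# Example with the index names from the log above.
printf '%s\n' \
  "ef-defect-2021" \
  "ef-reindex_backup-20240510130805-defect-2021" \
  "ef-release" | list_backup_indices
```

In this example, only ef-reindex_backup-20240510130805-defect-2021 is printed.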

Handling of removed or deprecated Elasticsearch query syntax

With the upgrade from Elasticsearch to OpenSearch, query DSL changes for all default CloudBees reports that used Elasticsearch query DSL syntax are handled automatically. This includes the following changes:

  • Replacing the deprecated field [inline] with [source] in the [script] section.

  • Replacing the deprecated field [interval] with [calendar_interval] in the [date_histogram] section.

  • Replacing the deprecated order key [_term] with [_key] in the aggregation section.

  • Replacing ["field": "_type"] with ["script", "_doc"], because the "_type" field was removed.

The changes described above are also automatically handled within custom reports.

However, for custom reports that use other query constructs, this upgrade may introduce breaking changes caused by deprecated or removed Elasticsearch fields. As a result, such queries must be updated with DSL syntax that is compatible with OpenSearch. For more information, refer to the OpenSearch Query DSL documentation.

For any other breaking changes that may impact your custom reports, refer to the OpenSearch v2.14 breaking changes documentation.