Maintaining CloudBees Analytics server data

The CloudBees Analytics server uses the Elasticsearch search engine and the Logstash data-collection and log-parsing engine to gather data from the CloudBees CD/RO server for use in the Deployments, Releases, and Release Command Center dashboards.

Backing up CloudBees Analytics Server Elasticsearch data

You should back up your existing CloudBees Analytics server data frequently. We recommend regular (nightly) full backups, plus a backup before any upgrade. For further details on archiving and restoring Elasticsearch indices, see https://www.elastic.co.

You should consider the following points for the CloudBees Analytics server when you set up the Elasticsearch snapshot repository:

  • When you register the location of the shared file system repository in the path.repo setting in the elasticsearch.yml file, you must specify the setting in the Custom Settings section to ensure that it is preserved during upgrades.

    Following is an example for Linux platforms:

    path.repo: ["/home/ecloud/bb", "/mount/backups", "/mount/longterm_backups"]

    Following is an example for a remote shared folder location on Windows platforms using a Windows UNC path, where MY_SERVER is a placeholder for the name of the file server:

    path.repo: ["\\\\MY_SERVER\\Snapshots"]
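
  • Because the CloudBees Analytics server is configured with SSL authentication, the curl command format must be as follows, where METHOD is the HTTP request method, data_dir is the CloudBees Analytics server data directory path, and host, Elasticsearch_port, and path identify the Elasticsearch endpoint:

    curl -k -X METHOD -E data_dir/conf/reporting/elasticsearch/admin.crtfull.pem --key data_dir/conf/reporting/elasticsearch/admin.key.pem https://host:Elasticsearch_port/path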

    For example:

    curl -k -X POST -E /opt/ef/conf/reporting/elasticsearch/admin.crtfull.pem --key /opt/ef/conf/reporting/elasticsearch/admin.key.pem https://localhost:Elasticsearch_port/_snapshot/my_backup/snapshot_1/_restore

  • The Elasticsearch indices created by CloudBees CD/RO through the CloudBees Analytics server begin with ef-, so they can be selected using the ef-* index pattern.

  • Most Elasticsearch indices follow a time-based index naming scheme and use -yyyy as the suffix for the index name, where yyyy is the year associated with the document.

    For example, all deployments for the year 2018 will be stored in the index named ef-deployment-2018. This time-based naming scheme can be used in your archiving strategy for the CloudBees Analytics server.
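The points above can be combined into a single snapshot request that covers only the CloudBees Analytics indices. The following is a minimal sketch, not a definitive procedure. It assumes that a snapshot repository named my_backup (the name used in the restore example) is already registered, that the data directory is /opt/ef, and that Elasticsearch listens on port 9200. DATA_DIR, ES_PORT, snapshot_body, and create_snapshot are illustrative names, not part of the product.

```shell
#!/bin/sh
# Sketch: snapshot only the ef-* indices over SSL.
# Assumptions: DATA_DIR, ES_PORT, and the my_backup repository name.
DATA_DIR="${DATA_DIR:-/opt/ef}"
ES_PORT="${ES_PORT:-9200}"
CERT_DIR="$DATA_DIR/conf/reporting/elasticsearch"

# Print the JSON request body that limits the snapshot to ef-* indices
# and leaves the cluster-wide state out of it.
snapshot_body() {
  printf '{"indices":"%s","ignore_unavailable":true,"include_global_state":false}' "ef-*"
}

# Create a snapshot named $1 in the my_backup repository and wait for it.
create_snapshot() {
  curl -k -X PUT \
    -E "$CERT_DIR/admin.crtfull.pem" --key "$CERT_DIR/admin.key.pem" \
    -H 'Content-Type: application/json' \
    -d "$(snapshot_body)" \
    "https://localhost:$ES_PORT/_snapshot/my_backup/$1?wait_for_completion=true"
}

# Usage (not run here): create_snapshot "snapshot_$(date +%Y%m%d)"
```

Because the index names carry the year, the same body could name a narrower pattern, such as ef-*-2018, to archive a single year.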

Removing Old CloudBees Analytics Elasticsearch Data

CloudBees Analytics provides insight and visibility into not only your ongoing releases and deployments but also historical releases. For this reason, retain old data in the CloudBees Analytics server whenever possible.

You can provide sufficient disk space for the CloudBees Analytics server based on the usage requirements in Disk Usage. However, if you must remove very old data from the CloudBees Analytics server to reclaim disk space, follow the recommendations below.

Ensuring Sufficient Disk Space for Storing CloudBees Analytics Data

Make sure that enough disk space is provided for storing CloudBees Analytics data for the last n years based on your data retention requirements. For details about calculating disk usage requirements for the CloudBees Analytics server based on your data-generation patterns, see Disk Usage.
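To compare a retention plan with what is currently stored, the Elasticsearch _cat/indices API can list the document count and on-disk size of each ef-* index. The following is a minimal sketch under stated assumptions: the data directory is /opt/ef, Elasticsearch listens on port 9200, and the helper names cat_indices_path and list_index_sizes are hypothetical.

```shell
#!/bin/sh
# Sketch: list the size of each CloudBees Analytics index.
# Assumptions: DATA_DIR and ES_PORT defaults below.
DATA_DIR="${DATA_DIR:-/opt/ef}"
ES_PORT="${ES_PORT:-9200}"
CERT_DIR="$DATA_DIR/conf/reporting/elasticsearch"

# Build the _cat/indices request path for an index pattern (default ef-*):
# v adds a header row, h selects the columns, s sorts the rows by index name.
cat_indices_path() {
  printf '/_cat/indices/%s?v&h=index,docs.count,store.size&s=index' "${1:-ef-*}"
}

# Query Elasticsearch with the admin client certificate.
list_index_sizes() {
  curl -k -E "$CERT_DIR/admin.crtfull.pem" --key "$CERT_DIR/admin.key.pem" \
    "https://localhost:$ES_PORT$(cat_indices_path "$@")"
}

# Usage (not run here): list_index_sizes 'ef-deployment-*'
```

Since most indices hold one year of data each, the per-index sizes give a rough per-year figure for estimating the space needed for n years.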

Removing the Old Data

Elasticsearch is the underlying analytics store for the CloudBees Analytics server, and the server data is stored as Elasticsearch indices. If you must remove old data, use Elasticsearch Curator to delete old indices. For more information, see the Elasticsearch Curator documentation (https://www.elastic.co/guide/en/elasticsearch/client/curator/5.7/index.html).

  • Install Elasticsearch Curator on the system where the CloudBees Analytics server is installed.

    The curator CLIs curator_cli and curator use a configuration file that contains Elasticsearch connection settings.

    Following is a sample YAML configuration file that you can use for connecting to an Elasticsearch cluster or instance that is backing the CloudBees Analytics server:

    client:
      hosts:
        - 127.0.0.1
      port: Elasticsearch_port
      use_ssl: True
      certificate: data_dir/conf/reporting/elasticsearch/chain-ca.pem
      client_cert: data_dir/conf/reporting/elasticsearch/admin.crtfull.pem
      client_key: data_dir/conf/reporting/elasticsearch/admin.key.pem
      ssl_no_validate: False
      http_auth:
      timeout: 30
      master_only: False

    where Elasticsearch_port is the Elasticsearch port number and data_dir is the CloudBees Analytics server data directory path.

  • Run the following command to verify that you can connect to Elasticsearch using the configuration file:

    curator_cli --config curator-config.yml show_indices

    The Elasticsearch indices created by CloudBees CD/RO begin with ef-. Most of the CloudBees CD/RO indices follow a time-based index naming scheme and use -yyyy as the suffix for the index name, where yyyy is the year associated with the record. For example, all deployments for the year 2018 are stored in the index named ef-deployment-2018.

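    Following is a sample YAML action file to delete old CloudBees CD/RO indices. The period filter in this file selects indices whose year suffix falls seven to eight years before the current year, so running it periodically removes indices as they pass the seven-year mark. To retain data longer, or to match indices that are already older than eight years, adjust range_from and range_to based on your data retention policies.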

    actions:
      1:
        action: delete_indices
        description: >-
          Delete CloudBees Analytics indices older than 7 years
        options:
          ignore_empty_list: True
          timeout_override:
          continue_if_exception: False
          disable_action: False
        filters:
        - filtertype: pattern
          kind: prefix
          value: ef-
        - filtertype: period
          period_type: relative
          source: name
          range_from: -8
          range_to: -7
          timestring: '-%Y'
          unit: years

  • Run the following command to do a dry run using the configuration file and the action file:

    curator --config curator-config.yml --dry-run curator-action.yml

    This shows you the indices that will be deleted but will not actually delete them.

  • Verify the dry run output.

  • Schedule the following curator command to run periodically so that old indices are deleted according to your YAML action file:

    curator --config curator-config.yml curator-action.yml
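When checking the dry-run output, it can help to work out in advance which year suffixes the period filter should match. The sketch below assumes Curator's relative period semantics, in which an offset of 0 denotes the current year and negative offsets count whole years back. The period_years helper is hypothetical and is not part of Curator.

```shell
#!/bin/sh
# Print the years matched by a relative period filter with unit: years.
# Arguments: current year, range_from, range_to.
# The body runs in a subshell so that its variables stay local.
period_years() (
  offset=$2
  years=""
  while [ "$offset" -le "$3" ]; do
    years="$years${years:+ }$(( $1 + offset ))"
    offset=$(( offset + 1 ))
  done
  printf '%s\n' "$years"
)

# With the sample action file (range_from: -8, range_to: -7) run in 2026:
period_years 2026 -8 -7   # prints: 2018 2019
```

Under that assumption, a run in 2026 would be expected to list indices such as ef-deployment-2018 and ef-deployment-2019, while indices from 2017 or earlier would not appear.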

Removing Incorrect CloudBees Analytics Elasticsearch Data

If incorrect data is loaded into the CloudBees Analytics server (for example, while building or testing a script that sends reporting data to it), you can delete the data using these steps:

  • Identify the Elasticsearch index from which incorrect data needs to be deleted.

    CloudBees Analytics server indices are named using the pattern ef-report-object-name-yyyy. For example, if you used the sendReportingData API to send data for the year 2019 to the CloudBees Analytics server and the report object name was test, the corresponding index name is ef-test-2019.

  • Back up the index before deleting any data in case something goes wrong and you need to restore the data.

    • Log in to the system running the CloudBees Analytics server.

    • Open a terminal window and change directories to the CloudBees Analytics server conf/ directory.

      On Linux, the default path is:

      /opt/electriccloud/electriccommander/conf/reporting

    • Run the following commands:

      # Create backup index
      curl -vk -XPUT 'https://127.0.0.1:Elasticsearch_port/backup-test' -E elasticsearch/admin.crtfull.pem --key elasticsearch/admin.key.pem
      
      # Copy the data from the original index to the backup index
      curl -vk -XPOST 'https://127.0.0.1:Elasticsearch_port/_reindex?pretty' -E elasticsearch/admin.crtfull.pem --key elasticsearch/admin.key.pem -H 'Content-Type: application/json' -d'
      {
        "source": {
          "index": "ef-test-2019"
        },
        "dest": {
          "index": "backup-test"
        }
      }'

  • Use the Elasticsearch _delete_by_query API to delete the data from the original index based on criteria that uniquely identify the data to be deleted.

    For example, if the data with a field named projectName and a value of motorbike-backend needs to be deleted, the following command deletes the documents matching these criteria in the index ef-test-2019:

    curl -vk -XPOST "https://127.0.0.1:Elasticsearch_port/ef-test-2019/_delete_by_query?pretty" \
    -H 'Content-Type: application/json' -E elasticsearch/admin.crtfull.pem --key elasticsearch/admin.key.pem -d'
    {
      "query": {
        "term": {
          "projectName": "motorbike-backend"
        }
      }
    }
    '
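Before running the delete, it may be worth checking how many documents the criteria match by sending the same query to the Elasticsearch _count API. The following is a minimal sketch: it assumes it is run from the conf/reporting directory (so the relative certificate paths resolve) and that Elasticsearch listens on port 9200. The helper names term_query and count_matches are hypothetical.

```shell
#!/bin/sh
# Sketch: count the documents that a term query would match.
# Assumption: ES_PORT default; run from the conf/reporting directory.
ES_PORT="${ES_PORT:-9200}"

# Build a term query body for field $1 and value $2.
# Note: the values are inserted as-is, without JSON escaping.
term_query() {
  printf '{"query":{"term":{"%s":"%s"}}}' "$1" "$2"
}

# Arguments: index name, field name, field value.
count_matches() {
  curl -k -X POST \
    -E elasticsearch/admin.crtfull.pem --key elasticsearch/admin.key.pem \
    -H 'Content-Type: application/json' \
    -d "$(term_query "$2" "$3")" \
    "https://127.0.0.1:$ES_PORT/$1/_count?pretty"
}

# Usage (not run here): count_matches ef-test-2019 projectName motorbike-backend
```

If the reported count is larger than expected, the criteria probably do not uniquely identify the incorrect data and should be narrowed before deleting anything.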