Maintain CloudBees Analytics server data

The CloudBees Analytics server uses the OpenSearch search engine to gather and store data from the CloudBees CD/RO server for use in the Deployments, Releases, and Release Command Center dashboards.

Backing up CloudBees Analytics server OpenSearch data

IPv6 addresses are supported only on Kubernetes platforms. If you use an IPv6 address, enclose it in square brackets, for example, [<IPv6-ADDRESS>].

Back up your CloudBees Analytics server data frequently. CloudBees recommends regular full (nightly) backups, as well as a backup before every upgrade. For further details on archiving and restoring OpenSearch indices, refer to Availability and recovery.

You should consider the following points for the CloudBees Analytics server when you set up the OpenSearch snapshot repository:

When you register the location of the shared file system repository in the path.repo setting in the opensearch.yml file, you must specify the setting in the Custom Settings section to ensure that it is preserved during upgrades.

Following is an example for Linux platforms:
```
path.repo: ["/home/ecloud/bb", "/mount/backups", "/mount/longterm_backups"]
```
Following is an example for a remote shared folder location on Windows platforms using a Windows UNC path:
```
path.repo: ["\\\\<MY_SERVER>\\Snapshots"]
```

Because the CloudBees Analytics server is configured with SSL authentication, use the following curl command format:

```
curl -k -X <POST|PUT> \
  -E <data_dir>/conf/analytics/admin.crtfull.pem \
  --key <data_dir>/conf/analytics/admin.key.pem \
  https://<Analytics server-host-name>:<port number>/<request-URI>
```

The -E and --key options supply the admin client certificate and its private key; -k skips verification of the server certificate.

For example:

```
curl -k -X POST \
  -E /opt/ef/conf/analytics/admin.crtfull.pem \
  --key /opt/ef/conf/analytics/admin.key.pem \
  https://localhost:9201/_snapshot/my_backup/snapshot_1/_restore
```
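The example above restores a snapshot. For the backup side, the following is a minimal shell sketch that wraps the same curl format in a helper and uses the OpenSearch snapshot APIs to register a shared file system repository, verify it, and take a snapshot of the ef-* indices. The data directory (/opt/ef), host, port (9201), and repository name (my_backup) are taken from the example above; the repository location (/mount/backups/my_backup) and all function names are assumptions. The location must lie under one of the paths listed in path.repo.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for backing up CloudBees Analytics OpenSearch data.
# DATA_DIR, OS_HOST, OS_PORT, REPO, and REPO_LOCATION are assumptions; adjust them.
DATA_DIR="${DATA_DIR:-/opt/ef}"
OS_HOST="${OS_HOST:-localhost}"
OS_PORT="${OS_PORT:-9201}"
REPO="${REPO:-my_backup}"
REPO_LOCATION="${REPO_LOCATION:-/mount/backups/my_backup}"

# Wrap curl with the client certificate and key required by the Analytics server.
# Usage: os_curl METHOD REQUEST_URI [JSON_BODY]
os_curl() {
  local method="$1" uri="$2" body="${3:-}"
  local args=(-sk -X "$method"
    -E "$DATA_DIR/conf/analytics/admin.crtfull.pem"
    --key "$DATA_DIR/conf/analytics/admin.key.pem")
  if [ -n "$body" ]; then
    args+=(-H 'Content-Type: application/json' -d "$body")
  fi
  curl "${args[@]}" "https://$OS_HOST:$OS_PORT/$uri"
}

# Build the JSON body that registers a shared file system ("fs") repository.
repo_body() {
  printf '{"type":"fs","settings":{"location":"%s"}}' "$1"
}

# Build a date-stamped snapshot name such as nightly-2024.05.01.
snapshot_name() {
  printf 'nightly-%s' "$(date +%Y.%m.%d)"
}

# Register the repository (needed once), then check that the nodes can use it.
register_repo() {
  os_curl PUT "_snapshot/$REPO" "$(repo_body "$REPO_LOCATION")"
  os_curl POST "_snapshot/$REPO/_verify"
}

# Snapshot the ef-* indices and wait until the snapshot completes.
create_snapshot() {
  os_curl PUT "_snapshot/$REPO/$(snapshot_name)?wait_for_completion=true" \
    '{"indices":"ef-*"}'
}
```

For example, you could run register_repo once and then schedule create_snapshot nightly (and before each upgrade) to follow the backup recommendations above.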

The OpenSearch indices created by CloudBees CD/RO through the CloudBees Analytics server begin with ef-, so you can select them using the ef-* index pattern.
Most CloudBees Analytics indices follow a time-based index naming scheme and use -yyyy as the suffix for the index name, where yyyy is the year associated with the document.

For example, all deployments for the year 2018 are stored in the index named ef-deployment-2018. You can use this time-based naming scheme in your archiving strategy for the CloudBees Analytics server.
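The naming scheme can be captured in a few small shell helpers, for example to work out which index holds a given year's data or to list the ef-* indices before archiving them. This is a sketch: the function names are hypothetical, and list_ef_indices assumes the same certificate paths, host, and port as the restore example above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the ef-<object>-<yyyy> index naming scheme.

# Print the time-based index name for an object type and year.
# Example: index_name deployment 2018 prints ef-deployment-2018
index_name() {
  printf 'ef-%s-%s\n' "$1" "$2"
}

# Print the year suffix of a time-based index name (the part after the last "-").
index_year() {
  printf '%s\n' "${1##*-}"
}

# List all CloudBees Analytics indices, one name per line, using the _cat/indices API.
# The certificate paths, host, and port are assumptions based on the example above.
list_ef_indices() {
  curl -sk \
    -E /opt/ef/conf/analytics/admin.crtfull.pem \
    --key /opt/ef/conf/analytics/admin.key.pem \
    "https://localhost:9201/_cat/indices/ef-*?h=index"
}
```

For example, `list_ef_indices | grep -- '-2018$'` shows the indices holding 2018 data, which you could snapshot and archive as a group.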

Removing old CloudBees Analytics OpenSearch data

CloudBees Analytics provides insight into and visibility of not only your ongoing releases and deployments but also historical releases, so you must retain old data in the CloudBees Analytics server.

You can provide sufficient disk space for the CloudBees Analytics server based on the usage requirements described in Disk usage. However, if you must remove very old data from the CloudBees Analytics server to reclaim disk space, follow the recommendations in the following sections.

Ensuring sufficient disk space for storing CloudBees Analytics data

Make sure that enough disk space is provided for storing CloudBees Analytics data for the last n years based on your data retention requirements. For details about calculating disk usage requirements for the CloudBees Analytics server based on your data-generation patterns, refer to Disk usage.

Removing the old data

OpenSearch is the underlying analytics store for the CloudBees Analytics server, and the CloudBees Analytics server data is stored as indices in OpenSearch. If you must remove old data, use Curator for OpenSearch to delete old indices. For more information about Curator for OpenSearch, refer to the curator-opensearch project documentation.

Install Curator for OpenSearch on the system where the CloudBees Analytics server is installed.

The Curator command-line tools, curator_cli and curator, use a configuration file that contains the OpenSearch connection settings.

Following is a sample YAML configuration file that you can use for connecting to an OpenSearch cluster or instance that is backing the CloudBees Analytics server:
```
client:
  hosts:
    - 127.0.0.1
  port: OpenSearch_port
  use_ssl: True
  certificate: data_dir/conf/analytics/chain-ca.pem
  client_cert: data_dir/conf/analytics/admin.crtfull.pem
  client_key: data_dir/conf/analytics/admin.key.pem
  ssl_no_validate: False
  http_auth:
  timeout: 30
  master_only: False
```
where OpenSearch_port is the OpenSearch port number and data_dir is the CloudBees Analytics server data directory path.
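If you prefer not to replace the placeholders by hand, a small shell function can write the configuration file with the port and data directory filled in. This is a sketch: the function name write_curator_config and its argument order are hypothetical, and it writes the same settings as the sample configuration above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Curator configuration file for the Analytics server.
# Usage: write_curator_config OUTPUT_FILE OPENSEARCH_PORT DATA_DIR
write_curator_config() {
  local out="$1" port="$2" data_dir="$3"
  # Same keys and values as the sample file, with the placeholders substituted.
  cat > "$out" <<EOF
client:
  hosts:
    - 127.0.0.1
  port: ${port}
  use_ssl: True
  certificate: ${data_dir}/conf/analytics/chain-ca.pem
  client_cert: ${data_dir}/conf/analytics/admin.crtfull.pem
  client_key: ${data_dir}/conf/analytics/admin.key.pem
  ssl_no_validate: False
  http_auth:
  timeout: 30
  master_only: False
EOF
}
```

For example, `write_curator_config curator-config.yml 9201 /opt/ef` writes a file that you can pass to curator_cli and curator with --config.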
Run the following command to verify that you can connect to OpenSearch using the configuration file:
```
curator_cli --config curator-config.yml show-indices
```
The OpenSearch indices created by CloudBees CD/RO begin with ef-. Most of the CloudBees CD/RO indices follow a time-based index naming scheme and use -yyyy as the suffix for the index name, where yyyy is the year associated with the record. For example, all deployments for the year 2018 are stored in the index named ef-deployment-2018.

Following is a sample YAML action file to delete CloudBees CD/RO indices older than seven years. To retain old indices for longer, based on your data retention policies, adjust the range_from and range_to values of the period filter accordingly.
```
actions:
  1:
    action: delete_indices
    description: >-
      Delete CloudBees Analytics indices older than 7 years
    options:
      ignore_empty_list: True
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: ef-
    - filtertype: period
      period_type: relative
      source: name
      range_from: -8
      range_to: -7
      timestring: '-%Y'
      unit: years
```
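To sanity-check which indices the action file targets, the following shell sketch approximates its two filters. It assumes that a period filter with unit: years, range_from: -8, and range_to: -7 selects the two whole calendar years that fall eight and seven years before the current year, with the year read from the -yyyy suffix of the index name. Confirm the exact behavior against the Curator documentation and the dry-run output. The function name in_delete_window is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical approximation of the pattern and period filters in the action file.
# Usage: in_delete_window INDEX_NAME [CURRENT_YEAR]
# Returns 0 (true) if the index starts with "ef-", ends in a four-digit year,
# and that year is between CURRENT_YEAR-8 and CURRENT_YEAR-7 inclusive.
in_delete_window() {
  local index="$1" now="${2:-$(date +%Y)}"
  local year="${index##*-}"
  [[ "$index" == ef-* ]] || return 1        # filtertype: pattern, kind: prefix
  [[ "$year" =~ ^[0-9]{4}$ ]] || return 1   # timestring: '-%Y'
  (( year >= now - 8 && year <= now - 7 ))  # range_from: -8, range_to: -7
}
```

Assuming show-indices prints one index name per line, you could compare the sketch with Curator's own view: `curator_cli --config curator-config.yml show-indices | while read -r idx; do in_delete_window "$idx" && echo "$idx"; done`. Under the same assumption, indices older than this window (for example, ones that existed before you scheduled the job) are not matched.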
Run the following command to do a dry run using the configuration file and the action file:
```
curator --config curator-config.yml --dry-run curator-action.yml
```
The dry run lists the indices that would be deleted without actually deleting them.
Verify the dry run output.
Schedule the following curator command to run periodically so that it deletes old indices according to your YAML action file:
```
curator --config curator-config.yml curator-action.yml
```
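How you schedule the command depends on your platform. On Linux, one option is a cron entry; the following sketch prints a crontab line that runs the action file daily at 01:30 and appends the output to a log file. The schedule, directory layout, and log file name are assumptions, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a crontab entry that runs Curator daily at 01:30.
# Usage: curator_cron_entry DIR
# DIR is assumed to contain curator-config.yml and curator-action.yml;
# the log file curator.log is written to the same directory.
curator_cron_entry() {
  local dir="$1"
  printf '30 1 * * * curator --config %s/curator-config.yml %s/curator-action.yml >> %s/curator.log 2>&1\n' \
    "$dir" "$dir" "$dir"
}
```

You could add the printed line with crontab -e. Because cron runs jobs with a minimal PATH, you may need to replace curator with its full path.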

Removing incorrect CloudBees Analytics OpenSearch data

If incorrect data is loaded into the CloudBees Analytics server (for example, while you build or test a script that sends reporting data to the CloudBees Analytics server), you can delete this data using these steps:

Identify the OpenSearch index from which incorrect data needs to be deleted.

CloudBees Analytics server indices are named using the pattern ef-<report-object-name>-<yyyy>. For example, if you used the sendReportingData API to send the data to the CloudBees Analytics server and the report object name was test, the corresponding index for 2019 data is ef-test-2019.

Back up the index before deleting any data in case you need to restore the data.

Log in to the system running the CloudBees Analytics server.
Open a terminal window and change to the CloudBees Analytics server conf/analytics directory, which contains the admin certificate and key files.

On Linux, the default path is:
```
/opt/cloudbees/sda/conf/analytics
```

Run the following commands:

```
# Create the backup index
curl -vk -XPUT 'https://127.0.0.1:OpenSearch_port/backup-test' \
    -E admin.crtfull.pem \
    --key admin.key.pem

# Copy the data from the original index to the backup index
curl -k -XPOST 'https://127.0.0.1:OpenSearch_port/_reindex?pretty' \
    -E admin.crtfull.pem \
    --key admin.key.pem \
    -H 'Content-Type: application/json' -d '
{
  "source": {
    "index": "ef-test-2019"
  },
  "dest": {
    "index": "backup-test"
  }
}'
```
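Before deleting anything, it is worth confirming that the backup index holds the same number of documents as the original. The following shell sketch refreshes the backup index so that the copied documents are visible to searches, then compares the counts reported by the _cat/count API. It assumes you run it from the same conf/analytics directory with the same certificate files; the OS_PORT default of 9201 and the function names are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers to check a reindex backup before deleting data.
# Run from the conf/analytics directory; the OS_PORT default is an assumption.
OS_PORT="${OS_PORT:-9201}"

# Print the document count of an index (digits only) using the _cat/count API.
doc_count() {
  curl -sk -E admin.crtfull.pem --key admin.key.pem \
    "https://127.0.0.1:$OS_PORT/_cat/count/$1?h=count" | tr -d '[:space:]'
}

# Succeed only if both counts are non-empty and equal.
counts_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Refresh the backup index, then compare its count with the original index.
verify_backup() {
  local src="$1" dest="$2" a b
  curl -sk -E admin.crtfull.pem --key admin.key.pem \
    -XPOST "https://127.0.0.1:$OS_PORT/$dest/_refresh" > /dev/null
  a="$(doc_count "$src")"
  b="$(doc_count "$dest")"
  echo "$src: $a documents, $dest: $b documents"
  counts_match "$a" "$b"
}
```

For example, `verify_backup ef-test-2019 backup-test` returns a non-zero exit status if the counts differ or either count is missing.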

Use the OpenSearch _delete_by_query API to delete the data from the original index, based on criteria that uniquely identify the data to be deleted.

For example, if you need to delete the data whose projectName field has the value motorbike, the following command deletes the matching documents from the ef-test-2019 index:
```
curl -vk -XPOST \
  "https://127.0.0.1:OpenSearch_port/ef-test-2019/_delete_by_query?pretty" \
  -H 'Content-Type: application/json' \
  -E admin.crtfull.pem \
  --key admin.key.pem -d '
{
  "query": {
    "term": {
      "projectName": "motorbike"
    }
  }
}
'
```
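Because _delete_by_query cannot be undone except by restoring from the backup, you may want to check how many documents the query matches first. The OpenSearch _count API accepts the same query body, so the following sketch builds the query once and reuses it for both the count and the delete. It assumes the same conf/analytics working directory; the OS_PORT default and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that run _count and _delete_by_query with one shared query.
# Run from the conf/analytics directory; the OS_PORT default is an assumption.
OS_PORT="${OS_PORT:-9201}"

# Build a term query body for FIELD = VALUE (values are not JSON-escaped).
term_query() {
  printf '{"query":{"term":{"%s":"%s"}}}' "$1" "$2"
}

# POST a JSON body to INDEX/ENDPOINT, for example _count or _delete_by_query.
post_query() {
  local index="$1" endpoint="$2" body="$3"
  curl -sk -XPOST -E admin.crtfull.pem --key admin.key.pem \
    -H 'Content-Type: application/json' \
    "https://127.0.0.1:$OS_PORT/$index/$endpoint?pretty" -d "$body"
}
```

For example, set `QUERY="$(term_query projectName motorbike)"`, run `post_query ef-test-2019 _count "$QUERY"` and check the reported count, then run `post_query ef-test-2019 _delete_by_query "$QUERY"`. Since term_query does not escape quotes, use it only with simple field values.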