Data Retention

Enterprises in regulated industries, such as financial services and healthcare, often have strict auditing guidelines around data retention, for example, the need to maintain records for seven years for compliance purposes. As CloudBees Flow is used, runtime data accumulates in its databases, which can erode performance and create data storage challenges. CloudBees Flow data retention provides a way for the enterprise to manage its data archival and purge needs for runtime objects.

Data retention in CloudBees Flow includes support for these concepts:

  • Support for data archiving—The process of copying data to storage external to the CloudBees Flow server. Available via either UI or API.

  • Support for data purging—The process of deleting data from the CloudBees Flow server. Available via either UI or API.

  • Data retention policies—Data archive and purge criteria, based on object type. Configure via either UI or API.

  • Archive connectors—Specifications of the target archival systems. Configure via API only.

Key Benefits of Data Retention

CloudBees Flow data retention provides the following key benefits to your organization:

  • Performance and cost benefits—Archiving infrequently accessed data to a secondary storage optimizes application performance. Systematically removing or purging data that is no longer needed from the system helps in improving performance and saves disk space.

  • Regulatory compliance—Enterprises in regulated industries, such as financial services, are required to retain data for certain lengths of time for regulatory compliance.

  • Internal corporate policy compliance—Organizations may need to retain historical data for audit purposes or to comply with corporate data retention policies.

  • Business intelligence and analytics—Organizations may want to use archived information in new and unanticipated ways, for example, retrospective analytics and machine learning.

Planning Your Data Retention Strategy

  • Decide which objects to include in your retention strategy. Supported objects include:

    • Releases

    • Pipeline runs

    • Deployments

    • Jobs

  • Decide on data retention server settings. See Configuring Data Retention.

    Keeping in mind the amount and frequency of data you wish to process, configure CloudBees Flow server settings to handle the rate.

  • Decide the archive criteria for each object type. See Managing Data Retention Rules. Criteria include:

    • List of projects to which the object can belong.

    • List of completed statuses for the object based on the object type. Active objects cannot be archived.

    • Look-back time frame of completed status.

    • Action: archive only, purge only, or purge after archive.

  • Decide on the archive storage system. See Managing Archive Connectors.

    Often the choice of archive storage system is driven by the enterprise's data retention requirements. For example, regulatory compliance might require WORM-compliant storage that prevents altering data, while data analytics might require a different kind of storage that allows for easy and flexible data retrieval. Based on your archival requirements, you can create an archive connector into which the CloudBees Flow data archiving process feeds data.

    These are some of the types of archive storage systems:

    • Cloud-based data archiving solutions—AWS S3, AWS Glacier, Azure Archive Storage, and so on.

    • WORM-compliant archive storage—NetApp.

    • Analytics and reporting systems—Elasticsearch, Splunk, and so on.

    • Traditional disk-based storage.

    • RDBMS and NoSQL databases.

Setting Up Data Retention

To set up data retention for your CloudBees Flow server, configure the data retention server settings, define data retention rules, and configure an archive connector, as described in the following sections.

Configuring Data Retention

Following is the list of CloudBees Flow server settings related to data retention. To access server settings, select Server from Administration in the main menu—the Server page displays. From there, click Settings.

Related settings include:


Enable Data Retention Management

When enabled, the data retention management service runs periodically to archive or purge data based on the defined data retention policies.

true—data retention management enabled. (default)

false—data retention management disabled.

Property name: enableDataRetentionManagement

Type: Boolean

Data Retention Management service frequency in minutes

Controls how often, in minutes, the data retention management service is scheduled to run.

Default: 1440 (1 day)

Property name: dataRetentionManagementFrequencyInMinutes

Type: Number of minutes

Data Retention Management batch size

Number of objects to process as a batch for a given data retention policy in one iteration.

Default: 100

Property name: dataRetentionManagementBatchSize

Type: Number

Maximum iterations in a Data Retention Management cycle

Maximum number of iterations in a scheduled data archive and purge cycle.

Default: 10

Property name: maxDataRetentionManagementIterations

Type: Number

Number of minutes after which archived data may be purged

Number of minutes after which archived data may be purged, if the data retention rule is set up to purge after archiving.

Default: 10080 (7 days)

Property name: purgeArchivedDataAfterMinutes

Type: Number of minutes
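With the defaults above, each scheduled cycle processes at most dataRetentionManagementBatchSize × maxDataRetentionManagementIterations = 100 × 10 = 1,000 objects per retention rule, once every 1440 minutes; increase the batch size, iteration count, or frequency if your rules must keep pace with a higher rate of completed objects. If you prefer to script these changes rather than use the Settings page, the following is a minimal sketch that assumes the settings are exposed as server properties under the /server/settings/ property path (verify the exact path and permitted values for your CloudBees Flow version):

ectool setProperty "/server/settings/dataRetentionManagementFrequencyInMinutes" "720"
ectool setProperty "/server/settings/dataRetentionManagementBatchSize" "200"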

Managing Data Retention Rules

Via UI

Click the Main Menu button and select Data Management under the Administration column. The Data Retention Rules list displays.

  • For new rules: Click the New + button in the upper right corner; the New Data Retention Rule dialog displays.

  • For existing rules: Click the Actions menu for the desired rule and select Details; the Edit Data Retention Rule dialog displays.

Visit tabs in this dialog to specify details of the data retention rule, as follows. When finished with the rule definition, click OK to save the rule.

Details tab

Enter or modify the rule name and optional description, as appropriate.

Rule Definition tab

Define or modify the rule, as appropriate.

  • From the Object column, select the object type. Some object types allow you to choose statuses to include in the archived data set.

  • From the Data column,

    • Configure the age of data to include in a retention operation. For example, consider the rule Older than 45 days: previously unarchived or unpurged data older than 45 days is included in the current operation.

    • Choose Purge or Archive; for Archive, select Purge data after archiving as desired.

    • If the object type is Completed Releases, check whether to Include subreleases. This is checked by default.

  • (Optional) From the Filters column, select project names and tags to further filter the archived data set.

More Criteria tab

(Optional) From this tab, you can further refine the data set to include only specific data fields.

Preview tab

A partial list of objects to be archived is displayed.

Via API

You can create and manage data retention rules using API commands through the `ectool` command-line interface or through a DSL script. For complete details about the API, see "Data Retention" in the CloudBees Flow API Guide.
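In either case, a DSL file containing your rule definitions can be applied with the same `ectool evalDsl` invocation shown later for archive connectors; `retentionRules.dsl` below is a hypothetical file name, and the rule syntax itself is described in the API Guide:

ectool evalDsl --dslFile retentionRules.dsl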

Managing Archive Connectors

To support different kinds of archival systems, the data retention framework provides an extension mechanism for registering archive connectors configured for a particular storage system.

Out of the box, CloudBees Flow comes with two sample archive connectors to use as a starting point for your own custom connector:

  • File Archive Connector—configures a directory to use as the archive target.

  • DevOps Insight Server Connector—configures archiving to a report object.

Only one archive connector can be active at a time.

Via UI

Not supported.

Via API

You can create and manage archive connectors using API commands through the `ectool` command-line interface or through a DSL script.

Out of the box, CloudBees Flow provides DSL for two archive connectors. Use these as starting points to customize based on your own requirements. When ready to implement your connector, save the DSL script to a file (`MyArchiveConnector.dsl` is used below) and run the following on the command line:

ectool evalDsl --dslFile MyArchiveConnector.dsl

File Archive Connector

This connector writes data to an absolute archive directory in your file system. Use it as is, or customize it with your own logic, for example, to store data in subdirectories by month or year.

If you customize the logic, update the example DSL and apply it to the CloudBees Flow server by using the following command, where fileConnector.dsl is the name of your customized DSL script:

ectool evalDsl --dslFile fileConnector.dsl

Whether you use the sample as is or customize it, enable the connector with the following command:

ectool modifyArchiveConnector "File Archive Connector" --actualParameter archiveDirectory="C:/archive" --enabled true

The sample DSL for the File Archive Connector follows:
archiveConnector 'File Archive Connector', {
    enabled = true
    archiveDataFormat = 'JSON'

    // Arguments available to the archive script
    // 1. args.entityName: Entity being archived, e.g., release, job, flowRuntime
    // 2. args.archiveObjectType: Object type defined in the data retention policy,
    //    e.g., release, job, deployment, pipelineRun
    // 3. args.entityUUID: Entity UUID of the entity being archived
    // 4. args.serializedEntity: Serialized form of the entity data to be archived based on
    // the configured archiveDataFormat.
    // 5. args.archiveDataFormat: Data format for the serialized data to be archived
    //
    // The archive script must return a boolean value.
    // true - if the data was archived
    // false - if the data was not archived

    archiveScript = '''
            def archiveDirectory = 'SET_ABSOLUTE_PATH_TO_ARCHIVE_DIRECTORY_LOCATION_HERE'
            def dir = new File(archiveDirectory, args.entityName)
            dir.mkdirs()
            File file = new File(dir, "${args.entityName}-${args.entityUUID}.json")
            // Connectors can choose to handle duplicates if they need to.
            // This connector implementation will not process a record if the
            // corresponding file already exists.
            if (file.exists()) {
                return false
            } else {
                file << args.serializedEntity
                return true
            }'''
}
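As an example of the month/year customization mentioned above, the archiveScript below is a minimal sketch that groups archived files into year/month subdirectories. It uses only the args fields documented in the sample (args.entityName, args.entityUUID, args.serializedEntity); the directory layout is an illustrative choice, not a product default.

archiveScript = '''
        // Group archive files by the year and month of the archiving run,
        // e.g., <archiveDirectory>/2021/07/release/release-<uuid>.json
        def archiveDirectory = 'SET_ABSOLUTE_PATH_TO_ARCHIVE_DIRECTORY_LOCATION_HERE'
        def yearMonth = new Date().format('yyyy/MM')
        def dir = new File("${archiveDirectory}/${yearMonth}", args.entityName)
        dir.mkdirs()
        File file = new File(dir, "${args.entityName}-${args.entityUUID}.json")
        // Skip records already written to this year/month bucket.
        if (file.exists()) {
            return false
        }
        file << args.serializedEntity
        return true'''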

DevOps Insight Server Connector

This connector configures archiving to the DevOps Insight server.

If you customize the logic, update the example DSL and apply it to the CloudBees Flow server by using the following command, where fileConnector.dsl is the name of your customized DSL script:

ectool evalDsl --dslFile fileConnector.dsl

Enable it with the following command:

ectool modifyArchiveConnector "DevOps Insight Server Connector" --enabled true

Apply the DSL script below to create a report object type for each object that can be archived.

// Create the report objects for the archived data before creating the
// archive connector for DevOps Insight server connector.

reportObjectType 'archived-release', displayName: 'Archived Release'
reportObjectType 'archived-job', displayName: 'Archived Job'
reportObjectType 'archived-deployment', displayName: 'Archived Deployment'
reportObjectType 'archived-pipelinerun', displayName: 'Archived Pipeline Run'

This DSL script creates the following report object types:

  • archived-release

  • archived-job

  • archived-deployment

  • archived-pipelinerun

archiveConnector 'DevOps Insight Server Connector', {

    // the archive connector is disabled out-of-the-box
    enabled = true

    archiveDataFormat = 'JSON'

    // Arguments available to the archive script
    // 1. args.entityName: Entity being archived, e.g., release, job, flowRuntime
    // 2. args.archiveObjectType: Object type defined in the data retention policy,
    //    e.g., release, job, deployment, pipelineRun
    // 3. args.entityUUID: Entity UUID of the entity being archived
    // 4. args.serializedEntity: Serialized form of the entity data to be archived based on
    // the configured archiveDataFormat.
    // 5. args.archiveDataFormat: Data format for the serialized data to be archived
    //
    // The archive script must return a boolean value.
    // true - if the data was archived
    // false - if the data was not archived

    archiveScript = '''

            def reportObjectName = "archived-${args.archiveObjectType.toLowerCase()}"

            def payload = args.serializedEntity

            // If de-duplication should be done, then add documentId to the payload
            // args.entityUUID -> documentId. This connector implementation does not
            // do de-duplication. Documents in DOIS may be resolved upon retrieval
            // based on archival date or other custom logic.

            sendReportingData reportObjectTypeName: reportObjectName, payload: payload

            return true
            '''
}
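The comment in the sample notes that de-duplication can be implemented by adding a documentId to the payload (args.entityUUID -> documentId). The following is a minimal sketch of that variation of archiveScript; it assumes the payload is JSON (archiveDataFormat = 'JSON') and that your DevOps Insight version uses documentId to identify documents, which you should verify before relying on it.

archiveScript = '''
        import groovy.json.JsonOutput
        import groovy.json.JsonSlurper

        def reportObjectName = "archived-${args.archiveObjectType.toLowerCase()}"

        // Parse the serialized entity, add the entity UUID as documentId so the
        // reporting backend can de-duplicate documents, then re-serialize.
        def record = new JsonSlurper().parseText(args.serializedEntity)
        record.documentId = args.entityUUID
        def payload = JsonOutput.toJson(record)

        sendReportingData reportObjectTypeName: reportObjectName, payload: payload

        return true
        '''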

For complete details about the API, see "Data Retention" in the CloudBees Flow API Guide.