Estimating Database Growth

4 minute read

Here is one way to estimate the rate at which the database grows for Change Tracking:

Most of the information needed to estimate the database growth rate can also be parsed from an exported XML file of the project (such as a full import), except for the version count.

The likely rate of database growth is based on these factors:

  • How many objects of different types will be tracked

  • How big they are

  • How often their state changes

The rate of database consumption is then the sum over all different types of tracked objects of the product of these three factors.

Common Usage Patterns

For most customer usage patterns, the most common objects (the ones that dominate on first factor) are ACLs and their ACEs, properties, or all of these objects. Typically:

  • ACLs and ACEs: ACLs are consistently small, and usually have only a few ACEs.

  • Properties: While most of them are small, some properties can be very large. The average property size is large.

For most users, a good quick estimate can be based on properties, especially properties that are large or are modified often.

Estimating Process

If you expect to enable Change Tracking for all projects, determine the values for these factors and calculate the database growth rate:

Database growth rate = Factor 1 x Factor 2 x Factor 3

  • Factor 1: How many objects of different types will be tracked

    The following procedure is only for properties:

    1. Join the ec_property table in the database joined to the ec_property_sheet table by ec_property.parent_sheet_id = ec_property_sheet.id.

    2. Search for properties where the ec_property_sheet.entity_type value is one of the entity types corresponding to a database table that has a matching ec_*_aud table.

    3. Determine the number of tracked objects per property type.

      For most usage patterns, the great majority of properties will belong to entity types like job and jobStep that are not change tracked and so have no corresponding ec_*_aud table.

      If you are already using CloudBees Flow 5.3 or later, these properties and properties belonging to projects for which Change Tracking is disabled are also recognizable by having ec_property_tracked set to false.

  • Factor 2: How big they are (size)

    The following procedure is only for properties.

    For small properties:

    1. Search for objects that are less than 450 bytes in the ec_property.string column.

    2. Add an overhead value of half a kilobyte per property for the other columns.

    3. Determine the average size of the tracked objects.

      For larger properties:

    4. Search for the length of the value in the ec_clob table linked to by the ec_property.clob column.

    5. Add an overhead value of one kilobyte per property for the other columns.

    6. Determine the average size of the tracked objects.

  • Factor 3: How often new copies of the objects are generated

    The following procedure is only for properties:

    1. Use one of these methods:

      • Determine ratio of how long ago the ec_property.created and ec_property.modified datetimes are , which gives an estimate concentrating on only the most recent changes.

        and/or

      • Determine the value of the ec_property.version counter divided how long ago the ec_property.created datetime is, which gives a long-term average rate of changes to the property since it was created.

    2. Determine the average update frequency.

For properties that are frequently updated always to be a numerical value, it is possible to suppress Change Tracking of numerical-value updates by using one of these commands: ectool: ectool modifyProperty <projectName> — path <propertyPath> — counter true

or

ec-perl: $cmdr→ modifyProperty (<projectName>, {path ⇒ <propertyPath>, counter ⇒ true});

When the counter flag is set for a property, Change Tracking does not track changes to the property if the only change was a change in the property value from one numeric value to another. All other forms of changes to the properties that have this flag set are tracked normally.

Reverting an object that owns a counter property to a previous state will revert the value of the counter property to its value at the previous time that a change to this property was tracked–typically this is when its value is initialized. This may not be the desired behavior, so you may need to manually set a counter property if it is reverted.

Alternative Estimating Methods

Depending on your usage patterns, one of these methods may result in a more accurate estimate than the previous method of determining the average factor values and multiplying them

Split the set of properties of change tracked objects into different populations

For each object, perform the previous method of determining the average factor values and multiplying them per tracked object.

Add the values for each tracked objects together.

Example:

  • If you have one set of properties (such as those in one project, or belonging to one user) containing very large properties that are never changed, and another set containing very small properties that change often, multiplying the average property size by the average rate of property update would result in an overestimate of the total database consumption.

  • If you put the properties in separate projects and calculate the database growth rate for each project, the database growth rates are more accurate.

Other columns that may be of interest include ec_property.owner (who created the property) and ec_property.last_modified_by usernames (who last modified the property).