Cluster operations

Cluster operations is a facility for performing maintenance operations on items in operations center, such as client controllers and update centers. Different operations apply to different item types, such as backing up or restarting client controllers, or upgrading or installing plugins in update centers.

You run these operations either through a custom job type or through preset operations embedded at various locations in the CloudBees CI UI.

Cluster Operations jobs

You create a Cluster Operations job in the same way as you would any other job in CloudBees CI, by selecting New Item in the view you want to create it in, giving it a name, and selecting Cluster Operations as the item type.


A Cluster Operations job can contain one or more operations that are executed in sequence one after the other when the project runs.

An operation has:

  • A target type that it operates on, for example, a client controller or an update center.

  • A set of target items to operate on, obtained from a selected source and reduced by a set of configured filters. The target items are operated on in parallel, and the maximum number of parallel items can be configured in the Advanced Operation Options section.

  • A list of steps to perform in sequence on each target item.

The available sources, filters, and steps depend on the target type that the operation supports.
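The model described above can be sketched roughly as follows. This is a hypothetical illustration in Python, not the plugin's actual API: items from a source pass through the filters, and the surviving targets are processed in parallel, each running its steps in sequence.

```python
from concurrent.futures import ThreadPoolExecutor

def run_operation(source, filters, steps, max_parallel=2):
    # The target set is the source reduced by every configured filter.
    targets = [item for item in source if all(f(item) for f in filters)]

    def run_steps(item):
        for step in steps:          # steps run in sequence on each item
            step(item)
        return item

    # Targets are operated on in parallel, bounded by max_parallel
    # (the "max number of parallel items" advanced option).
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_steps, targets))

# Example: quiet down only the online controllers (names are made up).
controllers = [{"name": "alpha", "online": True},
               {"name": "beta", "online": False}]
done = run_operation(controllers,
                     filters=[lambda c: c["online"]],
                     steps=[lambda c: c.update(quiet=True)])
```

Here the "is online" filter and the quiet-down step are stand-ins for the real operation building blocks described in the walkthrough below.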


To create a cluster operation:

To run the operation successfully, the user must have the RUN_SCRIPT and ADMINISTER permissions on the client controller: RUN_SCRIPT is required for the Groovy step, and ADMINISTER is required for the prepare-for-shutdown step.

  1. On the root level or within a folder of operations center, select New Item.

  2. Specify a name for the cluster operation, for example "Quiet down all controllers".

  3. Select Cluster Operations as the item type.

    You will then be directed to the configuration screen for the newly created job.

  4. Select Add Operation > Controllers to add a new operation with the client controller target type.

    Figure 1. Creating a new cluster operation
  5. Under Source, select From Operations Center Root.

  6. Select Add Filter > Client Controller / Managed Controller Is Online.

    This will select all client controllers in operations center that are online when the operation is run.

    You have now specified what to run the operation on. Next, specify what to run on those targets by adding two steps.

  7. Select Add Step > Execute Groovy Script on Controller and enter the following code:

    System.out.println("==QUIET-DOWN@" + new Date());

    This prints the text and the current date and time to the log on the CloudBees CI controller, which can be useful for auditing later.

  8. Select Add Step > Prepare controller for shutdown.

    This step performs functions similar to what you would get if you selected Prepare for Shutdown on the Manage Jenkins page on each controller.

    Your configuration should look something like the following when you’re done:

    Figure 2. Creating a new cluster operation
  9. Select Save.

When started, this cluster operation runs on each client controller in parallel, and the standard "Jenkins is going to shut down" notice is displayed on each client controller.

Controlling how to fail

Sometimes it is desirable to change how a failure affects the rest of the operation flow. On the configuration screen for the cluster operation job, each Operation section has an Advanced button. Selecting it reveals advanced controls such as the maximum number of controllers to run in parallel, as well as Failure Mode and Fail As.

  • Fail As: Sets the CloudBees CI build result that the run will get. Build result options include Failure, Abort, Unstable, and so on.

  • Failure Mode: This option controls what happens to the rest of the run if an operation step on an item fails.

    • Fail Immediately: Aborts anything in progress and fails immediately.

    • Fail Tidy: Waits for anything currently running to finish, and then fails. (All operations in the queue are cancelled.)

    • Fail At The End: Lets everything run to the end, and then fails.
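The failure modes can be illustrated with a small sketch (Python, run sequentially for clarity; Fail Tidy differs from Fail Immediately only when items run in parallel, since it lets in-flight items finish rather than aborting them):

```python
# Hypothetical illustration of the failure modes, not the plugin's code.
def run_with_failure_mode(items, step, mode):
    failed, completed = False, []
    for item in items:
        if failed and mode != "FAIL_AT_THE_END":
            break                  # IMMEDIATELY/TIDY: stop scheduling new items
        try:
            step(item)
            completed.append(item)
        except Exception:
            failed = True          # remember the failure; the mode decides the rest
    return completed, failed

def step(item):
    if item == "bad":
        raise RuntimeError("step failed on " + item)

items = ["ok-1", "bad", "ok-2"]
# Fail Immediately: "ok-2" is never attempted.
done_now, _ = run_with_failure_mode(items, step, "FAIL_IMMEDIATELY")
# Fail At The End: everything runs, but the overall result still fails.
done_end, failed = run_with_failure_mode(items, step, "FAIL_AT_THE_END")
```

In every mode the run as a whole is marked failed; the modes only control how much work is still attempted after the first failure.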

Ad-hoc manual cluster operations

operations center comes with a couple of preset cluster operations that can be run on selected client controllers directly from the side panel of a list view or client controller page. The list of preset cluster operations is located under Manage Jenkins > Cluster Operations.

Running from a list view

Cluster operations provides a new list view column type, ClusterOp Item Selector, which appears by default as the rightmost column on new list views and the All view.

Figure 3. Ad-hoc cluster operation

For list views that existed before cluster operations was installed, you need to add the column by editing the view. As with all list views (except the All view), you can customize the columns to change the order in which they are displayed.

Mark the client controllers that you want to run the operation on by selecting the appropriate checkbox in the Op column; the selection on each view is remembered throughout your session.

Select Cluster Operations in the left pane to open the context menu that contains the available operations for the view.

Figure 4. Ad-hoc cluster operation

If you are an administrator, you can get to the preset operation’s project page by selecting the gear icon next to the operation name.

Selecting the operation’s name, either from the context menu or from the separate list page, takes you to the run page, where you can run the operation or specify its parameters. The run page also lists the selected client controllers; those not applicable for this run are shown with a strike-through and a short explanation of why. A client controller may be excluded from a particular run either because it is the wrong type for the operation or because a configured filter removed it from the resource pool. Some operations, for example, are only designed to run on online controllers, so any offline controllers are filtered out.

Figure 5. Ad-hoc cluster operation
The list of controllers, and whether the operation will run on them, is a preliminary display; the list is recalculated once the operation actually runs. The status of the client controllers (online or offline) might change between the display and the time the operation runs.

Running from a client controller manage page

The process to run an operation from a client controller is similar to the one described in Running from a list view. The only difference is that, because you are operating on a single client controller, no selection on a list view is involved.

Figure 6. Ad-hoc cluster operation

Operation run results and logs

Each run of a cluster operation job is accessible from the project page in the left panel, like any normal CloudBees CI job. On the run page you can see the operations that were executed, the items (client controllers or update centers) they ran on, the result in the form of a colored ball (success/failure), and a link to the log files for each run.

Console Output in the left panel shows the overall console log for all operations. To see the individual console output of each operation on a client controller or update center, use the log link next to each item on the run page, or the corresponding link in the overall console output.

Changing Domain Name and Enabling SSL

Initial Domain Names or IPs After Installation

After the CloudBees CI cluster is running, take note of the generated domain name or IP addresses, depending on the infrastructure where it is running.


On Amazon AWS the generated domain name or IP addresses will look similar to the following:

Workers    :,

Operations Center:

The name of the ELB is the section after http://cjoc. in the operations center URL, and that is the name the DNS records have to point to.


In OpenStack, you can take the IP addresses for the controllers and create one or more A records for them instead of CNAME records.

Workers    :,

CJOC    :
Mesos   :

The controller IPs in this example would be,, and

Change Domain Name and/or Enable SSL

When the domain-name-change operation is carried out, it is assumed that the new domain name is operational and served by the local DNS server. Work with your operations department to create the new domain.

If enabling SSL, it is assumed that the certificates are valid for the machine running the installation.

The DNS domain can be changed with the operation: domain-name-change

$ cje prepare domain-name-change
domain-name-change is staged - review domain-name-change.config and edit as needed - then run 'cje apply' to perform the operation.

Edit the domain-name-change.config file to specify the new domain name and related parameters. For more information about the configuration parameters, see the Domain options and SSL configuration sections. Then apply the operation with cje apply.

The DNS records need to be set up before executing cje apply.
This operation requires downtime because several applications will be reconfigured and restarted.

Domain options guide

You may want to run your CloudBees CI service under your own domain. CloudBees CI uses URLs both for user-facing instances, such as operations center, and for internal use.

To do this, you need access to the DNS settings for your own domain name.

The following options allow you to customize the URLs that will be used to access your cluster services. Depending on what your network administrator lets you do on your network, you may be in one of the following situations.

The target for the DNS records can be:

  • A domain name, in providers such as AWS where an Elastic Load Balancer (ELB) is created. In this case, a CNAME record needs to be created.

  • One or more IPs, in other providers such as OpenStack. In this case, one or more A records need to be created.

You can create subdomains beyond 1 level

All the cluster services will be exposed as subdomains of the domain name you provided.

These snippets need to be tuned to fit your own domain name.

To use your domain, you would use the following:

cluster-init.config excerpt
  domain_name =
  domain_separator = .
DNS record - AWS
cje           IN CNAME  <name of the ELB>.
mesos.cje     IN CNAME  <name of the ELB>.
marathon.cje  IN CNAME  <name of the ELB>.
DNS record - OpenStack
mesos.cje         IN A <IP of the lbaas instance>.
marathon.cje      IN A <IP of the lbaas instance>.
cje               IN A <IP of the lbaas instance>.

Then your cluster will be available with the following URLs:




A controller named controller-1 will be available as

You cannot create subdomains beyond 1 level

Infrastructure services will be registered using the provided domain as a suffix.

These snippets need to be tuned to fit your own domain name.

For your domain, you would use the following:

cluster-init.config excerpt
  domain_name =
  domain_separator = -
DNS records on AWS
  mesos-cje         IN CNAME  <name of the ELB>.
  marathon-cje      IN CNAME  <name of the ELB>.
  cje               IN CNAME  <name of the ELB>.
DNS records on OpenStack
  mesos-cje         IN A <IP of the lbaas instance>.
  marathon-cje      IN A <IP of the lbaas instance>.
  cje               IN A <IP of the lbaas instance>.

Then your cluster will be available with the following URLs:




A controller named controller-1 will be available as

SSL configuration

SSL termination can be configured at the controller level, by configuring the NGINX proxy server with the SSL certificates, or, on AWS, at the ELB level.

It is configured by setting protocol = https and one of the following options.

Controller termination

Set router_ssl = yes and provide the key and certificate files as nginx.key and nginx.cert, respectively, in the project directory.

AWS ELB termination

SSL certificates need to be configured in EC2 and provided via ssl_certificate_id using Amazon Resource Name (ARN) syntax.

CloudBees CI requires a certificate with multiple names:,, for the base domain

AWS IAM certificate example
ssl_certificate_id = arn:aws:iam::123456789012:certificate/some-certificate-name
AWS ACM certificate example
ssl_certificate_id = arn:aws:acm:us-east-1:123456789012:certificate/12345678-aaaa-bbbb-cccc-012345678901

If it is not possible to provide a certificate with multiple names, it is possible to provide multiple certificates. The additional certificates can be set up using ssl_certificate_id_mesos and ssl_certificate_id_marathon options using the AWS ARN syntax.

SSL certificate example (three certificates with one name each)
ssl_certificate_id = arn:aws:acm:us-east-1:123456789012:certificate/12345678-aaaa-bbbb-cccc-012345678901
ssl_certificate_id_mesos = arn:aws:acm:us-east-1:123456789012:certificate/12345678-aaaa-bbbb-cccc-012345678902
ssl_certificate_id_marathon = arn:aws:acm:us-east-1:123456789012:certificate/12345678-aaaa-bbbb-cccc-012345678903
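All of these values use the standard ARN layout, arn:partition:service:region:account:resource; note that IAM certificate ARNs leave the region field empty. A small sketch that pulls the fields apart (illustrative only, reusing the example ARNs above):

```python
# Split an ARN into its six standard fields.
def parse_arn(arn):
    parts = arn.split(":", 5)
    return {"partition": parts[1], "service": parts[2],
            "region": parts[3], "account": parts[4],
            "resource": parts[5]}

acm = parse_arn("arn:aws:acm:us-east-1:123456789012:"
                "certificate/12345678-aaaa-bbbb-cccc-012345678901")
iam = parse_arn("arn:aws:iam::123456789012:certificate/some-certificate-name")
# acm carries a region (ACM certificates are regional);
# iam["region"] is empty (IAM certificates are global).
```

This can be handy for sanity-checking the ssl_certificate_id values before running cje apply.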
Restart controller

In the event of a controller failure, the controller can be replaced with the controller-restart operation.

$ cje prepare controller-restart
controller-restart is staged - review controller-restart.config and edit as needed - then run 'cje apply' to perform the operation.

Edit the controller-restart.config file and enter the controller name as the [server] name.

Then carry out the operation with cje apply.

This operation terminates the specified controller and starts a new one. To avoid loss of data, perform this operation only on a multi-controller setup.

Restore a cluster

If an entire cluster fails or you need to re-create a destroyed cluster, you can use the cluster-recover operation to recover the cluster, as long as you still have the PROJECT directory.

If you no longer have the PROJECT directory, you have to create a new cluster following the standard initialization steps. If using EBS storage on AWS, use the same cluster_name to recover operations center and controller data.
$ cje prepare cluster-recover
cluster-recover is staged - review cluster-recover.config and edit as needed - then run 'cje apply' to perform the operation.

Edit the cluster-recover.config file before executing the operation to specify the configuration directory path to recover.


cluster-recover.config excerpt
## Cluster configuration directory path to recover
# path relative to the PROJECT directory

  • In the case of a cluster failure, the configuration directory path (dna_path) will be the default .dna hidden path.

  • In the case of recovery of a destroyed cluster, specify the destroyed dna path. The destroyed path is usually a hidden path like .dna-destroyed-DATESTAMP. You can list hidden paths with the ls -alt command.

  • If the recovery is done from a different machine or bastion host, access to the administrative ports needs to be updated via the tiger_admin_port_access parameter.

Then apply the operation with the cje apply command.

Migrate an entire EC2 cluster

When using AWS/EC2, you should use the EBS service for persistence:

storage_server = ebs://

This means that all long-term state information is stored in EBS volumes and snapshots. When you run cje destroy, these volumes and snapshots are left in place.

You can use the EBS persistence feature to migrate between Amazon regions. You can run cje destroy in one region, and then tell Amazon (through the console or CLI) to copy all snapshots to your new region. You can then run the cje cluster-init operation with cluster-init.config changed to reflect the new region you want to run your cluster in (note that you will have to follow the initialization steps to set up DNS records).

Updating access from selected IPs

Both the admin_port_access parameter, which controls admin access from selected IPs, and the user_port_access parameter, which controls user access from selected IPs, can be updated with the access-port-update operation.

$ cje prepare access-port-update
access-port-update is staged - review access-port-update.config and edit as needed - then run 'cje apply' to perform the operation.

Both access port parameters can contain one or more IP address ranges in CIDR notation, separated by commas.

To allow connections from any IP, use the CIDR range that matches all addresses; to allow access from a single IP, use a /32 mask. Other CIDR network masks can be used to control wider ranges of IPs.

Then carry out the operation with cje apply.

This operation on access port parameters only applies if your product is not already using network information that you supplied during initial configuration.

Updating operations center parameters

To update operations center parameters, use the cjoc-update operation.

$ cje prepare cjoc-update
cjoc-update is staged - review cjoc-update.config and edit as needed - then run 'cje apply' to perform the operation.

Edit the cjoc-update.config file to define the parameters you want to change. Only uncomment the parameters you want to change.

This operation allows you to do the following (see the cjoc-update.config file for a complete list of parameters):

  • Enable/disable the evaluation mode

  • Set the operations center container memory

  • Set the operations center application JVM options

  • Set the operations center workspace disk size

  • Set a custom operations center Docker image

Enabling/updating EC2 Container Registry (ECR) configuration

To update the ECR configuration, use the ecr-update operation.

$ cje prepare ecr-update
ecr-update is staged - review ecr-update.config and edit as needed - then run 'cje apply' to perform the operation.

Edit the ecr-update.config file to define the parameters you want to change.

This operation allows you to do the following (see the ecr-update.config file for a complete list of parameters):

  • Enable usage of the default AWS EC2 Container Registry

  • Enable AWS EC2 Container Registry for specific accounts

Scripting cluster operations

CloudBees CI "operations" commands have a lifecycle. To execute an operation, you first need to stage (prepare) the command. This stage lays down a configuration file that contains the input parameters of the operation.

By default, the admin user edits the config file by hand. To facilitate scripting of CLI operations, there are two ways to specify operation values via the cje prepare command:

  1. Config/secrets file arguments

  2. Operation parameter arguments

Note that both types of arguments can be used together if necessary. In this case, the parameter value arguments overwrite the values specified in the config file. Also, only config file parameters can be specified as arguments. If the OPERATION requires secrets, a secrets file can be specified with the --secrets-file SECRETS-FILE argument.
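The precedence rule amounts to a simple merge in which argument values win over file values. A sketch (key names are illustrative, not the CLI's internals):

```python
# Illustrative model of how argument values override config file values.
def effective_config(config_file_values, argument_values):
    merged = dict(config_file_values)   # start from the config file
    merged.update(argument_values)      # arguments win on conflict
    return merged

file_values = {"worker.count": "1", "aws.worker_instance_type": "m4.large"}
arg_values = {"worker.count": "2"}
cfg = effective_config(file_values, arg_values)
# worker.count comes from the argument; the instance type keeps its file value.
```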

Config/Secrets file arguments

To use a specific config file and/or secrets file for the operation, use the following options of the cje prepare command. See cje prepare -h for all options.

  • --config-file CONFIG_FILE

    • cje prepare --config-file CONFIG_FILE OPERATION

  • --secrets-file SECRETS-FILE for operations that require secrets inputs

    • cje prepare --secrets-file SECRETS-FILE OPERATION

Operation parameter arguments

Operation parameters can also be specified as arguments to the cje prepare command. Use cje prepare OPERATION -h to get the list of parameters available for the specified OPERATION.

For example, for the worker-add operation (Add worker(s) to the cluster), the available arguments are:

$ cje prepare worker-add -h
usage: tiger prepare [-h] [-p DIR] [--config-file CONFIG-FILE]
                     [--secrets-file SECRETS-FILE]
                     [--aws.worker_instance_type VAL]
                     [--aws.worker_volume_size VAL]
                     [--worker.count VAL]

Prepares an operation.

Prepared operations must be configured and then applied using the apply command.

  worker-add            Prepare worker-add.

optional arguments:
  -h, --help            show this help message and exit
  -p DIR, --project DIR
                        Directory containing the project files
  --config-file CONFIG-FILE
                        Use the specified config file
  --secrets-file SECRETS-FILE
                        Use the specified secrets file
  --aws.worker_instance_type VAL
                        The instance type of the worker to create.
  --aws.worker_volume_size VAL
                        The instance root volume size
  --worker.count VAL    Number of workers to add

For example, to add 2 workers using a m4.xlarge instance type on AWS, you can specify the worker count and the instance type as arguments:

$ cje prepare worker-add --worker.count 2 --aws.worker_instance_type m4.xlarge
worker-add is staged - review worker-add.config and edit as needed - then run 'cje apply' to perform the operation.
If cje apply fails, ensure that the machine has valid name servers configured. Running bees-pse apply without a valid name server configuration, or without the host(1) utility installed, will cause the script to fail when resolving addresses.

The worker-add.config file would be pre-populated with the values specified as arguments and ready for cje apply.

worker-add.config file content after running prepare with arguments:


## Number of workers to add
count = 2


## The instance type of the worker to create.
# Leave empty to use default value
worker_instance_type = m4.xlarge

## The instance root volume size
# worker_volume_size =
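The config file uses a simple key = value format in which # lines are comments and commented-out keys are simply absent. A minimal sketch of a parser for it (illustrative only, not the product's own loader):

```python
# Parse "key = value" lines, skipping blanks and '#' comment lines.
def parse_config(text):
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values

sample = """
## Number of workers to add
count = 2

## The instance type of the worker to create.
worker_instance_type = m4.xlarge

## The instance root volume size
# worker_volume_size =
"""
cfg = parse_config(sample)
# The commented-out worker_volume_size never appears in the result,
# which is why cje falls back to its default for that parameter.
```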