Using CloudBees Flow in Your Environment

This topic addresses some common issues you may encounter as you start using CloudBees Flow and offers some tips about managing your CloudBees Flow projects.

What’s in a step?

One of the first questions you may ask when you start to implement your build and test processes on CloudBees Flow is how to divide processes into individual steps: "Should I use only a few steps, each of which does a lot, or a large number of fine-grain steps?" Here are some factors to help you decide how many steps to use:

Reporting: If you would like a separate report of success/failure for two activities, put those activities in separate steps. If you are happy to have a single report for both activities, a single step may make sense. For example, compilation and test phases should probably be in different steps because errors in the two phases will probably be handled differently. Unit tests for product components managed by different groups may make sense in separate steps; each group can watch for errors in its step.
Parallelism: If you want to use CloudBees Flow to run two activities in parallel, put them in separate steps. If these activities are in the same step, CloudBees Flow cannot run them in parallel.

CloudBees Flow parallelism works best at a coarse grain, such as running different sets of unit tests in parallel or compiling for different platforms. CloudBees does not recommend trying to do fine-grain parallelism with CloudBees Flow, such as compiling every individual source file in a separate step. This is likely to be complicated and brittle, and it produces an enormous number of job steps, which makes results difficult to view. If you would like to use fine-grain parallelism for compilation, we recommend using our ElectricAccelerator product. In this case, you would make your compilation steps large, with as much work in each makefile as possible, because ElectricAccelerator automatically subdivides the work and runs as many sub-steps as possible in parallel. The more work in a makefile, the more efficient the process becomes.

Resources: If two different activities need to run on different resources, they need to be in different steps. A single step runs entirely on a single resource.
Conditional steps: If you want to skip portions of your process during some jobs, but execute them during others, it probably makes sense to put those activities in separate steps and use the CloudBees Flow "Run condition" mechanism to decide whether they run during each particular job. However, if you are invoking programs like make that already allow you to choose which actions to perform, it may make more sense to have a single large step and handle conditional behavior with those programs, rather than with CloudBees Flow. For example, when running make, you can specify the targets to be built.
Setup: You might want to have a single step at the beginning of a procedure that processes the procedure’s parameters and sets up the environment for the rest of the procedure. The setup step can also create a snapshot of your source code or set up a ClearCase view. After setup, the remaining steps should be easier to write because they just use information created during the setup step.
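
To illustrate the second option described under conditional steps, here is a minimal Python sketch of a single step that chooses make targets for each job instead of relying on separate conditional steps. The target names (all, test, docs) and the two flags are hypothetical; substitute whatever your makefile and job parameters actually provide.

```python
def make_command(run_tests: bool, build_docs: bool) -> list[str]:
    """Build the argument list for one make invocation.

    The target names below are hypothetical placeholders.
    """
    targets = ["all"]  # always compile
    if run_tests:
        targets.append("test")
    if build_docs:
        targets.append("docs")
    return ["make", *targets]
```

With this shape, the decision about what to build stays with make, and the step itself remains one simple command.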

Where’s the script for a step?

Typically, each step executes a script of some type, which is processed by a command language such as cmd on Windows, a shell on UNIX, or Perl.

Two ways to handle script execution

The first approach: Enter the script directly into the command field for the step. You can specify the language interpreter as the shell for the step.
The second approach: Store the script in a file that is part of the source code for your project, then specify a simple command for the step that invokes the command language to process the script file.

We recommend using the second approach in most cases. The main advantage of this approach is that it makes your CloudBees Flow procedures more robust. For example, suppose the script needs to change as your product evolves. If you have kept the script outside CloudBees Flow with the source code for the product, each product version can have its own copy of the script. When you extract the source code for the product at the beginning of a job, you also extract the scripts for its steps. You can change a step’s script without worrying about its impact on other versions of the product.

However, if the script for a step is stored in the step, it is more difficult to evolve the script with new product versions. You probably still need to build older product versions, so you will worry whether the new script for the step will work with older product versions. If the script does not work, perhaps you can modify the step’s script to test the version being built and take different actions for each version, but this becomes increasingly complex as the number of versions increases. Or, you can make a separate copy of the build procedure for each product version (more on this below), but this process also gets complicated as you acquire more and more procedure versions. We find it is easier to store scripts for steps with the product code, so script changes are handled in the same way as changes to the product.

Two cases where it makes sense to store the script for a step in the CloudBees Flow step

The first case is for the first step of a job. At this point you have not extracted the source code for the product, so you do not yet have access to any scripts stored with the source code.
The second case is for steps with one or two commands only, or steps already specified primarily by information stored with your source code. For example, if a step is running a make or ant command, step behavior is already specified almost entirely by a makefile or a build.xml file for Ant. In this case, the step’s script invokes a single, relatively simple command, so there is no need to store it in a file. If you subsequently find that the script for a step is changing frequently, consider moving it out of CloudBees Flow and into a file stored with your source code.

The objective is to make your CloudBees Flow procedures as reusable as possible, so no change is needed with every small change to your product. Even better, organize your CloudBees Flow procedures so a single procedure can be used for multiple products. To do this, store product- or version-specific information with the product, not with CloudBees Flow.

Environment variables

Your build and test scripts probably depend on certain environment variables having certain values. In your existing system, you probably set those values at the beginning of your script. However, in CloudBees Flow you need to set those values in each step that depends on them.

The easiest way to set values is to create a short command file during the first step of the job and save it in the top-level directory of the job workspace. The command file should contain a sequence of commands to set all environment variables required by the job. Then, in each subsequent step, invoke that command file at the beginning of the step to set environment variables for that step.

This approach simplifies management of your environment variables: if you need to change a variable, you change only the setup step at the beginning of the job, and the value is reflected in all of the following steps automatically.
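
The command-file technique described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the steps run under a POSIX shell, and the file name setenv.sh and the variable names in the usage note are hypothetical. On Windows the generated file would use cmd syntax instead.

```python
import shlex


def render_env_file(variables: dict[str, str]) -> str:
    """Return the text of a POSIX shell file that exports each variable."""
    lines = [
        f"export {name}={shlex.quote(value)}"
        for name, value in sorted(variables.items())
    ]
    return "\n".join(lines) + "\n"


def with_environment(env_file: str, command: str) -> str:
    """Prefix a step's command so it loads the shared environment first."""
    return f". {shlex.quote(env_file)} && {command}"
```

The setup step would write the result of render_env_file to a file such as setenv.sh in the top-level directory of the job workspace. Each later step would then run its command through with_environment, for example with_environment("./setenv.sh", "make all").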

When do you use subprocedures?

In CloudBees Flow, the action for a step can invoke another procedure, passing parameters. This is a powerful tool for structuring your processes. Subprocedures tend to be used in two ways: for encapsulating reusable processes and for managing concurrency.

Encapsulating reusable processes: This is the preferred use. If you need to perform certain activities repeatedly in different places, you can use a subprocedure for them.
For example, when we build and test the CloudBees Flow product, we do it on multiple platforms, but the mechanism is virtually the same on all platforms. We implement the basic build and test mechanism for one platform in a subprocedure named BuildAndTest. In our main procedure for production builds, we invoke BuildAndTest multiple times, once for each of the platforms. If the mechanism changes, we only need to change it once in BuildAndTest to fix all six platforms.
Managing concurrency: Suppose you have three steps A1, A2, and A3 that must run sequentially, and two other steps B1 and B2 that must also run sequentially, but the two groups can run in parallel. The way to implement this is to put A1, A2, and A3 in one subprocedure named "A", and B1 and B2 in another subprocedure named "B".

Now you can create a top-level procedure with two steps: one invoking A and the other invoking B, and mark those steps for parallel execution. As a result, A and B will run in parallel but the steps inside each subprocedure will run serially. If you placed all five steps inside the top-level procedure, there would be no way to achieve this effect.
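
The shape of this arrangement can be mimicked in plain Python. The sketch below is an analogy only, not CloudBees Flow code: threads stand in for parallel steps, and the function names are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_serially(steps: list[str], log: list[str]) -> None:
    """Stand-in for one subprocedure: each step starts after the previous one ends."""
    for step in steps:
        log.append(step)  # placeholder for the real work of the step


def run_top_level() -> tuple[list[str], list[str]]:
    """Stand-in for the top-level procedure with two parallel steps."""
    log_a: list[str] = []
    log_b: list[str] = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(run_serially, ["A1", "A2", "A3"], log_a),
            pool.submit(run_serially, ["B1", "B2"], log_b),
        ]
        for future in futures:
            future.result()  # re-raise any error from a subprocedure
    return log_a, log_b
```

The order within each log is fixed, while the relative timing of the two groups is not, which is the same guarantee the two subprocedures give.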

How do you evolve procedures?

After you start using CloudBees Flow for managing your build and test processes, it will not be long before you encounter the following situation:

You have a set of CloudBees Flow procedures that work fine on all existing versions of the product, but you are about to change the product in a way that affects CloudBees Flow procedures. For example, you might be adding a new component, or perhaps you are going to change the way the product is installed for testing, or perhaps you have restructured the product to allow more steps to run concurrently. As a result, you need to change CloudBees Flow procedures for the current version. At the same time, you need older product versions to continue building as well.

In most cases, the easiest way to handle these situations is to leave the current CloudBees Flow procedures alone and keep using them for your older product versions. This ensures you will not "break" older versions. Make copies of the procedures that need to change, then modify the copies and use them for your new software version.

You will have one version of procedures for each significant version of the product. For example, at CloudBees, we incorporate the version number into the procedure names: "Master-1.0" is the version of the Master procedure used with the 1.0 release, "Master-1.1" is used for 1.1, and "Master" with no version number is used for our current development.
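
As a small illustration of this naming scheme, the following Python sketch maps a release to the procedure that builds it. The function name and the set of frozen releases are assumptions made for the example; only the "Master-1.0", "Master-1.1", and "Master" names come from the text above.

```python
# Releases that have their own frozen copy of the procedure (illustrative list).
FROZEN_RELEASES = {"1.0", "1.1"}


def procedure_for(release: str) -> str:
    """Return the procedure name to run for a given release."""
    if release in FROZEN_RELEASES:
        return f"Master-{release}"
    return "Master"  # current development uses the unversioned procedure
```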

Another approach would be to make a copy of the entire project for each release. This method may be easier if other information in the project, such as property values, needs to evolve also.

The "copying" approach gets more and more complicated as you add more and more versions of the procedure. To minimize this complexity, try to structure your procedures so they do not have to change frequently. Some techniques you can use:

Use parameters for things that do change frequently, such as the branch of software to build.
Where possible, store scripts and other information used by the procedure as part of the source code for the project, as described above, so this information is versioned automatically along with your source code.

Porting existing scripts

You probably already have scripts that you used to build and test software before switching to CloudBees Flow.

Over time, you will probably want to do quite a bit of restructuring to take full advantage of CloudBees Flow. However, you probably do not need to do major restructuring to get started with CloudBees Flow. Here are some steps you might take to "get up and running" with CloudBees Flow as quickly as possible and gradually convert to make the best use of the system.

To begin, see if you can take your existing script and run it monolithically as a single step in a procedure under CloudBees Flow. This will get you running and start providing some CloudBees Flow benefits for resource management, error analysis, and reporting.
Next, start dividing your previous script into separate steps for CloudBees Flow. Begin with the steps easiest to separate from the rest of the script, such as the commands to compile your software. One of the challenges you face is how to pass data from one step to another. In a monolithic script, you can keep the data in variables that persist throughout the job. As you divide the script into steps, you need to figure out which variables are used only in a single step and which variables pass from step to step. For variables that pass between steps, use CloudBees Flow properties to store their values. In many cases you can compute shared data values in a single step at the beginning of the job, store their values in properties on the job, then access those values read-only in later job steps.
After dividing the steps, you can do finer-grain reporting. Start using the postp postprocessor to scan your log files and generate statistics and diagnostics. Over time you will probably discover a few error and warning messages that are peculiar to your site and are not captured by postp’s patterns. Learn how to extend postp with additional patterns to capture all the information that matters to you.
Now that reporting statistics are being recorded, you can develop your own reports to summarize those statistics. You can easily use ectool to read statistics from the properties where they are stored.
Something else you can do after steps are divided: Start tuning the performance of your procedures. For example, you can start marking steps to run in parallel, and you can choose resources on a step-by-step basis to finish your jobs faster and share resources more effectively.
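
For the step that passes data between steps through properties, a minimal Python sketch might look like the following. It only builds ectool argument lists and does not run them. The "/myJob/" path prefix and the positional "setProperty name value" form are assumptions to check against the ectool reference for your version, and the property name in the usage note is hypothetical.

```python
def set_property_command(name: str, value: str) -> list[str]:
    """Arguments an early step could run to store a value on the job."""
    return ["ectool", "setProperty", f"/myJob/{name}", value]


def get_property_command(name: str) -> list[str]:
    """Arguments a later step could run to read the stored value back."""
    return ["ectool", "getProperty", f"/myJob/{name}"]
```

A setup step could run set_property_command("sourceBranch", "main") through subprocess.run, and later steps could read the value with get_property_command("sourceBranch") without modifying it.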

CloudBees Flow project version control

We recommend exporting your CloudBees Flow project data at regular intervals and saving it in the same configuration management system you use for your source code (use the "export" ectool command to generate an XML file representing the contents of a project or procedure). This process allows you to track project changes and also allows you to "look back" at older versions if that should become desirable. Also, you should snapshot a version of CloudBees Flow project data at the time of each major software release, so you can revert to the exact CloudBees Flow configuration used to generate that release if necessary.
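
A regular export could be scripted along these lines. This Python sketch only assembles the command. The "--path /projects/<name>" form of the export command, the output directory, and the dated file name are assumptions for illustration, so confirm the exact options in the ectool reference before relying on them.

```python
from datetime import date


def export_command(project: str, out_dir: str, day: date) -> list[str]:
    """Build an ectool export invocation with a dated XML file name."""
    file_name = f"{out_dir}/{project}-{day.isoformat()}.xml"
    return ["ectool", "export", file_name, "--path", f"/projects/{project}"]
```

The resulting XML file can then be committed to the same configuration management system as your source code, and tagged alongside each major software release.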