The software development process for your products most likely includes various stages, such as development, QA, performance testing, user acceptance testing (UAT), pre-production, and production. Your software progresses through these stages and various forms of testing and acceptance to ensure the quality and completeness of your code. When you create automations in CloudBees Flow, you are developing software to monitor and control your release processes; that software should be managed in the same way as your product software. Your automations should be developed on a CloudBees Flow development server and then tested in a CloudBees Flow test environment before being deployed into your CloudBees Flow production environment, thus following the typical development process.
A recommended best practice for CloudBees Flow automation development is to separate your CloudBees Flow production server from the servers used for other activities, such as software development, QA, UAT, and pre-production. Although you could use the same server (that is, the same CloudBees Flow installation) for all of these environments, this presents a higher risk of serious problems and business disruptions.
Risks of Not Using a Separate CloudBees Flow Production Server
Production systems must run nonstop and typically have a high up-time target, such as “five 9s” (up 99.999% of the time). A development machine, depending on what is being developed, is less stable. For example, it might require reboots because of the nature of the product under development.
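To make the “five 9s” figure concrete, the following minimal sketch converts an availability percentage into a yearly downtime budget. The function name and the choice of a 365-day year are assumptions made for illustration; they are not part of CloudBees Flow.

```python
# Yearly downtime budget for a given availability target.
# Assumes a 365-day (non-leap) year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the minutes of downtime per year that a target permits."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% up-time allows about {minutes:.2f} minutes of downtime per year")
```

At 99.999%, the budget is roughly 5.26 minutes per year, so a single unplanned reboot caused by development activity could consume most of it.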
If developers have root or administrator access and thus can modify the system configuration, then your production server is never truly secure. For example:
- A shared server for development, testing, and production means shared resources (a shared database, disk space, disk I/O, CPUs, and network bandwidth) and the resultant unwanted stress on the server.
- A single faulty program can exhaust the server’s memory, CPU cores, or disk I/O and cause performance issues.
- The server up-time SLA percentage could be impacted when the system is overburdened because of testing, because a developer created an infinite loop, and so on.
- Troubleshooting can be more difficult when user errors cause system-wide issues.
- ACL administration to protect production activities can be more complicated on an “open system.”
- Hotfixes and patches to CloudBees Flow software releases cannot be verified before they are applied on the production system.
Even using separate virtual machines (VMs) on the same physical hardware is not recommended. While it helps to keep software differences separate, if there is an actual hardware failure, then both the development and production systems are impacted.
Benefits of a Separate CloudBees Flow Production Server
Because your CloudBees Flow production environment controls the deployments into your production systems, this is the environment that all of your users will be using. A separate development CloudBees Flow server allows development, prototyping, and initial testing of your new automations without jeopardizing your production environment.
With a development CloudBees Flow server, you can develop and test your automations on a smaller set of environments that mimic your target deployment environments. Within your test environment, you do all of your QA testing of your CloudBees Flow process automation and involve other groups as needed (such as UAT); here, you can also do any needed performance testing. By testing your new implementations before pushing them into your production environment, you reduce any risk of impacting your live CloudBees Flow production environment.
In addition, you can scale your deployment of CloudBees Flow to incorporate any number of development stages depending on your developer and business needs. You can assign any or all of your non-production stages to separate CloudBees Flow servers to split out development, QA, UAT, and so on to isolate them from each other on separate hardware (in which case, the development server is just one of several non-production servers).
Using separate systems for production and non-production usage lets you:
- Develop new pipelines and releases and implement new CloudBees Flow features without impacting your production environment (which controls the deployments into your production systems).
- Test new implementations and CloudBees Flow features before pushing them to your production environment.
- Test hotfixes and patches for CloudBees Flow releases on a development server before they are installed on the production server.
- Develop and test your automations on a smaller set of environments that mimic your target deployment environments.
These benefits help to:
- Reduce the risk of unwanted downtime that can impact live users and harm your business because of developers’ mistakes.
- Improve the SLA of applications and provide a better experience to users.
- Prevent mission-critical and other production data from being mixed with test data.
- Reduce the risk of production data getting into the wrong hands. This is very important when organizations deal with very sensitive and private data such as client information, financial transactions, and health data.
Assigning Stages to Your CloudBees Flow Servers
This section describes two scenarios at opposite ends of the spectrum to illustrate how you can scale your deployment of CloudBees Flow to incorporate any number of non-production stages. The first scenario uses the recommended minimum of two CloudBees Flow servers and is a small subset of the set of stages in a typical organization. The second scenario uses a complete set of CloudBees Flow servers to capture the entire typical set of stages.
At a minimum, you should have the production stage on one server and all other stages on another server to protect your day-to-day business operations from downtime, poor performance, and other hazards as described above. But CloudBees further recommends that you use at least a third server for your testing stages: QA, UAT, performance testing, pre-production, and perhaps any other stages between development and production. A third server protects those activities from the same development-related risks that threaten production activities. Three (or more) servers would let you follow a process for development in CloudBees Flow that is similar to the one you use for your “regular” development process.
Assigning Stages in a Simple CloudBees Flow Application Development Process
In a simple scenario, software development in CloudBees Flow is a microcosm of your regular application development process. This scenario provides the bare minimum of protection by protecting just your production server.
This scenario uses two servers:
- DEV/QA server: where your developers commit code, run experiments, and fix bugs and also where QA runs manual or automated tests (because of their complexity, these tests can consume sizable server resources).
- Production server: where you create value for your customers or your business through executing daily business processes. This is a highly sensitive environment and deeply affects your reputation and brand name.
Assigning Stages in a Complex CloudBees Flow Application Development Process
In a complex scenario, development in CloudBees Flow follows your regular development process. Your development process probably includes multiple phases of development and testing, with your applications progressing through various levels of environments. This scenario protects your production server as well as the servers for stages other than the development stage.
This scenario uses six servers:
- DEV server: where your developers create the automations to define your product deployments and release processes, run experiments, and fix bugs in the automations.
- QA server: where QA runs manual or automated tests on the automations.
- UAT server: where actual users test the automations to make sure they can correctly handle the process requirements in real-world scenarios.
- Performance-testing server: where you test whether your CloudBees Flow configuration has the system resources (such as RAM, CPU, and disk space) needed to provide the capacity to be responsive under concurrent usage at scale.
- Pre-production server: where the final validation of upgrades, fixes, and other changes is completed before the changes are deployed to the production environment.
- Production server: where you create value for your customers or your business through executing daily business processes.
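The two-server, three-server, and six-server layouts described in this section can be written down as simple stage-to-server maps. The sketch below is plain Python, not CloudBees Flow DSL, and every server name in it (`flow-nonprod`, `flow-test`, and so on) is a hypothetical placeholder. It shows one way to check the rule common to all three layouts: the production stage never shares a server with any other stage.

```python
from __future__ import annotations

# Stage order taken from the six-server scenario.
STAGES = ["DEV", "QA", "UAT", "PERF", "PRE-PROD", "PROD"]

# Server names below are hypothetical placeholders.
TWO_SERVER = {stage: "flow-nonprod" for stage in STAGES[:-1]}
TWO_SERVER["PROD"] = "flow-prod"

THREE_SERVER = {"DEV": "flow-dev", "PROD": "flow-prod"}
THREE_SERVER.update({stage: "flow-test" for stage in STAGES[1:-1]})

SIX_SERVER = {stage: f"flow-{stage.lower()}" for stage in STAGES}


def production_is_isolated(assignment: dict[str, str]) -> bool:
    """True if no other stage shares the production stage's server."""
    prod_server = assignment["PROD"]
    return all(
        server != prod_server
        for stage, server in assignment.items()
        if stage != "PROD"
    )


def next_stage(stage: str) -> str | None:
    """Return the stage that follows in the promotion order, if any."""
    index = STAGES.index(stage)
    return STAGES[index + 1] if index + 1 < len(STAGES) else None


for name, plan in [("two", TWO_SERVER), ("three", THREE_SERVER), ("six", SIX_SERVER)]:
    servers = sorted(set(plan.values()))
    print(f"{name}-server plan: {len(servers)} servers, "
          f"production isolated: {production_is_isolated(plan)}")
```

A map that puts every stage on one shared server would fail the check, which corresponds to the single-installation setup this document advises against.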
Best Practices for Using a Separate Production Server
Protecting the Production Server
- Use a separate physical environment for each phase in your development life cycle. The development, QA, and production systems should have separate physical environments. At a minimum, you could implement a mixed system where the development and QA systems share a single physical environment, but the production system has its own physical environment.
- When running QA, unit tests, and stress tests, ensure that they run in a completely segregated physical environment.
- Limit “write” access on a production server to specific system engineers.
- A production server must host only live applications and finalized content.
- Do not place unfinished or preliminary versions of applications and data on a production server except under highly controlled test conditions.
Managing Multiple Servers
- Use DSL code for all CloudBees Flow development so that no manual actions (such as setting up ACLs) are required on any server.
- Manage your code as an artifact so that it can be versioned and moved between servers without being changed.
- Use properties to reflect the differences (such as email distribution lists) between servers.
- Use plugin configurations to reflect differences in credentials and URLs between environments (such as for your ticketing system).
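The idea behind the last three practices is that the automation code stays identical on every server and only looks up named settings whose values differ per server. The following sketch illustrates that separation in plain Python; it is not CloudBees Flow DSL, and the setting names, email addresses, and URLs are all hypothetical.

```python
# Hypothetical per-server values. In practice these would live on each
# server as properties and plugin configurations, not in the automation code.
SERVER_SETTINGS = {
    "dev": {
        "notify_list": "flow-dev@example.com",
        "ticketing_url": "https://tickets-dev.example.com",
    },
    "prod": {
        "notify_list": "release-managers@example.com",
        "ticketing_url": "https://tickets.example.com",
    },
}


def build_notification(server: str, release: str) -> dict:
    """Build a notification using only setting names, never hard-coded values."""
    settings = SERVER_SETTINGS[server]
    return {
        "to": settings["notify_list"],
        "subject": f"Release {release} deployed",
        "ticket_link": f"{settings['ticketing_url']}/releases/{release}",
    }


# The same function runs unchanged against either server's settings.
print(build_notification("dev", "2.1"))
print(build_notification("prod", "2.1"))
```

Because `build_notification` contains no server-specific values, the same versioned artifact can be promoted from one server to the next without edits.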
Real-World Examples of the Risks of Development on a Production Server
Example: Users with open permission to work on the production system
A large company gave users open permission to work on the production system.
- This allowed a user to create a procedure that launched procedures repeatedly, which ultimately clogged the system with a large backlog of jobs.
- The server stayed up but performed very slowly, and it took a few hours to remove the unwanted jobs and return the server to normal performance.
- This meant that a single user in one group affected all other groups in the company.
Example: User who created a process in their production environment that generated new schedules repeatedly
An organization let one of its users create a process in their CloudBees Flow production environment that generated a new schedule every 10 minutes.
- These schedules were turned off but were not cleaned up explicitly.
- Several years later, a user deleted the project and discovered over 100,000 schedules that also required deletion, which ultimately led to decreased performance of the system.
- This meant that one administrative cleanup effort blocked the use of the system for hours until the root cause was identified.
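A quick calculation shows how fast such a process accumulates objects. The sketch below assumes the process ran continuously at one new schedule every 10 minutes; the source does not state how long it actually ran, so the result is only an order-of-magnitude illustration.

```python
# Accumulation rate for a process that creates one schedule every 10 minutes.
# Assumes the process runs continuously (an assumption, not stated in the example).
MINUTES_PER_DAY = 24 * 60
INTERVAL_MINUTES = 10
TARGET_SCHEDULES = 100_000

schedules_per_day = MINUTES_PER_DAY // INTERVAL_MINUTES
schedules_per_year = schedules_per_day * 365
days_to_reach_target = TARGET_SCHEDULES / schedules_per_day

print(f"Schedules created per day: {schedules_per_day}")
print(f"Schedules created per year: {schedules_per_year}")
print(f"Days to reach {TARGET_SCHEDULES} schedules: {days_to_reach_target:.0f}")
```

At 144 schedules per day, 100,000 leftover schedules build up in under two years of continuous operation, which is why leftover objects of this kind are better discovered on a non-production server.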
Example: User who repeatedly added “global properties” in their production environment by storing values under the administration area
An organization allowed a user to repeatedly add “global properties” in their CloudBees Flow production environment by storing values under the administration area rather than under their own project.
- The system was impacted when the organization tried to use the change-tracking feature: every time one of these properties changed, the system created a copy of all the global properties in the administration area.
- This took too long, so the feature had to be turned off completely until the company could relocate those properties.
Lessons Learned
In these examples, a formalized code-review process and testing in a non-production environment before promoting the code to the production system could have saved thousands of dollars in lost productivity. By keeping system-wide issues such as those described above away from the broader user base, a separate production server pays for itself.