High Availability on Amazon EKS Performance Report

This report summarizes the CloudBees CI storage performance test that was run to inform the configuration of High Availability (HA) on Amazon EKS.

Test objective

This test measured the storage performance of the CloudBees CI High Availability (HA) application on Amazon’s Elastic File System (EFS). The sections below describe the parameters used to set up the test and the results from the testing scenarios, which together show how HA performance scales.

Test configuration

The following table shows the test type and the scalability test setup used for the testing configuration.

Performance testing was completed on CloudBees CI version 2.462.3.3 using the configuration outlined below. Results may vary with a different version or configuration.

Table 1. Test configuration
Test type	Scalability test
Environment	CloudBees CI HA application deployed in an EKS cluster with EFS storage. EFS volume in General Purpose performance mode and Elastic throughput mode, 200 GB in size.
Controller initial resource allocation	4 CPU, 16 GB RAM
Controller VM node resource available	16 CPU, 32 GB RAM (c6i.4xlarge)
Workload generated using a multibranch pipeline	The workload consists of two stages: build and publish. Builds are triggered by webhooks. Twenty parallel stages with an average duration of two minutes. Commit intervals are used to increase or decrease the workload.

Test metrics

The following metrics and storage results were collected with different numbers of High Availability/High Scalability (HA/HS) replicas, as well as for an HA controller on Elastic File System (EFS) storage.

Workload

These results show the metrics for the HA controller and for a varying number of replicas, to determine performance limits.

Table 2. Workload metrics
Statistics	HA controller with NFS	HA/HS 2 replicas	HA/HS 3 replicas	HA/HS 4 replicas	HA/HS 5 replicas
Max commit frequency (commits/minute)	4	7.5	15	20	30
Job pool (nodes)	10	15	15	35	50
Percent IO limit (%)	2.3	3.4	9	22	-
EFS total IO write (MiB/s)	21.5	33	80	99	-

Test observations

The following observations are based on the overall test performance with five replicas.

The test completed successfully and reached a maximum workload of 1,700 jobs in a little over 2 hours (that is, 20 commits per minute with 5 managed controller replicas).
- Minimal job queuing was observed, and the queue was cleared periodically.
  
  Figure 1. Test completion details
The CPU usage averaged two cores, peaking at 6.1 cores. Memory usage was around 8.7 GB on average, with a maximum of 14.9 GB.
- The file system usage was approximately 15 percent of the 5-replica workload.
  
  Figure 2. CPU usage
  
  Figure 3. Memory usage
About 35 job pool nodes were required to support the workload.
- Fewer nodes would lead to resource exhaustion and cause job queuing.
The DataWrite throughput achieved was around 99 MiB/s, with the Percent IO limit at 15 percent.
- Jobs were completed on time, with the build stage taking approximately 130 seconds.
- This was consistent with the 2-minute average duration allocated for this stage.
The relationship between the increase in workload and the replica count is not linear.
- As the number of replicas in a High Availability (HA) setup increases, the achieved IOPS increases as well.
  
  Figure 4. Replica count vs. IOPS achieved
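
The non-linear relationship can be checked against the figures in Table 2. The following is a minimal Python sketch, not part of the original test tooling, that divides the maximum commit frequency and the EFS write throughput by the replica count. A perfectly linear relationship would give the same per-replica value in every column. The variable and function names are hypothetical, and only the replica columns of Table 2 are used.

```python
# Values copied from the replica columns of Table 2; None marks a "-" cell.
REPLICAS = [2, 3, 4, 5]
MAX_COMMITS_PER_MIN = [7.5, 15, 20, 30]
EFS_WRITE_MIB_S = [33, 80, 99, None]


def per_replica(values, replicas):
    """Divide each measured value by its replica count; keep None for missing cells."""
    return [None if v is None else round(v / r, 2) for v, r in zip(values, replicas)]


commits_per_replica = per_replica(MAX_COMMITS_PER_MIN, REPLICAS)
write_per_replica = per_replica(EFS_WRITE_MIB_S, REPLICAS)

# Print one line per replica count so the per-replica values can be compared.
for r, c, w in zip(REPLICAS, commits_per_replica, write_per_replica):
    print(f"{r} replicas: {c} commits/min per replica, {w} MiB/s per replica")
```

The per-replica values differ from column to column, which is what a non-linear relationship looks like in tabular form.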

Conclusion

CloudBees recommends the following as a guideline for system limits.

Use five replicas to achieve write speeds of approximately 99 MiB/s:
- This corresponds to approximately 20 commits per minute in terms of workload.
- Beyond this, the application approaches its limit and a few errors occur (for example, stale file handle errors).
The recommended size of the pod is:
- Four to six CPUs (depending on the complexity of the pipelines) and
- 16 GB of memory for managed controllers.
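
As an illustration of the pod size above, the following Python sketch builds a generic Kubernetes-style `resources` mapping for one managed controller pod. It is a sketch under stated assumptions: the function name is hypothetical, setting requests equal to limits is a choice made here rather than a recommendation from the test, and how these values are applied in the CloudBees CI controller provisioning settings is not covered.

```python
import json

# Recommended range from the conclusion: four to six CPUs and 16 GB of memory.
MIN_CPUS, MAX_CPUS = 4, 6
MEMORY_GB = 16


def controller_resources(cpus: int) -> dict:
    """Return a Kubernetes-style resources mapping for one managed controller pod.

    Requests are set equal to limits; that is an assumption of this sketch,
    not something the report prescribes.
    """
    if not MIN_CPUS <= cpus <= MAX_CPUS:
        raise ValueError(f"cpus must be between {MIN_CPUS} and {MAX_CPUS}, got {cpus}")
    sizing = {"cpu": str(cpus), "memory": f"{MEMORY_GB}G"}
    return {"requests": dict(sizing), "limits": dict(sizing)}


# Example: the lower end of the range, matching the initial allocation in Table 1.
resources = controller_resources(4)
print(json.dumps(resources, indent=2))
```

Pipelines that are more complex would move the CPU value toward the upper end of the range.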

For more information on CloudBees CI High Availability and whether it is the right choice for your organization, refer to CloudBees High Availability.