Kubernetes on Google Kubernetes Engine

This document is designed to help you ensure that your Kubernetes cluster on Google Kubernetes Engine (GKE) is optimally configured to run CloudBees CI securely and efficiently.

These are not requirements, and they do not replace the official Kubernetes and cloud provider documentation. They are recommendations based on experience running CloudBees CI on Kubernetes. Use them as guidelines for your deployment.

For more information on Kubernetes, refer to the official Kubernetes documentation and the official documentation for GKE.

Terms and definitions

Jenkins

Jenkins is an open-source automation server. With Jenkins, organizations can accelerate the software development process by automating it. Jenkins manages and controls software delivery processes throughout the entire lifecycle, including building, documenting, testing, packaging, staging, deploying, static code analysis, and much more. You can find more information about Jenkins and CloudBees contributions on the CloudBees site.

CloudBees CI

With CloudBees CI, organizations can embrace rather than replace their existing DevOps toolchains while scaling Jenkins to deliver enterprise-wide secure and compliant software.

Operations center

Operations console for Jenkins that allows you to manage multiple Jenkins controllers.

Architectural overview

This section provides a high-level architectural overview of CloudBees CI, designed to help you understand how CloudBees CI works, how it integrates with Kubernetes, its network architecture and how managed controllers and build agents are provisioned.

CloudBees CI is essentially a set of Docker containers that can be deployed to run a cluster of machines within the Kubernetes container management system. Customers are expected to provision and configure their Kubernetes system before installing CloudBees CI.

CloudBees CI includes the operations center that provisions and manages CloudBees managed controllers and team controllers. CloudBees CI also enables managed controllers and team controllers to perform dynamic provisioning of build agents via Kubernetes.

Machines and roles

CloudBees CI is designed to run in a Kubernetes cluster. For the purposes of this section, a Kubernetes cluster is a set of machines (virtual or bare-metal) that run Kubernetes. Some of these machines provide the Kubernetes control plane; they control the containers that run on the other machines, known as Kubernetes nodes. The CloudBees CI containers run on the Kubernetes nodes.

The Kubernetes control planes provide an HTTP-based API that can be used to manage the cluster, configure it, deploy containers, and so on. kubectl is a command-line client that can be used to interact with Kubernetes via this API. For more information on Kubernetes, refer to the Kubernetes documentation.

CloudBees CI Docker containers

The Docker containers in CloudBees CI are:

  • cloudbees-cloud-core-oc: operations center

  • cloudbees-core-mm: CloudBees CI managed controller

The Docker containers used as Jenkins build agents are specified on a per-Pipeline basis and are not included in CloudBees CI. For more details, refer to the example Pipeline in Agent provisioning.

The cloudbees-cloud-core-oc, cloudbees-core-mm, and build agent container images can be pulled from the public Docker Hub repository or from a private Docker registry that you deploy and manage. If you need to use a private registry, you must configure your Kubernetes cluster to authenticate to it.
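For example, a common pattern (a sketch; the secret name regcred and the registry host are placeholders) is to store the registry credentials in a Kubernetes secret and reference it from the pod specification:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-controller
spec:
  containers:
    - name: jenkins
      # placeholder image path on a private registry
      image: registry.example.com/cloudbees/cloudbees-core-mm:latest
  imagePullSecrets:
    # created beforehand, for example with:
    #   kubectl create secret docker-registry regcred \
    #     --docker-server=... --docker-username=... --docker-password=...
    - name: regcred
```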

CloudBees CI Kubernetes resources

Kubernetes terminology

The following terms are useful to understand. This is not a comprehensive list. For full details on these and other terms, refer to the Kubernetes documentation.

Pod

A set of containers that share storage volumes and a network interface.

ServiceAccount

Defines an account for accessing the Kubernetes API.

Role

Defines a set of permission rules for access to the Kubernetes APIs.

RoleBinding

Binds a ServiceAccount to a role.

ConfigMap

A set of configuration data, such as files, that can be made available to pods.

StatefulSet

Manages the deployment and scaling of a set of pods.

Service

Provides access to a set of pods at one or more TCP ports.

Ingress

Uses the hostname and path of an incoming request to map the request to a specific service.

CloudBees CI Kubernetes resources

CloudBees CI defines the following Kubernetes resources:

  • ServiceAccount jenkins: Account used to manage Jenkins build agents.

  • ServiceAccount cjoc: Account used by operations center to manage managed controllers.

  • Role master-management: Defines permissions needed by operations center to manage Jenkins controllers.

  • RoleBinding cjoc: Binds the operations center ServiceAccount to the master-management Role.

  • RoleBinding jenkins: Binds the jenkins ServiceAccount to the pods-all Role.

  • ConfigMap cjoc-config: Defines the configuration used to start the cjoc Java process within the cjoc container.

  • ConfigMap cjoc-configure-jenkins-groovy: Defines location.groovy, which is executed on startup by cjoc to define its own hostname.

  • ConfigMap jenkins-agent: Defines the Bash script that starts the Jenkins agent within a build agent container.

  • StatefulSet cjoc: Defines a pod for the cjoc container, allocates a persistent volume for its JENKINS_HOME directory, and ensures that one such pod is always running.

  • Service cjoc: Defines a Service front end for the cjoc pod, exposing TCP port 80 for HTTP and TCP port 50000 for inbound (JNLP) agent connections.

  • Ingress default: Maps requests for the CloudBees CI hostname and the path /cjoc to the cjoc pod.

  • Ingress cjoc: Maps requests for the CloudBees CI hostname to the path /cjoc.

Setting pod resource limits

You can specify default limits in Kubernetes namespaces. These default limits constrain the amount of CPU or memory a given pod can use unless the pod’s configuration explicitly overrides the defaults.

For example, the following configuration sets the default memory request for containers in the master-0 namespace to 256 MiB and the default memory limit to 512 MiB:

apiVersion: v1
kind: LimitRange
metadata:
  name: mem-limit-range
  namespace: master-0
spec:
  limits:
    - default:
        memory: 512Mi
      defaultRequest:
        memory: 256Mi
      type: Container

Overriding default pod resource limits

To override the default configuration on a pod-by-pod basis, configure the controller that needs more resources:

  1. Sign in to the operations center.

  2. Navigate to Manage Jenkins > Kubernetes Pod Templates.

  3. Select Add a pod template.

    1. Locate the template you want to edit.

    2. If the template you want to edit does not exist, create it.

  4. On the Containers tab, select Add Containers and select container.

  5. Select Advanced, and then modify the resource constraints for the template.

Visualizing CloudBees CI architecture

The diagram below illustrates the CloudBees CI architecture on Kubernetes. The diagram shows three Kubernetes control planes, which are represented by the three dotted-line overlapping rectangles on the left. The diagram also shows two Kubernetes worker nodes, which are represented by the two dotted-line large rectangles in the center and on the right.

Here is the key to the colors used in the diagram:

  • Green: Processes that are part of Kubernetes

  • Pink: Kubernetes resources created by installing and running CloudBees CI

  • Yellow: Kubernetes resources required by CloudBees CI

Architecture diagram
Figure 1. CloudBees CI architecture

Kubernetes control plane

Running on each Kubernetes control plane are the Kubernetes processes that manage the cluster: the API server, the controller manager, and the scheduler. In the bottom left of the diagram are resources that are created as part of the CloudBees CI installation but are not tied to any one node in the system.

Kubernetes nodes

On the Kubernetes nodes, shown in green in the diagram, is the kubelet process, which is part of Kubernetes and is responsible for communicating with the Kubernetes API server and for starting and stopping Kubernetes pods on the node.

On one node, you see the operations center pod, which includes a Controller Provisioning plugin that is responsible for starting new controller pods. On the other node you see a controller pod, which includes the Jenkins Kubernetes Plugin and uses that plugin to manage Jenkins build agents.

Each operations center and controller pod has a Kubernetes Persistent Volume Claim where it stores its Jenkins Home directory. Each Persistent Volume Claim is backed by a storage service, such as a GCE persistent disk on GKE, an EBS volume on AWS, or an NFS drive in an OpenShift environment. When a controller pod is moved to a new node, its storage volume must be detached from its old node and then attached to the pod's new node.

Pod scheduling best practice

Prevent operations center and managed controller pods from being moved during scale-down operations by adding the annotation cluster-autoscaler.kubernetes.io/safe-to-evict: "false":

apiVersion: apps/v1
kind: StatefulSet
spec:
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"

Managed controller provisioning

One of the benefits of CloudBees CI is the easy provisioning of new Jenkins managed controllers from the operations center UI. This feature is provided by the Master Provisioning Kubernetes plugin. When you provision a new controller, you must specify the amount of memory and CPU to be allocated to the new controller, and then the plugin calls the Kubernetes API to create a controller.

The diagram below displays the result of launching a new controller via the operations center. The operations center's Master Provisioning Kubernetes plugin calls Kubernetes to provision a new StatefulSet to run the managed controller pod.

Controller provisioning
Figure 2. Managed controller provisioning

Agent provisioning

Agents are created and destroyed in CloudBees CI by the Kubernetes Plugin for Jenkins. A Jenkins Pipeline can specify the build agent using the standard Pipeline syntax. For example, below is a CloudBees CI Pipeline that builds and tests a Java project from a GitHub repository using a Maven and Java Docker image:

Pipeline example
podTemplate(label: 'kubernetes', containers: [
    containerTemplate(name: 'maven', image: 'maven:3.5.2-jdk-8-alpine', ttyEnabled: true, command: 'cat')
]) {
    stage('Preparation') {
        node("kubernetes") {
            container("maven") {
                git 'https://github.com/jglick/simple-maven-project-with-tests.git'
                sh "mvn -Dmaven.test.failure.ignore clean package"
                junit '**/target/surefire-reports/TEST-*.xml'
                archive 'target/*.jar'
            }
        }
    }
}

In the above example, the build agent container image is maven:3.5.2-jdk-8-alpine. It will be pulled from the Docker Registry configured for the Kubernetes cluster.

The diagram below shows how build agent provisioning works. When the Pipeline runs, the Kubernetes Plugin for Jenkins on the managed controller calls Kubernetes to provision a new pod to run the build agent container. Then, Kubernetes launches the build agent pod to execute the Pipeline.

Agent provisioning
Figure 3. Agent provisioning

CloudBees CI required ports

CloudBees CI requires the following open ports. Refer to the Kubernetes documentation for its port requirements.

  • 80: HTTP access to the web interface of operations center and managed controllers.

  • 443: HTTPS access to the web interface of operations center and managed controllers.

  • 50000: TCP port for inbound agent connections between the operations center, managed controllers, and agents.
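As a rough illustration of how these ports map onto the cjoc Service described earlier, a minimal Service manifest might look like the following (a sketch; the selector label and the container port 8080 are assumptions, not taken from the product charts):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: cjoc
spec:
  selector:
    app: cjoc            # assumed selector label
  ports:
    - name: http
      port: 80
      targetPort: 8080   # assumed web UI container port
    - name: jnlp
      port: 50000
      targetPort: 50000  # inbound (JNLP) agent port
```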

Network encryption

Network communication between Kubernetes clients (such as kubectl), Kubernetes control planes, and nodes is encrypted via the TLS protocol. The Kubernetes Managing TLS in a Cluster documentation explains how certificates are obtained and managed by a cluster.

Communication between application containers running on a Kubernetes cluster, such as the operations center and managed controllers, can be encrypted as well, but this requires the deployment of a network overlay technology.

End-to-end web browser to CloudBees CI communications can be TLS encrypted by configuring the Kubernetes Ingress that provides access to CloudBees CI to be the termination point for SSL. Network overlay and SSL termination configuration is covered in a separate section.

High Availability

Kubernetes can be configured for High Availability (HA) by using at least three Kubernetes control planes on three separate machines in different availability zones.

Persistence

Operations center and managed controllers store their data in a file-system directory, known as $JENKINS_HOME. The operations center has its own $JENKINS_HOME, and each controller also has one.

CloudBees CI uses a Kubernetes feature known as Persistent Volume Claims to dynamically provision persistent storage for the operations center, each managed controller, and build agents.

Cluster sizing and scaling

This document provides general recommendations about sizing and scaling a Kubernetes cluster for CloudBees CI starting with some general notes about minimum requirements and ending with a table of more concrete sizing guidelines recommended by CloudBees.

General notes

When sizing and scaling a cluster, you should consider the operational characteristics of Jenkins. The relevant ones are:

  • Jenkins controllers are memory and disk IOPS bound, with some CPU requirements as well. Low IOPS results in longer startup times and worse general performance. Low memory results in slow response times.

  • Build agent requirements depend on the kind of tasks executed on them.

Pods are defined by their CPU and memory requirements, and a pod cannot be split across multiple hosts.

It is recommended to use hosts that are big enough to run several pods at the same time (rule of thumb: 3-5 pods per host) to maximize their utilization.

Example: you are running builds that require 2 GB of memory each, so you configure pods with 2 GB of memory each to support those builds. The rule of thumb says you should use hosts with 6-10 GB of memory (3 × 2 GB to 5 × 2 GB).
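The rule of thumb above can be sketched as a small calculation (a minimal sketch in Python; the 2 GB figure is the example value from the text):

```python
def host_memory_range(pod_memory_gb, min_pods=3, max_pods=5):
    """Apply the 3-5 pods-per-host rule of thumb to a given pod memory size."""
    return (min_pods * pod_memory_gb, max_pods * pod_memory_gb)

# Builds require 2 GB each, so pods are sized at 2 GB:
low, high = host_memory_range(2)
print(f"Hosts should have {low}-{high} GB of memory")  # prints "Hosts should have 6-10 GB of memory"
```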

Depending on your cloud provider, it may be possible to enable auto-scaling in Kubernetes to match actual requirements and reduce operational costs.

If you don't have auto-scaling in your environment, we recommend planning extra capacity to sustain hardware failures.

Storage

Each managed controller is provisioned on a separate Persistent Volume (PV). It is recommended to use a storage class with the most IOPS available.

Host storage is not used by managed controllers, but depending on the instance type, you may have restrictions on the kind of block storage you can use (for example, on Azure, you need an instance type whose name ends with s).

Disk space on the hosts is needed to store Docker images, containers, and volumes. Build workspaces reside on host storage, so there must be enough free disk space available on nodes.

CPU

CloudBees CI uses the notion of CPU defined by Kubernetes.

By default, a managed controller requires 1 CPU. Each build agent also requires CPU, so the total CPU requirement is determined by:

  • (mostly static) The number of managed controllers multiplied by the number of CPUs each of them requires.

  • (dynamic) The number of concurrent build agents used by the cluster multiplied by the CPU requirement of each pod template. A minimum of 1 CPU is recommended for a pod template, but you can allocate more CPUs if the task requires parallel processing.

Most build tasks are CPU-bound (compilation, test execution), so when defining pod templates, it is important not to underestimate the number of CPUs to allocate if you want good performance.

Memory

By default, a managed controller requires 3 GB of RAM.

To determine the total memory requirement, take into account:

  • (mostly static) The number of managed controllers multiplied by the amount of RAM each of them requires.

  • (dynamic) The number of concurrent build agents used by the cluster multiplied by the memory requirement of each pod template.

Memory also impacts performance: not giving enough memory to a managed controller causes additional garbage collection and reduced performance.
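Putting the CPU and memory formulas together, a rough capacity estimate can be sketched as follows (the defaults of 1 CPU and 3 GB per controller come from the text; the agent figures and counts are illustrative assumptions):

```python
def cluster_requirements(controllers, concurrent_agents,
                         controller_cpu=1, controller_mem_gb=3,
                         agent_cpu=1, agent_mem_gb=2):
    """Estimate total cluster needs: static controller share plus dynamic agent share."""
    total_cpu = controllers * controller_cpu + concurrent_agents * agent_cpu
    total_mem_gb = controllers * controller_mem_gb + concurrent_agents * agent_mem_gb
    return total_cpu, total_mem_gb

# 4 managed controllers and up to 20 concurrent build agents:
cpu, mem = cluster_requirements(controllers=4, concurrent_agents=20)
print(f"~{cpu} CPUs and ~{mem} GB of RAM")  # prints "~24 CPUs and ~52 GB of RAM"
```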

Controller Sizing Guidelines

Below are some more concrete sizing guidelines compiled by CloudBees Support Engineers:

Table 1. Controller sizing guidelines

Average Weekly Users — baseline: 20

Besides the team themselves, other non-team collaborators often must access the team's Jenkins to download artifacts or otherwise collaborate with the team. This includes API clients.

Serving the Jenkins user interface impacts IO and CPU consumption and also results in increased memory usage due to the caching of build results.

CPU Cores — baseline: 4

A Jenkins instance of this size should have at least 4 CPU cores available.

Maximum Concurrent Builds — baseline: 50

Healthy agile teams push changes multiple times per day and may have a large test suite including unit, integration, and automated system tests.

We generally observe that Jenkins easily handles up to 50 simultaneous builds, with some instances regularly running many multiples of this number. However, poorly written or complicated Pipeline code can significantly affect the performance and scalability of Jenkins, since the Pipeline script is compiled and executed on the controller.

To increase the scalability and throughput of your Jenkins controller, we recommend that Pipeline scripts and libraries be as short and simple as possible. This is the number one mistake teams make. If build logic can be done in a Bash script, Makefile, or other project artifact, Jenkins will be more scalable and reliable. Changes to such artifacts are also easier to test than changes to the Pipeline script.

Maximum Number of Pipelines (Multi-branch Projects) — baseline: 75

Well-designed systems are often composed of many individual components. The microservices architecture accelerates this trend, as does the maintenance of legacy modules.

Each Pipeline can have multiple branches, each with its own build history. If your team has a high number of Pipeline jobs, you should consider splitting your Jenkins further.

Recommended Java Heap Size — baseline: 4 GB

We regularly see Jenkins instances of this size performing well with 4 gigabytes of heap. This means setting -Xmx4g, as recommended in option B of the Java Heap settings Best Practice Knowledge Base article.

If you observe that your Jenkins instance requires more than 8 gigabytes of heap, your Jenkins likely needs to be split further. Such high usage could be due to buggy Pipelines or non-verified plugins your teams may be using.

Team Size — baseline: 10

Most agile resources warn against going above 10 team members. Keeping the team size at 10 or below facilitates the sharing of knowledge about Jenkins and Pipeline best practices.

Auto-scaling with GKE

For information about setting up auto-scaling on GKE, refer to Enabling auto-scaling nodes on GKE.

Ingress TLS termination

Ingress TLS termination should be used to ensure that network communication to the CloudBees CI UI is encrypted end-to-end. To achieve this, change the Kubernetes Ingress used by CloudBees CI to use your TLS certificates, thereby making it the termination point for TLS.

This information provides a general overview of the changes you must make, but the definitive guide to set this up is in the Kubernetes Ingress TLS documentation.

Store your TLS certificates in a Kubernetes secret

To make your TLS certificates available to Kubernetes, use the Kubernetes kubectl command-line tool to create a Kubernetes secret. For example, if your certificates are in /etc/mycerts, issue this command to create a secret named my-certs:

kubectl create secret tls my-certs \
  --cert=/etc/mycerts/domain.crt \
  --key=/etc/mycerts/privkey.pem
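The same secret can also be declared as a manifest (a sketch; the data values must be the base64-encoded contents of the certificate and key files):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-certs
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded contents of domain.crt>
  tls.key: <base64-encoded contents of privkey.pem>
```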

For more information, refer to the definitive guide to secrets in the Kubernetes Secrets documentation.

Change the two CloudBees CI Ingresses to be the TLS termination point

Configure the CloudBees CI Helm values to use your TLS certificate via the OperationsCenter.Ingress.tls.Enable and OperationsCenter.Ingress.tls.SecretName Helm values, using my-certs as the SecretName. Refer to the following example:

OperationsCenter:
  Ingress:
    tls:
      # OperationsCenter.Ingress.tls.Enable -- Set this to true in order to enable TLS on the ingress record
      Enable: true
      # OperationsCenter.Ingress.tls.SecretName -- The name of the secret containing the certificate
      # and private key to terminate TLS for the ingress
      SecretName: my-certs

Turn off the NGINX proxy protocol

Before TLS can work properly for CloudBees CI on GKE, NGINX proxy protocol must be disabled. Locate the ConfigMap in your namespace named nginx-configuration, edit it, and change the use-proxy-protocol setting to false.
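The relevant ConfigMap fragment ends up looking like this (a sketch showing only the setting named above; other keys in the ConfigMap are left untouched):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
data:
  use-proxy-protocol: "false"
```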

Domain name change

  1. Stop all managed controllers/team controllers from the operations center dashboard. This can be achieved either automatically with a cluster operation or manually using Managed controller/Team controller > Manage.

  2. Use one of the following options to modify the hostname in ingress/cjoc and cm/cjoc-configure-jenkins-groovy and add the new domain name:

    • Change the hostname values in the cloudbees-core.yml file.

    • Edit the operations center ingress resource and modify the domain name:

      $ kubectl edit ingress/cjoc

      Then modify the operations center configuration map to change the operations center URL:

      $ kubectl edit cm/cjoc-configure-jenkins-groovy
  3. Delete the operations center pod and wait until it is terminated.

    $ kubectl delete pod/cjoc
  4. Verify that Operations Center > Manage Jenkins > Configure System > Jenkins Location > Jenkins URL has been properly updated. If it has not been updated, select the new domain and then select Save.

  5. Start all managed controllers/team controllers from the operations center dashboard. This can be achieved either automatically with a cluster operation or manually using Managed controller/Team controller > Manage.

    The new domain name must appear in all of those resources:

    $ kubectl get statefulset/<master> -o=jsonpath='{.spec.template.spec.containers[?(@.name=="jenkins")].env}'
    $ kubectl get cm/cjoc-configure-jenkins-groovy -o json
    $ kubectl get ingress -o wide
The domain name must be identical to what is used in the browser; otherwise, a default backend - 404 error is returned.

Configuring persistent storage

For persistence of operations center and managed controller data, CloudBees CI must be able to dynamically provision persistent storage. When deployed, the system provisions storage for the operations center’s $JENKINS_HOME directory and whenever a new managed controller is provisioned, the operations center provisions storage for that controller’s $JENKINS_HOME.

On Kubernetes, dynamic provisioning of storage is accomplished by creating a Persistent Volume Claim (PVC). The PVC uses a storage class to coordinate with a storage provisioner to provision that storage and make it available to CloudBees CI.

Refer to the next section to set up a storage class for your environment, if applicable.

A detailed explanation of Kubernetes storage concepts is beyond the scope of this document. For additional background information, refer to the Kubernetes documentation.

Storage requirements

Since pipelines typically read and write many files during execution, CloudBees CI requires high-speed storage. On GKE, CloudBees CI requires that you use solid-state disk (SSD) storage.

Storage class considerations for multi-zone

For multi-zone environments, the volumeBindingMode attribute (supported since Kubernetes version 1.12) must be set to WaitForFirstConsumer; otherwise, volumes may be provisioned in a zone where the pod that requests it cannot be deployed. This field is immutable. Therefore, if it is not already set, a new storage class must be created.

Set up an SSD-based persistent volume storage class

Kubernetes defines a default storage class for standard persistent disks:

$ kubectl get sc
NAME                 PROVISIONER            AGE
standard (default)   kubernetes.io/gce-pd   1d

To use SSD persistent disks, you must create a new storage class of type pd-ssd. Create an ssd-storage.yaml file with the following content:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: kubernetes.io/gce-pd
allowVolumeExpansion: true
# Uncomment the following for multi-zone clusters
# volumeBindingMode: WaitForFirstConsumer
parameters:
  type: pd-ssd

Create the new storage class:

kubectl create -f ssd-storage.yaml
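Volumes provisioned for controllers can then reference the new class through storageClassName. A minimal PersistentVolumeClaim sketch (the claim name and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jenkins-home-example   # illustrative name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ssd
  resources:
    requests:
      storage: 20Gi            # illustrative size
```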

Persistent disks restrictions

The following restrictions apply to standard (pd-standard) and SSD (pd-ssd) persistent disks and not to regional persistent disks.

Kubernetes persistent volume life cycle

Kubernetes does not support failover to a different zone for persistent disks. This is partly due to Kubernetes scheduling behavior: the persistent volume (PV) is created first, and that dictates which zone the pod is deployed into, effectively tying the pod to the PV's zone. As of Kubernetes 1.10, PV allocation does not take pod constraints into consideration. It is also due to volumes being zone-specific and the lack of a dynamic provisioner with a snapshot mechanism that would allow recreating a volume from a snapshot in a different zone.

Managed controller failover considerations

  • If the cluster has nodes in multiple zones, the managed controllers are spread across all zones that have healthy nodes. This also means that all zones defined for nodes should have healthy nodes available to provision managed controllers; otherwise, a managed controller provisioning action might not succeed if the volume is assigned to a zone with no healthy nodes.

  • If a node that runs a managed controller fails, the managed controller is restarted on a node in the same zone. This is because persistent disk volumes are tied to a specific zone.

  • If there are no healthy nodes in the zone, the managed controller cannot restart in a different zone. The managed controller only restarts once a healthy node is available in the original zone of the volume that it was created in.

Enable Storage Encryption

Storage encryption should be used to ensure that all CloudBees CI data is encrypted at rest. On GKE, this is the default: all data is encrypted at rest, and no setup is needed to enable encryption.

If you prefer to provide your own encryption keys, you should do this before installing CloudBees CI. For more information, refer to Encrypt disks with customer-supplied encryption keys.

Integrate single sign-on

Once your CloudBees CI cluster is up and running, you can integrate it with a SAML-based single sign-on (SSO) system and configure role-based access control (RBAC). This is done by installing the Jenkins SAML plugin, configuring it to communicate with your identity provider (IdP), and then configuring your IdP to communicate with CloudBees CI.

Prerequisites for this task

Before you set up SAML-based SSO and RBAC, be aware that changes to the security configuration may lock you out of the system. If this happens, you can recover by following the instructions in the How do I log in into Jenkins after I’ve logged myself out CloudBees Knowledge Base article.

Install the SAML plugin

To install the SAML plugin on the operations center:

  1. Sign in to the operations center and select Manage Jenkins > Manage Plugins > Available.

  2. Enter SAML in the search box.

  3. Select the SAML plugin.

  4. Select Download now and install after restart

  5. Select Restart Jenkins when installation is complete and no jobs are running.

    You do not need to install the plugin on managed controllers; you only need to install the plugin to the operations center.

Enable and configure SAML authentication

  1. Sign in to the operations center and select Manage Jenkins > Configure Global Security.

  2. Select Enable security and confirm there is a SAML 2.0 option in the Security Realm setting.

    If the SAML 2.0 option is not present, then the Jenkins SAML plugin is not installed, and you need to install the SAML plugin.
  3. Read and carefully follow the Jenkins SAML plugin instructions.

  4. Enter the IDP Metadata (XML data) and specify the attribute names that your IDP uses for username, email, and group membership.

  5. When you are ready, select Save to store the new security settings.

Export service provider metadata to your IDP

After you save your security settings, the operations center reports your service provider metadata (XML data). You must copy this data and give it to your IDP administrator, who will add it to the IDP configuration.

You can find the service provider metadata by following the link on the Configure Global Security page at the end of the SAML section. The link looks similar to the following:

Service Provider Metadata which may be required to configure your Identity Provider (based on last saved settings).

Sign in to the operations center and set up RBAC

  1. Once your IDP administrator confirms that your IDP metadata has been added to the IDP, sign in to the operations center.

  2. Enable and configure RBAC. For more information, refer to Restricting access and delegating administration with Role-Based Access Control.

In August 2020, the Jenkins project voted to replace the term master with controller. We have taken a pragmatic approach to cleaning these up, ensuring the least amount of downstream impact as possible. CloudBees is committed to ensuring a culture and environment of inclusiveness and acceptance - this includes ensuring the changes are not just cosmetic ones, but pervasive. As this change happens, please note that the term master has been replaced through the latest versions of the CloudBees documentation with controller (as in managed controller, client controller, team controller) except when still used in the UI or in code.