Auto-scaling nodes on EKS

Auto-scaling of nodes can be achieved by installing the Kubernetes Cluster Autoscaler.

Auto-scaling considerations for EKS

While scaling up is straightforward, scaling down is potentially more problematic. Scaling down involves moving workloads to different nodes when the node to reclaim still has some utilization but is below the reclamation threshold. Moving agent workloads could interrupt builds (failed builds), and moving Operations Center/Managed Master workloads would mean downtime.

Distinct node pools

One way to deal with scaling down is to treat each type of workload differently by using separate node pools, so that different scale-down logic can be applied to each.

Managed Master and Operations Center workload

By assigning Managed Master and Operations Center workloads to a dedicated pool, the scaling down of nodes can be prevented by restricting eviction of Managed Master and Operations Center deployments. Scale-up happens normally when resources need to be increased to deploy additional Managed Masters, but scale-down only happens when a node is free of Operations Center and Managed Master workload. This might be acceptable since masters are meant to be stable and permanent: they are long-lived, not ephemeral.

This is achieved by adding the following annotation to Operations Center and Managed Masters: "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
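For an Operations Center instance that is already deployed, the same annotation could, as a sketch, be applied to the existing StatefulSet with kubectl patch (the namespace is a placeholder; adjust it to your installation). Note that changing the pod template triggers a rolling restart of the pod:

```shell
# Hypothetical example: patch an existing CJOC StatefulSet so its pod template
# carries the safe-to-evict annotation. Replace the namespace with your own.
kubectl -n cloudbees-core patch statefulset cjoc --type merge -p '
spec:
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
'
```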

For Operations Center, the annotation is added to cloudbees-core.yml in the CJOC "StatefulSet" definition, under "spec.template.metadata.annotations":

apiVersion: "apps/v1beta1"
kind: "StatefulSet"
metadata:
  name: cjoc
  labels:
    com.cloudbees.cje.type: cjoc
    com.cloudbees.cje.tenant: cjoc
spec:
  serviceName: cjoc
  replicas: 1
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"

For a Managed Master, the annotation is added on the configuration page under the 'Advanced Configuration - YAML' parameter. The YAML snippet to add looks like:

kind: StatefulSet
spec:
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"

Agent workload

By assigning Jenkins agent workload to a dedicated pool, the scaling can be handled by the default logic. Since agents are Pods that are not backed by a Kubernetes controller, they prevent the scale-down of a node until no agent pods are running on it. This keeps nodes from being reclaimed while agents are running, so agents are not interrupted even when a node is below the autoscaler's reclamation threshold.

To create a dedicated pool for agent workload, other types of workload must be prevented from being deployed on the dedicated pool nodes. This is accomplished by tainting those nodes. Then, to allow the scheduling of agent workload on the dedicated pool nodes, the agent pod uses a corresponding taint toleration and a node selector.
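On an existing node, the equivalent taint and label could be applied manually with kubectl (the node name below is a placeholder). This is only a sketch for experimentation; manual taints and labels do not survive node replacement, which is why autoscaled nodes need them set at registration time:

```shell
# Taint the node so only pods with a matching toleration are scheduled on it
kubectl taint nodes ip-10-0-1-23.us-west-2.compute.internal nodeType=build:NoSchedule
# Label the node so agent pods can target it with a node selector
kubectl label nodes ip-10-0-1-23.us-west-2.compute.internal workload=build
```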

When nodes are created dynamically by the Kubernetes autoscaler, they need to be created with the proper taint and label.

With EKS, the taint and label can be specified in the Kubernetes kubelet service defined in the UserData section of the AWS autoscaling group LaunchConfiguration.

Following the AWS EKS documentation, the nodes are created by a CloudFormation template. Download the worker node template (see the 'Launch your worker nodes' section of the EKS documentation) and add the node-labels and register-with-taints flags to the kubelet service in the UserData section:

      "sed -i '/bin\\/kubelet/a --node-labels=workload=build \\\\'  /etc/systemd/system/kubelet.service" , "\n",
      "sed -i '/bin\\/kubelet/a --register-with-taints=nodeType=build:NoSchedule \\\\'  /etc/systemd/system/kubelet.service" , "\n",

The autoscaling group LaunchConfiguration will look something like:

  NodeLaunchConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      AssociatePublicIpAddress: 'true'
      IamInstanceProfile: !Ref NodeInstanceProfile
      ImageId: !Ref NodeImageId
      InstanceType: !Ref NodeInstanceType
      KeyName: !Ref KeyName
      SecurityGroups:
      - !Ref NodeSecurityGroup
      UserData:
        Fn::Base64:
          Fn::Join: [
            "",
            [
              "#!/bin/bash -xe\n",
              "CA_CERTIFICATE_DIRECTORY=/etc/kubernetes/pki", "\n",
              "CA_CERTIFICATE_FILE_PATH=$CA_CERTIFICATE_DIRECTORY/ca.crt", "\n",
              "MODEL_DIRECTORY_PATH=~/.aws/eks", "\n",
              "MODEL_FILE_PATH=$MODEL_DIRECTORY_PATH/eks-2017-11-01.normal.json", "\n",
              "mkdir -p $CA_CERTIFICATE_DIRECTORY", "\n",
              "mkdir -p $MODEL_DIRECTORY_PATH", "\n",
              "curl -o $MODEL_FILE_PATH https://s3-us-west-2.amazonaws.com/amazon-eks/1.10.3/2018-06-05/eks-2017-11-01.normal.json", "\n",
              "aws configure add-model --service-model file://$MODEL_FILE_PATH --service-name eks", "\n",
              "aws eks describe-cluster --region=", { Ref: "AWS::Region" }," --name=", { Ref: ClusterName }," --query 'cluster.{certificateAuthorityData: certificateAuthority.data, endpoint: endpoint}' > /tmp/describe_cluster_result.json", "\n",
              "cat /tmp/describe_cluster_result.json | grep certificateAuthorityData | awk '{print $2}' | sed 's/[,\"]//g' | base64 -d >  $CA_CERTIFICATE_FILE_PATH", "\n",
              "MASTER_ENDPOINT=$(cat /tmp/describe_cluster_result.json | grep endpoint | awk '{print $2}' | sed 's/[,\"]//g')", "\n",
              "INTERNAL_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)", "\n",
              "sed -i s,MASTER_ENDPOINT,$MASTER_ENDPOINT,g /var/lib/kubelet/kubeconfig", "\n",
              "sed -i s,CLUSTER_NAME,", { Ref: ClusterName }, ",g /var/lib/kubelet/kubeconfig", "\n",
              "sed -i s,REGION,", { Ref: "AWS::Region" }, ",g /etc/systemd/system/kubelet.service", "\n",
              "sed -i s,MAX_PODS,", { "Fn::FindInMap": [ MaxPodsPerNode, { Ref: NodeInstanceType }, MaxPods ] }, ",g /etc/systemd/system/kubelet.service", "\n",
              "sed -i s,MASTER_ENDPOINT,$MASTER_ENDPOINT,g /etc/systemd/system/kubelet.service", "\n",
              "sed -i s,INTERNAL_IP,$INTERNAL_IP,g /etc/systemd/system/kubelet.service", "\n",
              "DNS_CLUSTER_IP=10.100.0.10", "\n",
              "if [[ $INTERNAL_IP == 10.* ]] ; then DNS_CLUSTER_IP=172.20.0.10; fi", "\n",
              "sed -i s,DNS_CLUSTER_IP,$DNS_CLUSTER_IP,g  /etc/systemd/system/kubelet.service", "\n",
              "sed -i s,CERTIFICATE_AUTHORITY_FILE,$CA_CERTIFICATE_FILE_PATH,g /var/lib/kubelet/kubeconfig" , "\n",
              "sed -i s,CLIENT_CA_FILE,$CA_CERTIFICATE_FILE_PATH,g  /etc/systemd/system/kubelet.service" , "\n",
              "sed -i '/bin\\/kubelet/a --node-labels=workload=build \\\\'  /etc/systemd/system/kubelet.service" , "\n",
              "sed -i '/bin\\/kubelet/a --register-with-taints=nodeType=build:NoSchedule \\\\'  /etc/systemd/system/kubelet.service" , "\n",
              "systemctl daemon-reload", "\n",
              "systemctl restart kubelet", "\n",
              "/opt/aws/bin/cfn-signal -e $? ",
              "         --stack ", { Ref: "AWS::StackName" },
              "         --resource NodeGroup ",
              "         --region ", { Ref: "AWS::Region" }, "\n"
            ]
          ]

The first parameter, node-labels, automatically adds the workload=build label to newly created nodes; this label is then used as the node selector for agents. The second parameter, register-with-taints, automatically adds the nodeType=build:NoSchedule taint to the nodes.
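Once a node from the agent pool has joined the cluster, the label and taint can be verified with kubectl:

```shell
# List nodes carrying the agent-pool label
kubectl get nodes -l workload=build
# Confirm the taint was registered on those nodes
kubectl describe nodes -l workload=build | grep -i taint
```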

Follow the 'Launch your worker nodes' EKS documentation, but use the modified template to create the agent pool.

Security group Ingress settings

The security group of the default worker node pool must be modified to allow ingress traffic from the security group of the newly created pool, so that agents can communicate with Managed Masters running in the default pool.
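As a sketch, the ingress rule could be added with the AWS CLI; both security group IDs below are placeholders, and in practice you would restrict the port range to what agents actually need (e.g. HTTP and the JNLP agent port):

```shell
# Allow TCP traffic from the agent pool security group (source) into the
# default/master pool security group (target). IDs are hypothetical.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 0-65535 \
  --source-group sg-0fedcba9876543210
```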

The agent pod template then needs the corresponding 'toleration' to allow the scheduling of agent workload on those nodes.

Agent toleration and node selector

For Pipelines, the 'toleration' can be added to the podTemplate using the yaml parameter as follows:

    def label = "mypodtemplate-${UUID.randomUUID().toString()}"
    def nodeSelector = "workload=build"
    podTemplate(label: label, yaml: """
    apiVersion: v1
    kind: Pod
    spec:
      tolerations:
      - key: nodeType
        operator: Equal
        value: build
        effect: NoSchedule
    """, nodeSelector: nodeSelector, containers: [
      containerTemplate(name: 'maven', image: 'maven:3.3.9-jdk-8-alpine', ttyEnabled: true, command: 'cat')
    ]) {
      node(label) {
        stage('Run maven') {
          container('maven') {
            sh 'mvn --version'
          }
        }
      }
    }

IAM policy

The worker node running the cluster autoscaler needs access to certain resources and actions.

A minimum IAM policy would look like:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:TerminateInstanceInAutoScalingGroup"
            ],
            "Resource": "*"
        }
    ]
}

If the current NodeInstanceRole defined for the EKS cluster nodes does not already include the policy actions required by the autoscaler, create a new 'eks-auto-scaling' policy as outlined above and attach it to the NodeInstanceRole.
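As a sketch, the policy could be created and attached with the AWS CLI; the JSON file name, role name, and account ID below are placeholders:

```shell
# Create the policy from the JSON document above, saved locally
# (the file name is an assumption)
aws iam create-policy \
  --policy-name eks-auto-scaling \
  --policy-document file://eks-auto-scaling-policy.json

# Attach it to the worker node instance role
# (role name and account ID are hypothetical)
aws iam attach-role-policy \
  --role-name acme-eks-NodeInstanceRole \
  --policy-arn arn:aws:iam::123456789012:policy/eks-auto-scaling
```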

Install cluster autoscaler

Examples for deploying the cluster autoscaler on AWS can be found in the AWS section of the Kubernetes cluster autoscaler documentation.

As an example, let's use the single auto-scaling group deployment.

A few things need to be modified to match your EKS cluster setup. Here is a sample extract of the autoscaler deployment section:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        - image: k8s.gcr.io/cluster-autoscaler:v1.1.0
          name: cluster-autoscaler
          resources:
            limits:
              cpu: 100m
              memory: 300Mi
            requests:
              cpu: 100m
              memory: 300Mi
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --nodes=1:10:acme-eks-worker-nodes-NodeGroup-FD1OD4CZ0J77
          env:
            - name: AWS_REGION
              value: us-west-2
          volumeMounts:
            - name: ssl-certs
              mountPath: /etc/ssl/certs/ca-bundle.crt
              readOnly: true
          imagePullPolicy: "Always"
      volumes:
        - name: ssl-certs
          hostPath:
            path: "/etc/ssl/certs/ca-bundle.crt"

  1. If EKS is running Kubernetes v1.9.2 or above, use version 1.1.0 of the autoscaler.

  2. Update the '--nodes=' command parameter. The syntax is 'ASG_MIN_SIZE:ASG_MAX_SIZE:ASG_NAME'. Multiple '--nodes' parameters can be defined to have the autoscaler scale multiple AWS auto-scaling groups.

  3. Update the AWS_REGION environment variable to match the EKS cluster region.

  4. If using AWS Linux 2 AMIs for the nodes, set the SSL certificate paths to '/etc/ssl/certs/ca-bundle.crt'.
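For example, to have one autoscaler manage both the default worker pool and the agent pool described earlier, the command section would list one '--nodes' parameter per auto-scaling group (the agent ASG name and sizes below are hypothetical):

```yaml
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --nodes=1:10:acme-eks-worker-nodes-NodeGroup-FD1OD4CZ0J77
            - --nodes=0:20:acme-eks-agent-nodes-NodeGroup-EXAMPLE
```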

To install the autoscaler:

$ kubectl create -f cluster-autoscaler-one-asg.yaml
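Once deployed, the autoscaler's status and scaling decisions can be sanity-checked with kubectl, assuming the Deployment name and labels from the example above:

```shell
# Confirm the autoscaler pod is running
kubectl -n kube-system get pods -l app=cluster-autoscaler
# Inspect recent scale-up/scale-down decisions in the logs
kubectl -n kube-system logs deployment/cluster-autoscaler | tail -n 50
```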