Provisioning of pods takes an abnormally long time

Article ID:360029244672

Issue

  • A large number of pods are waiting in the queue to be scheduled, and pods are going into Failed and Pending status. The following message is logged:

Pod randomPodName marked as unschedulable can be scheduled on ip-XXX-XX-XX-XX.ec2.internal. Ignoring in scale up.

Explanation

This is caused by an issue in Kubernetes and the Cluster Autoscaler in versions prior to 1.11.7.
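
To decide whether a cluster falls under this explanation, the reported Kubernetes version can be compared against 1.11.7. The following is a minimal sketch, assuming the version string has the usual `v<major>.<minor>.<patch>` form (for example, as shown for the server by `kubectl version`). The helper names `parse_version` and `is_affected` are hypothetical and exist only for illustration.

```python
import re

# Threshold taken from this article: versions prior to 1.11.7 are affected.
FIXED_VERSION = (1, 11, 7)


def parse_version(text):
    """Parse strings such as 'v1.11.5' or '1.11.5-eks-abc' into (1, 11, 5)."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version_text):
    """Return True if the version is earlier than the fixed version."""
    return parse_version(version_text) < FIXED_VERSION


if __name__ == "__main__":
    # Example version strings; replace with the version of your own cluster.
    for candidate in ("v1.11.5", "v1.11.7", "v1.12.0"):
        status = "affected" if is_affected(candidate) else "not affected"
        print(f"{candidate}: {status}")
```

Comparing tuples of integers avoids the pitfall of comparing version strings as text, where "1.9.0" would incorrectly sort after "1.11.7".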

Resolution

Kubernetes versions affected

  • As a temporary workaround, taint the node reported in the message so that Kubernetes no longer schedules pods on it. The taint prevents new pods from being scheduled there, and after a few seconds the autoscaler should start scaling properly.

  • As a long-term resolution, upgrade to Kubernetes 1.11.7 and a supported Cluster Autoscaler version, following the guidelines in the related issue: Autoscaler fails to scale up nodes with pending pods.
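
The temporary workaround above can be sketched as follows. This is an illustration under stated assumptions, not a definitive procedure: the script extracts the node name from the message quoted in the Issue section and prints the `kubectl taint` commands to apply and later remove the taint. The taint key `autoscaler-workaround`, its value, and the function names are hypothetical; the `NoSchedule` effect is chosen because it blocks new pods without evicting running ones.

```python
import re
import shlex

# Matches the autoscaler message quoted in the Issue section.
MESSAGE_RE = re.compile(
    r"Pod (?P<pod>\S+) marked as unschedulable can be scheduled on "
    r"(?P<node>\S+?)\. Ignoring in scale up"
)

# Hypothetical taint key and value; any valid key of your choosing works.
TAINT_KEY = "autoscaler-workaround"
TAINT_VALUE = "true"
TAINT_EFFECT = "NoSchedule"


def node_from_message(message):
    """Return the node name found in the message, or None if it is absent."""
    match = MESSAGE_RE.search(message)
    return match.group("node") if match else None


def taint_command(node):
    """Build the kubectl command that stops new pods landing on the node."""
    return ["kubectl", "taint", "nodes", node,
            f"{TAINT_KEY}={TAINT_VALUE}:{TAINT_EFFECT}"]


def untaint_command(node):
    """Build the kubectl command that removes the taint (trailing dash)."""
    return ["kubectl", "taint", "nodes", node, f"{TAINT_KEY}:{TAINT_EFFECT}-"]


if __name__ == "__main__":
    # Placeholder node name, as in the article; substitute the real message.
    line = ("Pod randomPodName marked as unschedulable can be scheduled on "
            "ip-XXX-XX-XX-XX.ec2.internal. Ignoring in scale up.")
    node = node_from_message(line)
    if node is None:
        raise SystemExit("no node name found in the message")
    print(shlex.join(taint_command(node)))
    print(shlex.join(untaint_command(node)))
```

The script only prints the commands so they can be reviewed before being run. Remove the taint once the cluster has been upgraded, otherwise the node stays closed to new pods.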