My controller keeps restarting in a loop

Issue

When a controller is created, it starts and begins provisioning. Everything appears to work as expected, but then the controller is restarted. In the logs, we can see that the controller has been killed, and no other errors appear:

xxxx-xx_xx ..:02.666+0000 [id=766942]	INFO	c.c.m.k.TaskListenerEventWatcher#log: Event [Pod][teams-mycontroller-0][Killing]: Killing container with id docker://jenkins:Need to kill Pod
xxxx-xx_xx ..:04.544+0000 [id=766939]	INFO	c.c.m.k.TaskListenerEventWatcher#log: Event [StatefulSet][teams-mycontroller][SuccessfulCreate]: create Pod teams-mycontroller-0 in StatefulSet teams-mycontroller successful
xxxx-xx_xx ..:04.614+0000 [id=766942]	INFO	c.c.m.k.TaskListenerEventWatcher#log: Event [Pod][teams-mycontroller-0][Scheduled]: Successfully assigned jenkins/teams-mycontroller-0 to ucp-member-580186
xxxx-xx_xx ..:05.597+0000 [id=766942]	INFO	c.c.m.k.TaskListenerEventWatcher#log: Event [Pod][teams-mycontroller-0][Pulled]: Container image "cloudbees/cloudbees-core-mm:2.150.2.3" already present on machine
xxxx-xx_xx ..:05.656+0000 [id=766942]	INFO	c.c.m.k.TaskListenerEventWatcher#log: Event [Pod][teams-mycontroller-0][Created]: Created container
xxxx-xx_xx ..:05.951+0000 [id=766942]	INFO	c.c.m.k.TaskListenerEventWatcher#log: Event [Pod][teams-mycontroller-0][Started]: Started container
xxxx-xx_xx ..:07.347+0000 [id=766939]	INFO	c.c.m.k.TaskListenerEventWatcher#log: Event [StatefulSet][teams-mycontroller][SuccessfulDelete]: delete Pod teams-mycontroller-0 in StatefulSet teams-mycontroller successful
xxxx-xx_xx ..:09.224+0000 [id=766942]	INFO	c.c.m.k.TaskListenerEventWatcher#log: Event [Pod][teams-mycontroller-0][Killing]: Killing container with id docker://jenkins:Need to kill Pod
xxxx-xx_xx ..:11.147+0000 [id=766939]	INFO	c.c.m.k.TaskListenerEventWatcher#log: Event [StatefulSet][teams-mycontroller][SuccessfulCreate]: create Pod teams-mycontroller-0 in StatefulSet teams-mycontroller successful
xxxx-xx_xx ..:11.215+0000 [id=766942]	INFO	c.c.m.k.TaskListenerEventWatcher#log: Event [Pod][teams-mycontroller-0][Scheduled]: Successfully assigned jenkins/teams-mycontroller-0 to ucp-member-618943
xxxx-xx_xx ..:12.094+0000 [id=766942]	INFO	c.c.m.k.TaskListenerEventWatcher#log: Event [Pod][teams-mycontroller-0][Pulled]: Container image "cloudbees/cloudbees-core-mm:2.150.2.3" already present on machine
xxxx-xx_xx ..:12.136+0000 [id=766942]	INFO	c.c.m.k.TaskListenerEventWatcher#log: Event [Pod][teams-mycontroller-0][Created]: Created container
xxxx-xx_xx ..:12.435+0000 [id=766942]	INFO	c.c.m.k.TaskListenerEventWatcher#log: Event [Pod][teams-mycontroller-0][Started]: Started container

According to these events, the StatefulSet is deleting the pod. This usually happens when a configuration change is applied, which is not the case here.
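
To confirm this at the Kubernetes level, you can inspect the StatefulSet and the recent events directly. The namespace jenkins and the StatefulSet name teams-mycontroller below come from the log excerpt above; adjust them to your environment:

# Show the StatefulSet details, including its recent events
kubectl describe statefulset teams-mycontroller -n jenkins

# List the events in the namespace, most recent last
kubectl get events -n jenkins --sort-by=.lastTimestamp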

The only difference we see compared with other, working controllers appears when we run the command below:

kubectl rollout history statefulset/<name_of_the_controller_statefulset>

We can see more than one controller revision for the StatefulSet, including several controller revisions with the same revision number.
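
You can also list the ControllerRevision objects sorted by their revision number, which makes duplicated numbers easier to spot. The namespace jenkins below is an example taken from the log excerpt above:

# List ControllerRevisions in the controller's namespace, sorted by revision number
kubectl get controllerrevision -n jenkins --sort-by=.revision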

Resolution

This is potentially caused by a known Kubernetes issue (Kubernetes 61998). To get this fixed, you need to upgrade your Kubernetes cluster to at least version 1.12.
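
You can check the version your cluster is currently running before planning the upgrade:

# Prints the client and server (cluster) versions
kubectl version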

Workaround

A workaround is to delete all the controller revisions except the latest one. To do that, follow the steps below:

  • Run kubectl get controllerrevision to list the controller revisions that exist in the namespace. The failing controller will show a large number of revisions.

  • Then run kubectl delete controllerrevision <name1> <name2> ... <name_N> to delete all the controller revisions except the latest one, as in the example below.
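
For example, assuming the failing controller from the log excerpt above (namespace jenkins, StatefulSet teams-mycontroller) and hypothetical revision names, the cleanup could look like the following sketch:

# List the existing ControllerRevisions, sorted by revision number
kubectl get controllerrevision -n jenkins --sort-by=.revision

# Delete every revision except the most recent (highest) one.
# The revision names below are hypothetical; use the names returned by the
# previous command and keep the one with the highest revision number.
kubectl delete controllerrevision -n jenkins teams-mycontroller-6f8d7b9c5 teams-mycontroller-7b9c5d6f8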

Once the operation is complete, you should see your controller starting successfully.

Tested product/plugin versions