Issue
When creating a controller, it starts and begins provisioning. Everything appears to work as expected, but the controller is then restarted. The logs show that the controller container has been killed, with no other errors:
xxxx-xx_xx ..:02.666+0000 [id=766942] INFO c.c.m.k.TaskListenerEventWatcher#log: Event [Pod][teams-mycontroller-0][Killing]: Killing container with id docker://jenkins:Need to kill Pod
xxxx-xx_xx ..:04.544+0000 [id=766939] INFO c.c.m.k.TaskListenerEventWatcher#log: Event [StatefulSet][teams-mycontroller][SuccessfulCreate]: create Pod teams-mycontroller-0 in StatefulSet teams-mycontroller successful
xxxx-xx_xx ..:04.614+0000 [id=766942] INFO c.c.m.k.TaskListenerEventWatcher#log: Event [Pod][teams-mycontroller-0][Scheduled]: Successfully assigned jenkins/teams-mycontroller-0 to ucp-member-580186
xxxx-xx_xx ..:05.597+0000 [id=766942] INFO c.c.m.k.TaskListenerEventWatcher#log: Event [Pod][teams-mycontroller-0][Pulled]: Container image "cloudbees/cloudbees-core-mm:2.150.2.3" already present on machine
xxxx-xx_xx ..:05.656+0000 [id=766942] INFO c.c.m.k.TaskListenerEventWatcher#log: Event [Pod][teams-mycontroller-0][Created]: Created container
xxxx-xx_xx ..:05.951+0000 [id=766942] INFO c.c.m.k.TaskListenerEventWatcher#log: Event [Pod][teams-mycontroller-0][Started]: Started container
xxxx-xx_xx ..:07.347+0000 [id=766939] INFO c.c.m.k.TaskListenerEventWatcher#log: Event [StatefulSet][teams-mycontroller][SuccessfulDelete]: delete Pod teams-mycontroller-0 in StatefulSet teams-mycontroller successful
xxxx-xx_xx ..:09.224+0000 [id=766942] INFO c.c.m.k.TaskListenerEventWatcher#log: Event [Pod][teams-mycontroller-0][Killing]: Killing container with id docker://jenkins:Need to kill Pod
xxxx-xx_xx ..:11.147+0000 [id=766939] INFO c.c.m.k.TaskListenerEventWatcher#log: Event [StatefulSet][teams-mycontroller][SuccessfulCreate]: create Pod teams-mycontroller-0 in StatefulSet teams-mycontroller successful
xxxx-xx_xx ..:11.215+0000 [id=766942] INFO c.c.m.k.TaskListenerEventWatcher#log: Event [Pod][teams-mycontroller-0][Scheduled]: Successfully assigned jenkins/teams-mycontroller-0 to ucp-member-618943
xxxx-xx_xx ..:12.094+0000 [id=766942] INFO c.c.m.k.TaskListenerEventWatcher#log: Event [Pod][teams-mycontroller-0][Pulled]: Container image "cloudbees/cloudbees-core-mm:2.150.2.3" already present on machine
xxxx-xx_xx ..:12.136+0000 [id=766942] INFO c.c.m.k.TaskListenerEventWatcher#log: Event [Pod][teams-mycontroller-0][Created]: Created container
xxxx-xx_xx ..:12.435+0000 [id=766942] INFO c.c.m.k.TaskListenerEventWatcher#log: Event [Pod][teams-mycontroller-0][Started]: Started container
According to these events, the StatefulSet itself is deleting the pod. This usually happens when a configuration change is performed, which is not the case here.
The only anomaly we see compared with other, working controllers appears when we run the command below:
kubectl rollout history statefulset/<name_of_the_controller_statefulset>
The output shows more than one controller revision for the StatefulSet, and several controller revisions share the same revision number.
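Duplicated revision numbers can also be spotted from a plain listing. The sketch below defines a small helper that flags any revision number appearing more than once; the input format mirrors name/revision pairs (as produced, for example, by `kubectl get controllerrevision -o custom-columns=NAME:.metadata.name,REV:.revision --no-headers`), and the revision names used in the example are made up for illustration.

```shell
# Flag StatefulSet revision numbers that appear more than once.
# Input: one "name revision" pair per line.
duplicate_revisions() {
  awk '{print $2}' | sort -n | uniq -d
}

# Illustrative input; real names are generated by Kubernetes:
printf '%s\n' \
  'teams-mycontroller-aaaa 3' \
  'teams-mycontroller-bbbb 3' \
  'teams-mycontroller-cccc 4' | duplicate_revisions
# → 3
```

A non-empty result means at least two controller revisions share the same number, which matches the symptom described above.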
Resolution
This is potentially caused by a known Kubernetes issue: Kubernetes 61998. To get this fixed, you would need to upgrade your Kubernetes cluster to at least version 1.12.
Workaround
A workaround is to delete all the controller revisions except for the latest one. To do that, we can run the following commands:
-
kubectl get controllerrevision
This lists the controller revisions that exist in our namespace. The failing controller will show a large number of revisions.
-
kubectl delete controllerrevision <name1> <name2> ... <name_N>
This deletes the listed controller revisions; pass every revision name except the latest one.
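The "keep only the latest revision" selection can be sketched as a small helper. This is an illustrative sketch, not a tested one-liner: it assumes GNU coreutils (`head -n -1` is a GNU extension) and input of "name revision" pairs, such as the output of `kubectl get controllerrevision -o custom-columns=NAME:.metadata.name,REV:.revision --no-headers`; the revision names in the example are invented.

```shell
# Print every controller revision name except the one with the highest
# revision number (assumes GNU coreutils for "head -n -1").
stale_revisions() {
  sort -k2,2n | head -n -1 | awk '{print $1}'
}

# Illustrative input; real names are generated by Kubernetes:
printf '%s\n' \
  'teams-mycontroller-aaaa 2' \
  'teams-mycontroller-bbbb 3' \
  'teams-mycontroller-cccc 3' | stale_revisions
# Prints the two older names, which could then be passed to:
#   kubectl delete controllerrevision <names>
```

Review the printed names before feeding them to `kubectl delete controllerrevision`, since deleting the wrong revision cannot be undone.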
Once the operation is complete, you should see your controller start successfully.
Tested product/plugin versions
- Docker EE v1.11.7-docker-1