Issue
If you stop a running controller (team or managed), change its Namespace field to an invalid value, click Save, and then Acknowledge error, the controller fails to start. This is expected, since the namespace does not exist. The startup logs will be similar to:
[Tue May 25 18:56:48 UTC 2021] Stopping controller: cloudbees-ci/test
[Tue May 25 18:56:48 UTC 2021] Deleting service cloudbees-ci/test
[Tue May 25 18:56:48 UTC 2021] Deleting ingress cloudbees-ci/test
[Tue May 25 18:56:48 UTC 2021] Deleting stateful set cloudbees-ci/test
[Tue May 25 18:56:48 UTC 2021][Normal][Ingress][test][DELETE] Ingress cloudbees-ci/test
ERROR: Could not request to expand disk
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://10.96.0.1/api/v1/namespaces/a/persistentvolumeclaims/jenkins-home-test-0. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. persistentvolumeclaims "jenkins-home-test-0" is forbidden: User "system:serviceaccount:cloudbees-ci:cjoc" cannot get resource "persistentvolumeclaims" in API group "" in the namespace "a".
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:570)
The problem is that you can then no longer change the Namespace field back to a valid value.
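To confirm that the saved namespace is what blocks the startup, the check below may help. It is only a sketch: the function name diagnose_namespace is hypothetical, kubectl is assumed to be configured for the cluster, and the --as flag only works if your own user is allowed to impersonate the service account.

```shell
# Sketch: check whether the namespace saved on the controller exists and
# whether the operations center service account can read PVCs in it.
# diagnose_namespace is a hypothetical helper, not part of CloudBees CI.
diagnose_namespace() {
  # $1 = namespace entered in the controller's Namespace field
  # $2 = service account from the log, e.g. system:serviceaccount:cloudbees-ci:cjoc
  if kubectl get namespace "$1" >/dev/null 2>&1; then
    echo "namespace $1 exists"
  else
    echo "namespace $1 not found"
  fi
  # Prints "yes" or "no"; requires impersonation rights for --as.
  kubectl auth can-i get persistentvolumeclaims --namespace "$1" --as "$2"
}

# Example call, using the values from the log excerpt:
# diagnose_namespace a system:serviceaccount:cloudbees-ci:cjoc
```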
Resolution
This is a bug that is planned to be fixed in an upcoming product release, under:
BEE-5019 Disable modification of namespace field when a volume exists
Workaround
To recover from this issue, you can:
- Back up the controller data from the Kubernetes PV by Using a rescue-pod.
- Back up the settings for the controller (startup arguments, Docker image, disk space, CPU allocation) from /var/jenkins_home/jobs/controller-name/config.xml on the operations center filesystem.
- If it’s a Teams controller, back up /var/jenkins_home/jobs/Teams/jobs/team-name/teamSecurity.xml from the operations center filesystem.
- If it’s a Managed controller, back up /var/jenkins_home/jobs/controller-name/nectar-rbac.xml from the operations center filesystem.
- Ensure that the reclaim policy of the Persistent Volume for your controller is set to Retain, and not Delete. To check this, run kubectl get pv and look under the RECLAIM POLICY column for the jenkins-home-$CONTROLLER_NAME-0 claim. If the RECLAIM POLICY is Delete, change it to Retain by following Changing the reclaim policy of a PersistentVolume.
- Delete the controller.
- Create a new controller with the same settings as before (with the correct namespace field).
- Restore the data in the Kubernetes PV. You can skip this step if you set the reclaim policy to Retain in the earlier step: all the data will still be in the PV, and since you chose the same name for the controller, the same PV will be used. If the data is missing, restore it by Using a rescue-pod.
- If it’s a Teams controller, restore /var/jenkins_home/jobs/Teams/jobs/team-name/teamSecurity.xml on the operations center filesystem.
- If it’s a Managed controller, restore /var/jenkins_home/jobs/controller-name/nectar-rbac.xml on the operations center filesystem.
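
The file backup and reclaim-policy steps above could be scripted roughly as follows. This is a sketch under stated assumptions, not a tested procedure. The function names and the OC_NAMESPACE and OC_POD variables are hypothetical. The pod name cjoc-0 is a guess based on the service account in the logs, so adjust it to your installation. kubectl cp also needs tar to be available in the operations center container.

```shell
# Sketch of the backup and reclaim-policy steps. All helper names and the
# two variables below are illustrative assumptions.
OC_NAMESPACE="${OC_NAMESPACE:-cloudbees-ci}"  # namespace seen in the logs
OC_POD="${OC_POD:-cjoc-0}"                    # assumed operations center pod name

# Copy one file from the operations center filesystem to a local directory.
# $1 = path relative to /var/jenkins_home, $2 = local backup directory
backup_oc_file() {
  mkdir -p "$2" || return 1
  kubectl cp "$OC_NAMESPACE/$OC_POD:/var/jenkins_home/$1" "$2/$(basename "$1")"
}

# Print the name of the PV bound to a controller's jenkins-home claim.
# $1 = namespace that holds the claim, $2 = controller name
controller_pv() {
  kubectl get pvc "jenkins-home-$2-0" --namespace "$1" -o jsonpath='{.spec.volumeName}'
}

# Switch a PV to the Retain reclaim policy unless it already uses it.
# $1 = PV name
ensure_retain() {
  policy=$(kubectl get pv "$1" -o jsonpath='{.spec.persistentVolumeReclaimPolicy}') || return 1
  if [ "$policy" = "Retain" ]; then
    echo "PV $1 already uses Retain"
  else
    kubectl patch pv "$1" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
  fi
}

# Example for a managed controller named "test" (names are placeholders):
# backup_oc_file jobs/test/config.xml ./backup
# backup_oc_file jobs/test/nectar-rbac.xml ./backup
# ensure_retain "$(controller_pv cloudbees-ci test)"
```

Run the backup and ensure_retain calls before deleting the controller. With a Delete policy, removing the claim would also remove the volume.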
Tested product/plugin versions
Workaround tested with CloudBees CI on Modern cloud platforms 2.277.4.3