Issue
If you stop a running controller (team or managed controller), change the Namespace field to an invalid value, click Save, and then click Acknowledge error, the controller fails to start. This is expected, because the namespace does not exist. The startup logs will be similar to:
[Tue May 25 18:56:48 UTC 2021] Stopping controller: cloudbees-ci/test
[Tue May 25 18:56:48 UTC 2021] Deleting service cloudbees-ci/test
[Tue May 25 18:56:48 UTC 2021] Deleting ingress cloudbees-ci/test
[Tue May 25 18:56:48 UTC 2021] Deleting stateful set cloudbees-ci/test
[Tue May 25 18:56:48 UTC 2021][Normal][Ingress][test][DELETE] Ingress cloudbees-ci/test
ERROR: Could not request to expand disk
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://10.96.0.1/api/v1/namespaces/a/persistentvolumeclaims/jenkins-home-test-0. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. persistentvolumeclaims "jenkins-home-test-0" is forbidden: User "system:serviceaccount:cloudbees-ci:cjoc" cannot get resource "persistentvolumeclaims" in API group "" in the namespace "a".
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:570)
The problem is that, once the controller is in this state, you can no longer edit the Namespace field to change it back to a valid value.
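One way to confirm from the command line that the failure comes from the invalid namespace is sketched below. This is an illustrative sketch, not part of the product: it assumes you have kubectl access to the cluster, the function names are hypothetical, and the values a, cloudbees-ci, and cjoc are taken from the example log above and will differ in your environment. The permission check uses kubectl auth can-i with --as, which only works if your own account is allowed to impersonate service accounts.

```shell
#!/usr/bin/env bash
# Hypothetical helper functions for diagnosing the error shown in the log.
# KUBECTL can be overridden, for example to point at a wrapper script.
KUBECTL="${KUBECTL:-kubectl}"

# Returns success only if the given namespace exists in the cluster.
namespace_exists() {
  "$KUBECTL" get namespace "$1" >/dev/null 2>&1
}

# Asks the API server whether a service account may read
# PersistentVolumeClaims in the target namespace.
# Arguments: target namespace, service account namespace, service account name.
can_read_pvcs() {
  local target_ns="$1" sa_ns="$2" sa_name="$3"
  "$KUBECTL" auth can-i get persistentvolumeclaims \
    --namespace "$target_ns" \
    --as "system:serviceaccount:${sa_ns}:${sa_name}"
}

# Example, using the values from the log above:
#   namespace_exists a || echo "namespace a does not exist"
#   can_read_pvcs a cloudbees-ci cjoc
```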
Resolution
This is a bug that is planned to be fixed in an upcoming product release, under:
BEE-5019 Disable modification of namespace field when a volume exists
Workaround
To recover from this issue, you can:
- Back up the controller data from the Kubernetes PV by following Using a rescue-pod.
- Back up the settings for the controller (the startup arguments, Docker image, disk space, and CPU allocation) from /var/jenkins_home/jobs/controller-name/config.xml on the operations center filesystem.
- If it’s a Teams controller, back up /var/jenkins_home/jobs/Teams/jobs/team-name/teamSecurity.xml from the operations center filesystem.
- If it’s a Managed controller, back up /var/jenkins_home/jobs/controller-name/nectar-rbac.xml from the operations center filesystem.
- Ensure that the reclaim policy of the Persistent Volume for your controller is set to Retain, and not Delete. To check this, run kubectl get pv and look under the RECLAIM POLICY column for the jenkins-home-$CONTROLLER_NAME-0 claim. If the RECLAIM POLICY is Delete, change it to Retain by following Changing the reclaim policy of a PersistentVolume.
- Delete the controller.
- Create a new controller with the same settings as before (with the correct namespace field).
- Restore the data in the Kubernetes PV. You can skip this step if you successfully set the reclaim policy to Retain in the earlier step: all the data will still be in the PV and, because you chose the same name for the controller, the same PV will be used. If the data is missing, restore it by following Using a rescue-pod.
- If it’s a Teams controller, restore /var/jenkins_home/jobs/Teams/jobs/team-name/teamSecurity.xml on the operations center filesystem.
- If it’s a Managed controller, restore /var/jenkins_home/jobs/controller-name/nectar-rbac.xml on the operations center filesystem.
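
The file backup and reclaim policy steps above can be sketched as a few shell helpers. This is a minimal sketch under stated assumptions, not an official procedure. It assumes kubectl access to the cluster, and that the operations center runs in a pod named cjoc-0 in the cloudbees-ci namespace (the pod name is an assumption; only the namespace and the cjoc service account appear in the log above). It also assumes kubectl cp works against that pod, which requires tar in the container. The function names and the backup directory are hypothetical. The patch command is the one given in the Kubernetes documentation for changing the reclaim policy of a PersistentVolume.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the backup and reclaim policy steps.
KUBECTL="${KUBECTL:-kubectl}"
OC_NAMESPACE="${OC_NAMESPACE:-cloudbees-ci}"  # assumption: operations center namespace
OC_POD="${OC_POD:-cjoc-0}"                    # assumption: operations center pod name

# Copy one file from the operations center filesystem into a local directory.
# Arguments: absolute path inside the pod, local backup directory.
backup_oc_file() {
  local remote_path="$1" backup_dir="$2"
  mkdir -p "$backup_dir"
  "$KUBECTL" cp "${OC_NAMESPACE}/${OC_POD}:${remote_path}" \
    "${backup_dir}/$(basename "$remote_path")"
}

# Print the name of the PV bound to the controller's jenkins-home claim.
# Arguments: namespace where the claim actually exists (the original,
# valid namespace), controller name.
controller_pv() {
  local controller_ns="$1" controller_name="$2"
  "$KUBECTL" get pvc "jenkins-home-${controller_name}-0" \
    --namespace "$controller_ns" -o jsonpath='{.spec.volumeName}'
}

# Print the current reclaim policy of a PV (for example Retain or Delete).
pv_reclaim_policy() {
  "$KUBECTL" get pv "$1" -o jsonpath='{.spec.persistentVolumeReclaimPolicy}'
}

# Set the reclaim policy of a PV to Retain.
retain_pv() {
  "$KUBECTL" patch pv "$1" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
}

# Example, for a controller named test that originally ran in cloudbees-ci:
#   backup_oc_file /var/jenkins_home/jobs/controller-name/config.xml ./backup
#   pv="$(controller_pv cloudbees-ci test)"
#   [ "$(pv_reclaim_policy "$pv")" = "Retain" ] || retain_pv "$pv"
```

Run the reclaim policy check before deleting the controller; if the policy is still Delete at that point, the volume and its data may be removed along with the claim.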