Issue
After applying a Helm upgrade that changes Agents.SeparateNamespace.Enabled from false to true, some controllers fail to use the Kubernetes shared cloud defined at the operations center level. Agent provisioning fails with an error indicating that the role cjoc-agents cannot be found and that the controller's service account lacks permission to list pods in the operations center namespace.
ERROR: Failed to launch ...
Also: java.lang.Throwable: waiting here
    at PluginClassLoader for kubernetes-client-api//io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:175)
...
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: ....
Message: pods is forbidden: User "system:serviceaccount:ci:controller-1" cannot list resource "pods" in API group "" in the namespace "ci": RBAC: role.rbac.authorization.k8s.io "cjoc-agents" not found.
Received status: Status(apiVersion=v1, code=403, details=StatusDetails(causes=[], group=null, kind=pods, name=null, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=pods is forbidden: User "system:serviceaccount:ci:controller-1" cannot list resource "pods" in API group "" in the namespace "ci": RBAC: role.rbac.authorization.k8s.io "cjoc-agents" not found, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Forbidden, status=Failure, additionalProperties={}).
In the message above, the operations center is installed in the namespace ci and the controller is named controller-1.
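To confirm the diagnosis, you can check the role and the service account's effective permissions with kubectl. The commands below use the example names from the message above (namespace ci, controller controller-1); adjust them to your environment:

kubectl get role cjoc-agents -n ci
# Error from server (NotFound): roles.rbac.authorization.k8s.io "cjoc-agents" not found

kubectl auth can-i list pods -n ci --as=system:serviceaccount:ci:controller-1
# no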
Environment
- CloudBees CI on modern cloud platforms - managed controller, using the Single cluster and single namespace setup with rbac.install=true
Explanation
When you apply a Helm upgrade changing from the following values:
...
Agents:
  SeparateNamespace:
    Enabled: false
...
rbac:
  install: true
...
to the following values:
...
Agents:
  SeparateNamespace:
    Create: true
    Enabled: true
...
rbac:
  install: true
...
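The same change can also be applied from the command line. The following is a sketch only: the release name (cjoc), chart reference (cloudbees/cloudbees-core), and namespace (ci) are assumptions to be replaced with your own values:

# Placeholders: release "cjoc", chart "cloudbees/cloudbees-core", namespace "ci"
helm upgrade cjoc cloudbees/cloudbees-core -n ci \
  --reuse-values \
  --set Agents.SeparateNamespace.Enabled=true \
  --set Agents.SeparateNamespace.Create=true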
two primary actions occur simultaneously:
- All Kubernetes objects necessary to run agents in the new namespace are created (both items can be verified with the commands after this list):
...
[debug] Created a new Namespace called "ci-builds" in
[debug] Created a new ServiceAccount called "jenkins-agents" in ci-builds
[debug] Created a new ConfigMap called "jenkins-agent" in cbci-agents
[debug] Created a new Role called "cjoc-agents-test-connection" in ci-builds
[debug] Created a new Role called "cjoc-agents" in ci-builds
[debug] Created a new RoleBinding called "cjoc-builds-role-binding" in ci-builds
[debug] Created a new RoleBinding called "cjoc-agents-role-binding" in ci-builds
[debug] Created a new RoleBinding called "cjoc-master-role-binding" in ci-builds
...
- All objects that granted permissions to the operations center's service account in the previous default namespace are removed:
...
[debug] Deleting ServiceAccount "jenkins-agents" in namespace ci...
[debug] Deleting ConfigMap "jenkins-agent" in namespace ci...
[debug] Deleting Role "cjoc-agents" in namespace ci...
[debug] Deleting RoleBinding "cjoc-builds-role-binding" in namespace ci...
[debug] Deleting RoleBinding "cjoc-master-role-binding" in namespace ci...
...
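You can verify both outcomes with kubectl, assuming the namespaces ci and ci-builds from the debug output above (adjust the names to your environment):

kubectl get serviceaccount,role,rolebinding -n ci-builds
kubectl get role cjoc-agents -n ci   # now returns NotFound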
As a result of this change, all managed controllers will be provisioned with an additional JVM argument, com.cloudbees.jenkins.plugins.kube.NamespaceFilter.defaultNamespace, which specifies that agents should be provisioned in the new namespace. The RoleBinding that grants permissions to the controller's service account will also be created in this new namespace.
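For illustration, the resulting JVM argument looks like the following, assuming the agent namespace ci-builds from the logs above. You can check for it on the running controller pod; the pod name controller-1-0 is an assumption based on the controller name in the error message:

-Dcom.cloudbees.jenkins.plugins.kube.NamespaceFilter.defaultNamespace=ci-builds

# Pod name "controller-1-0" is an assumption; adjust to your controller
kubectl get pod controller-1-0 -n ci -o yaml | grep defaultNamespace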
For controllers managed by CasC with High Availability (HA) enabled, this issue does not occur: when the CasC configuration is applied, the operations center detects the change and performs a rolling upgrade. The rolling upgrade updates the controller's deployment to include the JVM argument and creates the RoleBinding that grants the controller's service account the necessary permissions to manage agents in the new namespace.
However, for controllers managed by CasC without HA enabled, or for those not managed by CasC, these changes will not be applied, and the issue will appear.
Solution
To resolve this issue, reprovision (or stop and then start) the affected managed controllers. This applies the pending changes and ensures the correct permissions and configuration are in place.
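Once a controller is back online, you can confirm that its service account now has the required permissions in the agent namespace. The command below assumes the example names used throughout this article (ci, ci-builds, controller-1):

kubectl auth can-i list pods -n ci-builds --as=system:serviceaccount:ci:controller-1
# yes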