It turned out to be even simpler: I just removed the default policy file, because this message was misleading:

2024-01-16 14:37:56.490 1815 ERROR magnum.drivers.heat.k8s_fedora_template_def [req-63f969ff-de38-4d3e-af58-ed1d7a250a03 - - - - -] Failed to load default keystone auth policy: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I thought that file was required, but I started a cluster without it and the policy configmap is created automatically:

[root@kubernetes-cluster24-ywkc44hxlbjj-master-0 ~]# /var/srv/magnum/bin/kubectl -n kube-system describe configmap k8s-keystone-auth-policy --kubeconfig /etc/kubernetes/admin.conf
Name:         k8s-keystone-auth-policy
Namespace:    kube-system
Labels:       <none>
Annotations:

Data
====
policies:
----
[{"match": [{"type": "role", "values": ["member"]}, {"type": "project", "values": ["d0f2f639692245f5928150950604a748"]}], "resource": {"namespace": "default", "resources": ["pods", "services", "deployments", "pvc"], "verbs": ["list"], "version": "*"}}]

The cluster now starts successfully without any manual intervention. :-)

Quoting Eugen Block <eblock@nde.ag>:
I found the reason for the failing k8s-keystone-auth pod. I had a faulty /etc/magnum/keystone_auth_default_policy.yaml file which was injected into the cluster. I was able to get the pod up after changing the respective configmap.
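In case someone runs into the same thing: the change was basically just to edit the policy configmap that the webhook reads and then let the DaemonSet recreate the pod, roughly along these lines (pod name as in my earlier mails):

/var/srv/magnum/bin/kubectl --kubeconfig=/etc/kubernetes/admin.conf -n kube-system get configmap k8s-keystone-auth-policy -o yaml
/var/srv/magnum/bin/kubectl --kubeconfig=/etc/kubernetes/admin.conf -n kube-system edit configmap k8s-keystone-auth-policy
/var/srv/magnum/bin/kubectl --kubeconfig=/etc/kubernetes/admin.conf -n kube-system delete pod k8s-keystone-auth-qt7n5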
Quoting Eugen Block <eblock@nde.ag>:
I managed to get the other two pods to start by removing the node-role.kubernetes.io/master:NoSchedule taint. But the k8s-keystone-auth pod still fails to start. I also changed the container image, since docker.io/k8scloudprovider/k8s-keystone-auth:v1.18.0 failed to pull. I tried docker.io/k8scloudprovider/k8s-keystone-auth:v1.26.2, which seems to be pulled successfully, but I still don't see why the pod fails to start.
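In case anyone wants to reproduce this, removing the taint and switching the DaemonSet image boils down to something like the following (node and container names taken from the outputs further down):

/var/srv/magnum/bin/kubectl --kubeconfig=/etc/kubernetes/admin.conf taint nodes kubernetes-cluster20-buuxqog6cfnn-master-0 node-role.kubernetes.io/master:NoSchedule-
/var/srv/magnum/bin/kubectl --kubeconfig=/etc/kubernetes/admin.conf -n kube-system set image daemonset/k8s-keystone-auth k8s-keystone-auth=docker.io/k8scloudprovider/k8s-keystone-auth:v1.26.2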
Quoting Eugen Block <eblock@nde.ag>:
Hi there,
I've noticed several Magnum-related threads and thought I'd give it a try here while I keep digging. I've only recently started to work with Magnum, and this is how far I've got. Note that this OpenStack cloud is still running Victoria; I'm working towards migrating it to a newer release. My COE cluster template currently looks like this (I'll omit the defaults):
| labels         | {'kube_tag': 'v1.18.2', 'hyperkube_prefix': 'k8s.gcr.io/'} |
| https_proxy    | http://<IP>:<PORT>                                         |
| http_proxy     | http://<IP>:<PORT>                                         |
| image_id       | fedora-coreos33                                            |
| dns_nameserver | <MY_DNS>                                                   |
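For completeness, a template with these settings can be created with something like the command below (the template name, external network and flavors are just placeholders here, not my actual values):

openstack coe cluster template create k8s-fcos33-v1.18 \
    --coe kubernetes \
    --image fedora-coreos33 \
    --external-network <EXTERNAL_NET> \
    --master-flavor <FLAVOR> \
    --flavor <FLAVOR> \
    --dns-nameserver <MY_DNS> \
    --http-proxy http://<IP>:<PORT> \
    --https-proxy http://<IP>:<PORT> \
    --labels kube_tag=v1.18.2,hyperkube_prefix=k8s.gcr.io/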
I've already tried quite a few different kube_tags and hyperkube_prefixes; this combination at least gets me some running containers:
[root@kubernetes-cluster20-buuxqog6cfnn-master-0 ~]# podman ps
CONTAINER ID  IMAGE                                                              COMMAND               CREATED         STATUS             PORTS  NAMES
7fbe3f8f308f  docker.io/openstackmagnum/heat-container-agent:victoria-stable-1  /usr/bin/start-he...  42 minutes ago  Up 42 minutes ago         heat-container-agent
a8b6ae357034  k8s.gcr.io/hyperkube:v1.18.2                                       kube-apiserver --...  40 minutes ago  Up 40 minutes ago         kube-apiserver
cb7774fbf849  k8s.gcr.io/hyperkube:v1.18.2                                       kube-controller-m...  40 minutes ago  Up 40 minutes ago         kube-controller-manager
448f1c02cbd4  k8s.gcr.io/hyperkube:v1.18.2                                       kube-scheduler --...  40 minutes ago  Up 40 minutes ago         kube-scheduler
4be28a0a2271  k8s.gcr.io/hyperkube:v1.18.2                                       kubelet --logtost...  40 minutes ago  Up 40 minutes ago         kubelet
542ffdb081e7  k8s.gcr.io/hyperkube:v1.18.2                                       kube-proxy --logt...  40 minutes ago  Up 40 minutes ago         kube-proxy
61874a4e4595  quay.io/coreos/etcd:v3.4.6                                         /usr/local/bin/et...  40 minutes ago  Up 40 minutes ago         etcd
But not all pods are starting:
[root@kubernetes-cluster20-buuxqog6cfnn-master-0 ~]# /var/srv/magnum/bin/kubectl get pods -o wide --kubeconfig=/etc/kubernetes/admin.conf -A
NAMESPACE     NAME                                         READY   STATUS             RESTARTS   AGE     IP           NODE                                         NOMINATED NODE   READINESS GATES
kube-system   coredns-786ffb7797-dpkbx                     1/1     Running            0          25m     10.100.0.3   kubernetes-cluster20-buuxqog6cfnn-master-0   <none>           <none>
kube-system   coredns-786ffb7797-mslfk                     1/1     Running            0          25m     10.100.0.2   kubernetes-cluster20-buuxqog6cfnn-master-0   <none>           <none>
kube-system   dashboard-metrics-scraper-6b4884c9d5-nrvbg   1/1     Running            0          25m     10.100.0.5   kubernetes-cluster20-buuxqog6cfnn-master-0   <none>           <none>
kube-system   k8s-keystone-auth-qt7n5                      0/1     CrashLoopBackOff   4          5m12s   10.0.0.116   kubernetes-cluster20-buuxqog6cfnn-master-0   <none>           <none>
kube-system   kube-dns-autoscaler-75859754fd-69zxx         0/1     Pending            0          25m     <none>       <none>                                       <none>           <none>
kube-system   kube-flannel-ds-cn5jf                        1/1     Running            0          25m     10.0.0.116   kubernetes-cluster20-buuxqog6cfnn-master-0   <none>           <none>
kube-system   kubernetes-dashboard-c98496485-qw4hl         1/1     Running            0          25m     10.100.0.4   kubernetes-cluster20-buuxqog6cfnn-master-0   <none>           <none>
kube-system   magnum-metrics-server-79556d6999-hkqcw       0/1     Pending            0          25m     <none>       <none>                                       <none>           <none>
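To see why the two Pending pods don't get scheduled, checking their events and the node taints would probably be the next step, e.g. something like this (output not included here):

/var/srv/magnum/bin/kubectl --kubeconfig=/etc/kubernetes/admin.conf -n kube-system describe pod kube-dns-autoscaler-75859754fd-69zxx
/var/srv/magnum/bin/kubectl --kubeconfig=/etc/kubernetes/admin.conf describe node kubernetes-cluster20-buuxqog6cfnn-master-0 | grep -i taints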
I didn't look too deeply into the two other failing pods; I think the keystone-auth pod is more important at this point, and maybe it's even required for the other pods to start, I don't know yet. I can pull the image manually, and "describe pod" also shows that the image was pulled successfully, but I can't find any other hints as to what is actually failing. Here is the full output of "describe pod":
---snip---
[root@kubernetes-cluster20-buuxqog6cfnn-master-0 ~]# /var/srv/magnum/bin/kubectl --kubeconfig=/etc/kubernetes/admin.conf -n kube-system describe pod/k8s-keystone-auth-qt7n5
Name:                 k8s-keystone-auth-qt7n5
Namespace:            kube-system
Priority:             0
Node:                 kubernetes-cluster20-buuxqog6cfnn-master-0/10.0.0.116
Start Time:           Mon, 15 Jan 2024 09:13:16 +0000
Labels:               controller-revision-hash=6875f96b46
                      k8s-app=k8s-keystone-auth
                      pod-template-generation=1
Annotations:          kubernetes.io/psp: magnum.privileged
Status:               Running
IP:                   10.0.0.116
IPs:
  IP:           10.0.0.116
Controlled By:  DaemonSet/k8s-keystone-auth
Containers:
  k8s-keystone-auth:
    Container ID:  docker://adca0b9a9eb011f22530886a42bf862b4cfa7e6b7965cd77d6dac4f37222b827
    Image:         docker.io/k8scloudprovider/k8s-keystone-auth:v1.18.0
    Image ID:      docker-pullable://k8scloudprovider/k8s-keystone-auth@sha256:ab72a7e5b9eca9af2762796690be800bce2de25dee8bd5ad55d0996b29c85146
    Port:          8443/TCP
    Host Port:     8443/TCP
    Args:
      ./bin/k8s-keystone-auth
      --tls-cert-file /etc/kubernetes/certs/server.crt
      --tls-private-key-file /etc/kubernetes/certs/server.key
      --policy-configmap-name k8s-keystone-auth-policy
      --keystone-url http://<CONTROLLER>:5000/v3
      --sync-configmap-name keystone-sync-policy
      --keystone-ca-file /etc/kubernetes/ca-bundle.crt
      --listen 127.0.0.1:8443
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 15 Jan 2024 09:23:52 +0000
      Finished:     Mon, 15 Jan 2024 09:23:52 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 15 Jan 2024 09:18:41 +0000
      Finished:     Mon, 15 Jan 2024 09:18:41 +0000
    Ready:          False
    Restart Count:  6
    Requests:
      cpu:        200m
    Environment:  <none>
    Mounts:
      /etc/kubernetes from ca-certs (ro)
      /etc/kubernetes/certs from k8s-certs (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from k8s-keystone-auth-token-vhm4r (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  k8s-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/certs
    HostPathType:  DirectoryOrCreate
  ca-certs:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes
    HostPathType:  DirectoryOrCreate
  k8s-keystone-auth-token-vhm4r:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  k8s-keystone-auth-token-vhm4r
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     :NoSchedule
                 :NoExecute
                 CriticalAddonsOnly
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/network-unavailable:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/pid-pressure:NoSchedule
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule
Events:
  Type     Reason     Age                  From                                                 Message
  ----     ------     ----                 ----                                                 -------
  Normal   Scheduled  <unknown>            default-scheduler                                    Successfully assigned kube-system/k8s-keystone-auth-qt7n5 to kubernetes-cluster20-buuxqog6cfnn-master-0
  Warning  Failed     9m52s                kubelet, kubernetes-cluster20-buuxqog6cfnn-master-0  Error: ErrImagePull
  Warning  Failed     9m52s                kubelet, kubernetes-cluster20-buuxqog6cfnn-master-0  Failed to pull image "docker.io/k8scloudprovider/k8s-keystone-auth:v1.18.0": rpc error: code = Unknown desc = Get "https://registry-1.docker.io/v2/k8scloudprovider/k8s-keystone-auth/manifests...": Internal Server Error
  Normal   Pulling    9m37s (x5 over 10m)  kubelet, kubernetes-cluster20-buuxqog6cfnn-master-0  Pulling image "docker.io/k8scloudprovider/k8s-keystone-auth:v1.18.0"
  Normal   Pulled     9m36s (x4 over 10m)  kubelet, kubernetes-cluster20-buuxqog6cfnn-master-0  Successfully pulled image "docker.io/k8scloudprovider/k8s-keystone-auth:v1.18.0"
  Normal   Created    9m35s (x4 over 10m)  kubelet, kubernetes-cluster20-buuxqog6cfnn-master-0  Created container k8s-keystone-auth
  Normal   Started    9m35s (x4 over 10m)  kubelet, kubernetes-cluster20-buuxqog6cfnn-master-0  Started container k8s-keystone-auth
  Warning  BackOff    33s (x48 over 10m)   kubelet, kubernetes-cluster20-buuxqog6cfnn-master-0  Back-off restarting failed container
---snip---
Apparently I'm not allowed to get logs from the pod:
[root@kubernetes-cluster20-buuxqog6cfnn-master-0 ~]# /var/srv/magnum/bin/kubectl --kubeconfig=/etc/kubernetes/admin.conf -n kube-system logs pod/k8s-keystone-auth-qt7n5
Error from server: Get https://10.0.0.116:10250/containerLogs/kube-system/k8s-keystone-auth-qt7n5/k...: Forbidden
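Since the API server won't hand out the logs, reading them directly from the container runtime on the master might work as a fallback, something like the following (the container ID comes from the first command; the "describe pod" output above shows the pod runs under docker on this host):

docker ps -a | grep k8s-keystone-auth
docker logs <CONTAINER_ID>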
I appreciate any hints!
Thanks!
Eugen