[magnum-cluster-api] CREATE_IN_PROGRESS never ending
Hi all,

I created a k8s cluster using magnum-cluster-api on OpenStack Bobcat. I used kind for the CAPI management cluster (kind on top of LXC). Magnum was installed successfully along with the Cluster API, but when I create a k8s cluster it stays in the CREATE_IN_PROGRESS status and never completes, even though the load balancer and the master/worker servers have been provisioned successfully. The cluster show output looks like this:

status_reason | CAPI Cluster status: Provisioned: Cluster kube-r6jnx is Provisioned. CAPI OpenstackCluster status reason:

When I check the capi-controller-manager in capi-system, it always says:

http: TLS handshake error from 10.244.0.1:9005: EOF

Full log: https://sprunge.us/DgnUOE

Regards,
Pahrial
Hi Pahrial,

In this kind of scenario, I would use the Cluster API command to get the kubeconfig of the cluster:

```
clusterctl -n magnum-system get kubeconfig kube-r6jnx
```

Store the output of that, then try a `kubectl get nodes` or `kubectl get pods -A` to see whether the API server is up and whether there are any issues within the cluster (for example, pods that are not up or not ready).

Thanks
Mohammed

In reply to the message from pahrialtkj@gmail.com, Thursday, May 2, 2024 at 11:06 AM.
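The suggestion above can be sketched as a small shell helper. This is only an illustration: the function name `check_workload_cluster`, the output file name, and the 30-second request timeout are assumptions, and it presumes that `clusterctl` and `kubectl` are on the PATH with the current context pointing at the management (kind) cluster.

```shell
# Sketch: save the workload cluster kubeconfig, then run basic health checks.
# check_workload_cluster is a hypothetical helper, not part of any tool.
check_workload_cluster() {
    cluster="$1"
    namespace="${2:-magnum-system}"
    kubeconfig="${3:-./${cluster}.kubeconfig}"   # assumed output location

    # Store the kubeconfig printed by clusterctl in a file.
    clusterctl -n "$namespace" get kubeconfig "$cluster" > "$kubeconfig" || return 1

    # --request-timeout keeps kubectl from waiting a long time when the
    # API server cannot be reached.
    kubectl --kubeconfig "$kubeconfig" --request-timeout=30s get nodes -o wide || return 2
    kubectl --kubeconfig "$kubeconfig" --request-timeout=30s get pods -A
}

# Example (cluster name taken from the thread):
#   check_workload_cluster kube-r6jnx
```

A non-zero return from the `get nodes` step would point at the API server being unreachable rather than at a problem with individual pods.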
Hi Mohammed,

It just shows me a timeout:

E0502 16:16:56.490935 3451067 memcache.go:265] couldn't get current server API group list: Get "https://ip-public:6443/api?timeout=32s": dial tcp ip-public:6443: i/o timeout

Thanks,
Regards,
Pahrial
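An `i/o timeout` on `dial tcp` means the TCP connection to port 6443 never completed, which can be confirmed independently of kubectl. The sketch below is one possible check; the helper name `check_api_port` is hypothetical, and it assumes a netcat (`nc`) variant that supports the `-z` and `-w` options.

```shell
# Sketch: test whether a TCP connection to the API server port succeeds.
# check_api_port is a hypothetical helper; the default port 6443 comes
# from the error message in the thread.
check_api_port() {
    host="$1"
    port="${2:-6443}"
    # -z: only probe the port, send no data; -w 5: give up after 5 seconds.
    if nc -z -w 5 "$host" "$port"; then
        echo "reachable: ${host}:${port}"
    else
        echo "unreachable: ${host}:${port}"
        return 1
    fi
}

# Example: check_api_port <load-balancer-floating-ip>
```

Running the same check from the machine hosting the management cluster and from another host on the external network may help narrow down whether the load balancer, a security group, or routing is involved.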
Hi Pahrial,

That suggests the API server is not coming up. Can you share the cluster template details and the command you used to create the cluster?

Also, if you created the cluster with a key, you should be able to SSH to the server and look into the kubeadm/kubelet logs as well.

Thanks
Mohammed

In reply to the message from pahrialtkj@gmail.com, Thursday, May 2, 2024 at 12:25 PM.
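For the log inspection mentioned above, a minimal sketch follows. The helper name `fetch_node_logs` and the `ubuntu` login user are assumptions (the thread only says the image is Ubuntu based), as is the idea that the kubeadm bootstrap output ends up in `/var/log/cloud-init-output.log`, which is where cloud-init normally writes it on Cluster API images.

```shell
# Sketch: pull recent kubelet and bootstrap logs from a cluster node.
# fetch_node_logs is a hypothetical helper; the default user is assumed.
fetch_node_logs() {
    host="$1"
    user="${2:-ubuntu}"
    # kubelet runs as a systemd unit; kubeadm is driven by cloud-init,
    # so its output is expected in the cloud-init output log.
    ssh "${user}@${host}" \
        'sudo journalctl -u kubelet --no-pager -n 200; sudo tail -n 200 /var/log/cloud-init-output.log'
}

# Example: fetch_node_logs <node-ip>
```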
Hi Mohammed,

This is the command I used to create the template, followed by the one used to create the cluster:

    openstack coe cluster template create \
      --image ubuntu-2204-kube-v1.27.8 \
      --external-network public \
      --dns-nameserver 8.8.8.8 \
      --master-lb-enabled \
      --master-flavor m1-kubernetes \
      --flavor m1-kubernetes \
      --network-driver calico \
      --docker-storage-driver overlay2 \
      --coe kubernetes \
      --label kube_tag=v1.27.8 \
      k8s-v1.27.8

    openstack coe cluster create k8s-v1.27.8 --keypair sysadmin-key \
      --cluster-template k8s-v1.27.8 \
      --master-count 1 --node-count 1

Thanks,
Regards,
Pahrial
Hi Mohammed,

I went into the k8s instances, and in the journal I found several errors related to image retrieval.

Full logs: https://sprunge.us/0gYVQ9

Error syncing pod, skipping" err="failed to \"StartContainer\" for \"openstack-cloud-controller-manager\" with CrashLoopBackOff: \"back-off 20s restarting failed container=openstack-cloud-controller-manager pod=openstack-cloud-controller-manager-dtfv7_kube-system(65348dc6-5858-4e1d-9ddb-eb134367eb12)\"" pod="kube-system/openstack-cloud-controller-manager-dtfv7" podUID=65348dc6-5858-4e1d-9ddb-eb134367eb12

Error syncing pod, skipping" err="failed to \"StartContainer\" for \"cinder-csi-plugin\" with CrashLoopBackOff: \"back-off 10s restarting failed container=cinder-csi-plugin pod=csi-cinder-nodeplugin-55454_kube-system(3b8ffea4-4a68-4d4c-bd01-fecccbce0ef8)\"" pod="kube-system/csi-cinder-nodeplugin-55454" podUID=3b8ffea4-4a68-4d4c-bd01-fecccbce0ef8

Thanks,
Regards,
Pahrial
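A CrashLoopBackOff entry in the kubelet journal only says that the container keeps exiting; the reason is usually in the container's own log from the previous run. The sketch below shows one way to read it. The helper name `previous_logs` is hypothetical, and running it on the control plane node with `KUBECONFIG=/etc/kubernetes/admin.conf` (the usual kubeadm location) is an assumption, since the API server was not reachable from outside in this thread.

```shell
# Sketch: print the log of the last terminated instance of a container.
# previous_logs is a hypothetical helper; pod and container names in the
# example come from the journal lines quoted in the thread.
previous_logs() {
    pod="$1"
    container="${2:-}"
    namespace="${3:-kube-system}"
    if [ -n "$container" ]; then
        kubectl -n "$namespace" logs "$pod" -c "$container" --previous --tail=50
    else
        kubectl -n "$namespace" logs "$pod" --previous --tail=50
    fi
}

# Example, run on the control plane node:
#   export KUBECONFIG=/etc/kubernetes/admin.conf
#   previous_logs openstack-cloud-controller-manager-dtfv7
#   previous_logs csi-cinder-nodeplugin-55454 cinder-csi-plugin
```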
Hi there,

Does your provider network have the ability to talk to your OpenStack public endpoints / VIP?

Thanks,
Mohammed

In reply to the message from pahrialtkj@gmail.com, Thursday, May 2, 2024 at 8:39 PM.
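Both failing pods (the OpenStack cloud controller manager and the Cinder CSI plugin) need to call the OpenStack APIs, so the question above can be checked from inside a cluster node. The sketch below is one way to do that; the helper name `check_endpoint` and the example URL are hypothetical, and `-k` is used only because this is a reachability test, not a TLS validation.

```shell
# Sketch: check whether an OpenStack API endpoint answers from this host.
# check_endpoint is a hypothetical helper; pass your own public endpoint.
check_endpoint() {
    url="$1"
    # curl reports HTTP code 000 when no connection could be established.
    code="$(curl -k -s -o /dev/null --connect-timeout 5 -w '%{http_code}' "$url")" || code="000"
    if [ "$code" = "000" ]; then
        echo "unreachable: ${url}"
        return 1
    fi
    echo "reachable: ${url} (HTTP ${code})"
}

# Example, run on a cluster node (URL is a placeholder):
#   check_endpoint https://cloud.example.com:5000/v3
```

Any HTTP status counts as reachable here, since the goal is only to confirm that traffic from the node's network gets to the public VIP.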
participants (2)
- Mohammed Naser
- pahrialtkj@gmail.com