[magnum] failed to launch Kubernetes cluster

Bharat Kunwar bharat at stackhpc.com
Fri Jul 10 08:24:34 UTC 2020


Hi Tony

That is a known issue and is due to the default version of the heat-container-agent baked into the Train release. Please use the label heat_container_agent_tag=train-stable-3 and you should be good to go.
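
For example, a minimal sketch of a template with that label set (reusing the values from your template below; the template name here is arbitrary):

    openstack coe cluster template create k8s-cluster-template-fixed \
        --image Fedora-Atomic-27 \
        --keypair testpair \
        --external-network physnet2vlan20 \
        --dns-nameserver 192.168.7.233 \
        --flavor 2GB-2vCPU \
        --docker-volume-size 15 \
        --network-driver flannel \
        --coe kubernetes \
        --labels heat_container_agent_tag=train-stable-3

The same label can also be passed at cluster creation time via --labels on "openstack coe cluster create".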

Cheers

Bharat

> On 10 Jul 2020, at 09:18, Tony Pearce <tonyppe at gmail.com> wrote:
> 
> Hi team, I hope you are all keeping safe and well at the moment. 
> 
> I am trying to use Magnum to launch a Kubernetes cluster. I have tried different images but am currently using Fedora-Atomic 27. Deployment of a cluster from the cluster template is failing, and I am here to ask if you could please point me in the right direction. I have become stuck and am unsure how to troubleshoot this further. The cluster seems to fail a few minutes after the master node boots: after the logs ([1],[2]) appear, I see no further progress in terms of new (different) log entries or load on the master. Then the 60-minute timeout is reached and the cluster fails.
> 
> I deployed this OpenStack environment using Kayobe (kolla-ansible), release Train, on CentOS 7 within Docker containers. Kayobe manages the deployment through the Ansible playbooks.
> 
> This was previously working some months back, although I think I may have used a CoreOS image at that time, and that is also not working today. The deployment would have been back around February 2020. I then deleted that deployment and re-deployed; the only change was the controller node's hostname, updated in the Kayobe inventory file.
> Since then (a month or so back) I've been unable to deploy a Kubernetes cluster successfully. I've tried other Fedora Atomic images as well as CoreOS without success. When using the CoreOS image tagged with the coreos tag as per the Magnum docs (see the tagging sketch after the two cases below), the instance fails to boot and drops to the rescue shell. However, if I launch the CoreOS image manually, it boots successfully and is configured via cloud-init. All deployment attempts stop at the same place when using the Fedora image, and I see different behaviour if I disable TLS:
> 
> TLS enabled: master launched, no nodes. Fails when running /usr/lib/python2.7/site-packages/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml
> 
> TLS disabled: master and nodes launched but the cluster later fails. I didn't investigate this very much.
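> 
> For reference, the image tagging mentioned above was along these lines (a minimal sketch; the file name is a placeholder, and os_distro is the image property Magnum uses to select its driver):
> 
>     openstack image create coreos \
>         --disk-format qcow2 \
>         --container-format bare \
>         --property os_distro=coreos \
>         --file coreos.qcow2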
> 
> When looking for help around the web, I found the following, which looks to be the same issue I have at the moment (although the poster deployed slightly differently, using CentOS 8, and mentions Magnum 10):
> https://ask.openstack.org/en/question/128391/magnum-ussuri-container-not-booting-up/
> 
> I have the same log messages on the master node within Heat.
> 
> When going through the troubleshooting guide, I see that etcd is running with no errors; however, I don't see any flannel service at all. I also don't know whether the deployment simply failed before it got to deploying flannel, or whether flannel is the cause. As a test I deployed using a cluster template with Calico instead, but the logs show the same result.
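> 
> The checks I ran on the master were roughly the following (a sketch; the unit names are assumptions and may vary by driver and release):
> 
>     sudo systemctl status etcd
>     sudo systemctl list-units --all | grep -i flannel
>     sudo journalctl -u heat-container-agent --no-pager | tail -n 50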
> 
> When looking at the stack via the CLI to see the failed stacks, this is what I see: http://paste.openstack.org/show/795736/
> 
> I'm using a master node flavour with 4 vCPUs and 4 GB memory, and a node flavour with 2 vCPUs and 2 GB memory. 
> Storage is only via Cinder, as I am using iSCSI storage with a Cinder driver. I don't have any other storage. 
> 
> On the master, after the failure, the heat log repeats these entries: 
> 
> ++ curl --silent http://127.0.0.1:8080/healthz
> + '[' ok = ok ']'
> + kubectl patch node k8s-cluster-onvaoh2zxotf-master-0 --patch '{"metadata": {"labels": {"node-role.kubernetes.io/master": ""}}}'
> error: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
> Trying to label master node with node-role.kubernetes.io/master=""
> + echo 'Trying to label master node with node-role.kubernetes.io/master=""'
> + sleep 5s
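> 
> The error suggests kubectl is being invoked without any cluster configuration. A rough manual check from the master (a sketch, assuming the insecure local API endpoint on port 8080 that the health probe above targets) would be:
> 
>     kubectl --server=http://127.0.0.1:8080 get nodes
>     kubectl --server=http://127.0.0.1:8080 patch node k8s-cluster-onvaoh2zxotf-master-0 \
>         --patch '{"metadata": {"labels": {"node-role.kubernetes.io/master": ""}}}'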
> 
> [1] Here's the cloud-init.log: http://paste.openstack.org/show/795737/
> [2] and cloud-init-output.log: http://paste.openstack.org/show/795738/
> 
> May I ask whether anyone with a recent Magnum deployment and a working Kubernetes cluster could share the relevant details, such as the image used, so that I can try to replicate it? 
> 
> To create the cluster template I have been using: 
> openstack coe cluster template create k8s-cluster-template \
>     --image Fedora-Atomic-27 \
>     --keypair testpair \
>     --external-network physnet2vlan20 \
>     --dns-nameserver 192.168.7.233 \
>     --flavor 2GB-2vCPU \
>     --docker-volume-size 15 \
>     --network-driver flannel \
>     --coe kubernetes
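> 
> and then I create the cluster from the template (the cluster name and counts here are illustrative):
> 
>     openstack coe cluster create k8s-cluster \
>         --cluster-template k8s-cluster-template \
>         --master-count 1 \
>         --node-count 2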
> 
> 
> If I have missed anything, I am happy to provide it. 
> 
> Many thanks in advance for any help or pointers on this. 
> 
> Regards,
> 
> Tony Pearce
> 
