[magnum] failed to launch Kubernetes cluster
Tony Pearce
tonyppe at gmail.com
Fri Jul 10 08:18:53 UTC 2020
Hi team, I hope you are all keeping safe and well at the moment.
I am trying to use Magnum to launch a Kubernetes cluster. I have tried
different images and am currently using Fedora-Atomic 27. The cluster
deployment from the cluster template is failing, and I am here to ask if you
could please point me in the right direction. I have become stuck and am
uncertain how to troubleshoot this further. The cluster seems to stall a few
minutes after the master node boots: after the logs in [1] and [2], I see no
new (different) log entries and no load on the master. The 60-minute timeout
is then reached and the cluster fails.
I deployed this OpenStack environment using kayobe (kolla-ansible), and the
release is Train. It runs on CentOS 7 within Docker containers. Kayobe
manages the deployment through its Ansible playbooks.
This was previously working some months back, although I think I may have
used a CoreOS image at that time, and that is also not working today. That
deployment dates back to around February 2020. I then deleted it and
re-deployed, the only change being the hostname of the controller node, as
updated in the kayobe inventory file.
Since then, which was a month or so back, I've been unable to successfully
deploy a Kubernetes cluster. I've tried other Fedora Atomic images as well
as CoreOS, without success. When using the CoreOS image tagged with the
coreos tag as per the Magnum docs, the instance fails to boot and drops to
the rescue shell. However, if I launch the CoreOS image manually, it boots
successfully and is configured via cloud-init. With the Fedora image, all
of the deployment attempts stop at the same place, and the behaviour differs
if I disable TLS:
TLS enabled: master launched, no nodes. Fails when running
/usr/lib/python2.7/site-packages/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml
TLS disabled: master and nodes launched, but the cluster fails later. I
didn't investigate this very much.
When looking for help around the web, I found this, which looks to be the
same issue that I have at the moment (although the poster has deployed
slightly differently, using CentOS 8, and mentions Magnum 10):
https://ask.openstack.org/en/question/128391/magnum-ussuri-container-not-booting-up/
I have the same log messages on the master node within heat.
When going through the troubleshooting guide, I see that etcd is running
without errors, but I don't see any flannel service at all. I don't know
whether the deployment simply failed before getting to flannel or whether
flannel is the cause. As a test, I also tried a cluster template that uses
calico, but the logs show the same result.
When looking at the failed stacks via the CLI, this is what I see:
http://paste.openstack.org/show/795736/
The master node flavour has 4 vCPUs and 4GB of memory. The node flavour has
2 vCPUs and 2GB of memory.
Storage is only via Cinder, as I am using iSCSI storage with a Cinder
driver. I don't have any other storage.
On the master, after the failure, the heat log repeats these lines:
++ curl --silent http://127.0.0.1:8080/healthz
+ '[' ok = ok ']'
+ kubectl patch node k8s-cluster-onvaoh2zxotf-master-0 --patch '{"metadata": {"labels": {"node-role.kubernetes.io/master": ""}}}'
error: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
Trying to label master node with node-role.kubernetes.io/master=""
+ echo 'Trying to label master node with node-role.kubernetes.io/master=""'
+ sleep 5s
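
The `+` and `++` prefixes above are shell xtrace output, and the repeating
pattern suggests a retry loop around the `kubectl patch` call. As a rough
sketch only (a reconstruction from the trace, not the actual Magnum script),
such a loop with an added attempt limit might look like the following. The
name `retry_label`, its parameters and the attempt limit are hypothetical and
used purely for illustration.

```shell
#!/bin/sh
# Hypothetical reconstruction of the retry pattern implied by the trace.
# Not copied from Magnum; the attempt limit is an assumption added here so
# that the loop cannot spin until the cluster timeout.
retry_label() {
    label_cmd=$1      # name of a command or function that tries to label the node
    interval=$2       # seconds to wait between attempts (the trace shows 5s)
    max_attempts=$3   # hypothetical upper bound on the number of attempts
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Stop as soon as the labelling command succeeds.
        if $label_cmd; then
            echo "labelled after $attempt attempt(s)"
            return 0
        fi
        echo "Trying to label node (attempt $attempt)"
        sleep "$interval"
        attempt=$((attempt + 1))
    done
    echo "giving up after $max_attempts attempts"
    return 1
}
```

To try this against a real master, the `kubectl patch` call from the trace
could be wrapped in a small shell function and that function's name passed as
the first argument.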
[1] cloud-init.log: http://paste.openstack.org/show/795737/
[2] cloud-init-output.log: http://paste.openstack.org/show/795738/
May I ask if anyone has a recent deployment of Magnum with a working
Kubernetes cluster and could share the relevant details, such as the image
used, so that I can try to replicate it?
To create the cluster template I have been using:
openstack coe cluster template create k8s-cluster-template \
    --image Fedora-Atomic-27 \
    --keypair testpair \
    --external-network physnet2vlan20 \
    --dns-nameserver 192.168.7.233 \
    --flavor 2GB-2vCPU \
    --docker-volume-size 15 \
    --network-driver flannel \
    --coe kubernetes
If I have missed anything, I am happy to provide it.
Many thanks in advance for any help or pointers on this.
Regards,
Tony Pearce