[magnum] failed to launch Kubernetes cluster

Tony Pearce tonyppe at gmail.com
Mon Jul 13 07:43:51 UTC 2020


Hi Bharat, many thanks for your super quick response last week. I really
appreciate it, especially since I had been working on this issue for so
long. I wanted to try out your suggestion before coming back with a reply.

I tried your suggestion and at first I got the same failure when creating a
cluster; it appeared to stop in the same place as I described in my previous
mail. During the investigation I noticed some odd behaviour with the DNS
integration (Designate), see [1] and [2] below. I decided to remove
Designate from OpenStack and retest, and now I am able to deploy a
Kubernetes cluster successfully! :)

Regarding those 2 points:
[1] - the configured Designate zone was project.cloud.company.com, so
instance1 should have become instance1.project.cloud.company.com; however,
the Kubernetes master instance was getting the hostname
master.cloud.company.com.
[2] - a DNS lookup of master.project.cloud.company.com returned the private
IP instead of the floating IP, which meant that from outside the project the
instance could not be pinged by hostname.

I've since removed both Magnum and Designate and redeployed them: first
Magnum, testing a successful Kubernetes cluster deployment using your fix,
Bharat, and then Designate again. Issue [1] is still present, while issue
[2] is resolved. Kubernetes cluster deployment is still successful :)
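
For anyone hitting the same problem, the fix is just a matter of passing the
label Bharat mentioned. A minimal sketch based on my template command from
the original mail below (the flavour, network and image names are specific
to my environment):

openstack coe cluster template create k8s-cluster-template \
                           --image Fedora-Atomic-27 \
                           --keypair testpair \
                           --external-network physnet2vlan20 \
                           --dns-nameserver 192.168.7.233 \
                           --flavor 2GB-2vCPU \
                           --docker-volume-size 15 \
                           --network-driver flannel \
                           --coe kubernetes \
                           --labels heat_container_agent_tag=train-stable-3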

Thank you once again and have a great week ahead!

Kind regards,

Tony Pearce



On Fri, 10 Jul 2020 at 16:24, Bharat Kunwar <bharat at stackhpc.com> wrote:

> Hi Tony
>
> That is a known issue and is due to the default version of the heat container
> agent baked into the Train release. Please use the label
> heat_container_agent_tag=train-stable-3 and you should be good to go.
>
> Cheers
>
> Bharat
>
> On 10 Jul 2020, at 09:18, Tony Pearce <tonyppe at gmail.com> wrote:
>
> Hi team, I hope you are all keeping safe and well at the moment.
>
> I am trying to use Magnum to launch a Kubernetes cluster. I have tried
> different images but am currently using Fedora-Atomic 27. The cluster
> deployment from the cluster template is failing, and I am here to ask if you
> could please point me in the right direction, as I have become stuck and am
> uncertain how to troubleshoot further. The cluster seems to fail a few
> minutes after the master node boots: after the logs below ([1],[2]) I do not
> see any new (different) log entries or load on the master, then the
> 60-minute timeout is reached and the cluster is marked as failed.
>
> I deployed this OpenStack environment using Kayobe (kolla-ansible), version
> Train, on CentOS 7 with the services running in Docker containers. Kayobe
> manages the deployment through the Ansible playbooks.
>
> This was previously working some months back, around February 2020,
> although I think I may have used a CoreOS image at that time, and that is
> also not working today. I then deleted that deployment and re-deployed, the
> only change being the controller node hostname, which was updated in the
> Kayobe inventory file. Since then (a month or so back) I've been unable to
> successfully deploy a Kubernetes cluster. I've tried other Fedora-Atomic
> images as well as CoreOS without success. When using the CoreOS image and
> tagging the image with the coreos tag as per the Magnum docs, the instance
> fails to boot and drops to the rescue shell; however, if I manually launch
> the CoreOS image it boots successfully and gets configured via cloud-init.
> All of the deployment attempts stop at the same place when using the Fedora
> image, and I have a different experience if I disable TLS:
>
> TLS enabled: master launched, no nodes. Fails when
> running /usr/lib/python2.7/site-packages/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml
>
> TLS disabled: master and nodes launched, but the deployment fails later. I
> didn't investigate this very much.
>
> When looking for help around the web, I found the following, which looks to
> be the same issue I have at the moment (although that deployment is slightly
> different, using CentOS 8 and Magnum 10):
>
> https://ask.openstack.org/en/question/128391/magnum-ussuri-container-not-booting-up/
>
>
> I have the same log messages on the master node within Heat.
>
> When going through the troubleshooting guide I see that etcd is running
> with no errors; however, I don't see any flannel service at all. I also
> don't know whether the deployment simply failed before getting to deploy
> flannel, or whether flannel is the reason. As a test I did try to deploy
> using a cluster template with Calico instead, but I got the same result in
> the logs.
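>
> As a rough check on the master (this assumes the usual fedora-atomic driver
> layout, so the unit names may differ), the relevant services can be
> inspected with systemd:
>
>     # control-plane services mentioned in the troubleshooting guide
>     sudo systemctl status etcd
>     sudo systemctl list-units | grep -i flannel
>     # the agent that applies the Heat software deployments
>     sudo journalctl -u heat-container-agent --no-pager | tail -n 50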
>
> When looking at the failed stacks via the CLI, this is what I see:
> http://paste.openstack.org/show/795736/
>
> I'm using a master node flavour with 4 vCPUs and 4 GB memory, and a node
> flavour with 2 vCPUs and 2 GB memory.
> Storage is only via Cinder, as I am using iSCSI storage with a Cinder
> driver; I don't have any other storage.
>
> On the master, after the failure, the heat log repeats these lines:
>
> ++ curl --silent http://127.0.0.1:8080/healthz
> + '[' ok = ok ']'
> + kubectl patch node k8s-cluster-onvaoh2zxotf-master-0 --patch
> '{"metadata": {"labels": {"node-role.kubernetes.io/master": ""}}}'
> error: no configuration has been provided, try setting KUBERNETES_MASTER
> environment variable
> Trying to label master node with node-role.kubernetes.io/master=""
> + echo 'Trying to label master node with node-role.kubernetes.io/master=
> ""'
> + sleep 5s
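>
> As an aside, while that loop retries, kubectl can be pointed explicitly at
> the same local endpoint the healthz curl above uses, just to confirm the
> API server itself responds (this is only a rough check I use, not part of
> the Magnum scripts):
>
>     kubectl --server=http://127.0.0.1:8080 get nodes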
>
> [1]Here's the cloud-init.log: http://paste.openstack.org/show/795737/
> [2]and cloud-init-output.log: http://paste.openstack.org/show/795738/
>
> May I ask whether anyone with a recent Magnum deployment and a working
> Kubernetes cluster could share the relevant details, such as the image
> used, so that I can try to replicate it?
>
> To create the cluster template I have been using:
> openstack coe cluster template create k8s-cluster-template \
>                            --image Fedora-Atomic-27 \
>                            --keypair testpair \
>                            --external-network physnet2vlan20 \
>                            --dns-nameserver 192.168.7.233 \
>                            --flavor 2GB-2vCPU \
>                            --docker-volume-size 15 \
>                            --network-driver flannel \
>                            --coe kubernetes
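>
> and then, roughly, I create the cluster itself from that template like this
> (the name and counts are just an example):
>
> openstack coe cluster create k8s-cluster \
>                            --cluster-template k8s-cluster-template \
>                            --keypair testpair \
>                            --master-count 1 \
>                            --node-count 1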
>
>
> If I have missed anything, I am happy to provide it.
>
> Many thanks in advance for any help or pointers on this.
>
> Regards,
>
> Tony Pearce
>
>
>