[magnum] failed to launch Kubernetes cluster

Tony Pearce tonyppe at gmail.com
Mon Jul 13 08:41:44 UTC 2020


Hi Bharat,
Thank you again :)

Tony Pearce


On Mon, 13 Jul 2020 at 16:35, Bharat Kunwar <bharat at stackhpc.com> wrote:

> Hi Tony
>
> I have not used Designate myself so I am not sure about the exact details, but
> if you are using Kayobe/Kolla-Ansible, we recently proposed these backports
> to Train:
> https://review.opendev.org/#/c/738882/1/ansible/roles/magnum/templates/magnum.conf.j2.
> Magnum queries the Keystone catalog for the URL that instances can use to talk
> back to Keystone and to Magnum itself. Usually this is the public URL, but
> essentially you need to specify an endpoint name that fits the bill.
> Please check /etc/kolla/magnum-conductor/magnum.conf in your control plane
> where Magnum is deployed and ensure it is configured to use the correct
> interface.
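>
> As a rough illustration (the section and option names here are from memory,
> so treat them as an assumption rather than the exact settings the backport
> touches), you can inspect what the conductor currently uses with:
>
> grep -E -A3 '\[trust\]|\[keystone_authtoken\]' /etc/kolla/magnum-conductor/magnum.conf
>
> and look for the interface/endpoint_type style options there, e.g.
> [trust] trustee_keystone_interface, which I believe defaults to the
> public endpoint.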
>
>
> Cheers
>
> Bharat
>
> On 13 Jul 2020, at 08:43, Tony Pearce <tonyppe at gmail.com> wrote:
>
> Hi Bharat, many thanks for your super quick response to me last week. I
> really appreciate that, especially since I had been stuck on this issue for so
> long. I wanted to try out your suggestion before coming back with a reply.
>
> I tried your suggestion and at first I got the same experience (failure)
> when creating a cluster. It appeared to stop in the same place as I
> described in my previous mail. During the investigation I noticed some odd
> behaviour with the DNS integration (Designate), see [1] and [2] below. I
> decided to remove Designate from OpenStack and retest, and now I am
> successfully able to deploy a Kubernetes cluster! :)
>
> Regarding those 2 points:
> [1] - the configured Designate zone was project.cloud.company.com, so
> instance1 should have become instance1.project.cloud.company.com; however,
> the kube master instance hostname was getting master.cloud.company.com.
> [2] - when doing a DNS lookup on master.project.cloud.company.com (sketched
> below), the private IP was being returned instead of the floating IP. This
> meant that, from outside the project, the instance couldn't be pinged by
> hostname.
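>
> The lookups behind [1] and [2] were roughly the following (the zone and
> record names are just the examples above; the trailing dot on the zone name
> is how Designate stores it):
>
> openstack recordset list project.cloud.company.com.
> dig +short master.project.cloud.company.com
>
> and then comparing the returned A record against the instance's floating IP.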
>
> I then removed both Magnum and Designate and redeployed them: first I
> deployed Magnum and tested a successful Kubernetes cluster deployment using
> your fix, Bharat, and then I deployed Designate again. Issue [1] is still
> present, while issue [2] is resolved and no longer seen. Kubernetes
> cluster deployment is still successful :)
>
> Thank you once again and have a great week ahead!
>
> Kind regards,
>
> Tony Pearce
>
>
>
> On Fri, 10 Jul 2020 at 16:24, Bharat Kunwar <bharat at stackhpc.com> wrote:
>
>> Hi Tony
>>
>> That is a known issue and is due to the default version of the heat container
>> agent baked into the Train release. Please use the label
>> heat_container_agent_tag=train-stable-3 and you should be good to go.
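>>
>> If it helps, the label can be set on the cluster template (or per cluster)
>> with the --labels flag, along these lines:
>>
>> openstack coe cluster template create ... \
>>                            --labels heat_container_agent_tag=train-stable-3
>>
>> (--labels is also accepted by openstack coe cluster create.)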
>>
>> Cheers
>>
>> Bharat
>>
>> On 10 Jul 2020, at 09:18, Tony Pearce <tonyppe at gmail.com> wrote:
>>
>> Hi team, I hope you are all keeping safe and well at the moment.
>>
>> I am trying to use Magnum to launch a Kubernetes cluster. I have tried
>> different images and am currently using Fedora Atomic 27. The cluster
>> deployment from the cluster template is failing, and I am here to ask if you
>> could please point me in the right direction. I have become stuck and I am
>> uncertain how to troubleshoot this further. The cluster seems to fail a few
>> minutes after the master node boots: after the logs ([1], [2]) appear, I do
>> not see any further progress in terms of new (different) logs or load on the
>> master. Then the 60-minute timeout is reached and the cluster is marked as
>> failed.
>>
>> I deployed this OpenStack environment using Kayobe (Kolla-Ansible) and it is
>> the Train release, deployed on CentOS 7 within Docker containers. Kayobe
>> manages the deployment through the Ansible playbooks.
>>
>> This was previously working some months back, although I think I may have
>> used a coreos image at that time, and that is also not working today. The
>> working deployment would have been back around February 2020. I then deleted
>> that deployment and re-deployed; the only change was the controller node's
>> hostname, updated in the Kayobe inventory file.
>> Since then, which was a month or so back, I've been unable to successfully
>> deploy a Kubernetes cluster. I've tried other fedora-atomic images as well
>> as coreos without success. When using the coreos image and tagging the image
>> with the coreos tag as per the Magnum docs (a tagging sketch follows the TLS
>> notes below), the instance fails to boot and drops to the rescue shell.
>> However, if I manually launch the coreos image then it does successfully boot
>> and gets configured via cloud-init. All of the deployment attempts stop at
>> the same place when using the fedora image, and I have a different experience
>> if I disable TLS:
>>
>> TLS enabled: master launched, no nodes. Fails when
>> running /usr/lib/python2.7/site-packages/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml
>>
>> TLS disabled: master and nodes launched but it later fails. I
>> didn't investigate this very much.
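>>
>> For reference, the coreos tag mentioned above is the os_distro image property
>> described in the Magnum docs. Roughly (the image name here is just a
>> placeholder for whatever the image is called in Glance):
>>
>> openstack image set --property os_distro='coreos' <coreos-image-name>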
>>
>> When looking for help around the web, I found the post below, which looks to
>> be the same issue I have at the moment (although that deployment is slightly
>> different, using CentOS 8, and it mentions Magnum 10):
>>
>> https://ask.openstack.org/en/question/128391/magnum-ussuri-container-not-booting-up/
>>
>>
>> I have the same log messages on the master node within heat.
>>
>> When going through the troubleshooting guide I see that etcd is running with
>> no errors; however, I don't see any flannel service at all. I also don't know
>> whether the deployment simply failed before getting to deploy flannel, or
>> whether flannel is the reason. As a test I did try to deploy using a cluster
>> template that uses calico instead, but the logs show the same result.
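>>
>> The service checks on the master were along these lines (I'm assuming flannel
>> would show up as a systemd unit on Fedora Atomic if it had been deployed):
>>
>> sudo systemctl status etcd
>> sudo systemctl list-units --all | grep -i flannel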
>>
>> When looking at the stacks via the CLI to see the failed stacks, this is what
>> I see: http://paste.openstack.org/show/795736/
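>>
>> For anyone repeating this, the listing above and the drill-down into the
>> failed resources came from roughly the following (the stack id is taken from
>> the first command's output):
>>
>> openstack stack list
>> openstack stack failures list --long <stack-id>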
>>
>> I'm using a master node flavour with 4 vCPUs and 4 GB memory, and a worker
>> node flavour with 2 vCPUs and 2 GB memory.
>> Storage is only via Cinder, as I am using iSCSI storage with a Cinder
>> driver. I don't have any other storage.
>>
>> On the master, after the failure, the heat log repeats these messages:
>>
>> ++ curl --silent http://127.0.0.1:8080/healthz
>> + '[' ok = ok ']'
>> + kubectl patch node k8s-cluster-onvaoh2zxotf-master-0 --patch
>> '{"metadata": {"labels": {"node-role.kubernetes.io/master": ""}}}'
>> error: no configuration has been provided, try setting KUBERNETES_MASTER
>> environment variable
>> Trying to label master node with node-role.kubernetes.io/master=""
>> + echo 'Trying to label master node with node-role.kubernetes.io/master=
>> ""'
>> + sleep 5s
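>>
>> As a manual retest (not part of the heat script, just pointing kubectl at the
>> same local endpoint the healthz check uses), the patch can be retried with:
>>
>> kubectl --server=http://127.0.0.1:8080 patch node \
>>     k8s-cluster-onvaoh2zxotf-master-0 \
>>     --patch '{"metadata": {"labels": {"node-role.kubernetes.io/master": ""}}}'
>>
>> which would at least rule out the unset KUBERNETES_MASTER variable as the
>> only problem.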
>>
>> [1]Here's the cloud-init.log: http://paste.openstack.org/show/795737/
>> [2]and cloud-init-output.log: http://paste.openstack.org/show/795738/
>>
>> May I ask if anyone with a recent Magnum deployment and a working Kubernetes
>> cluster could share the relevant details, such as the image you have used, so
>> that I can try to replicate it?
>>
>> To create the cluster template I have been using:
>> openstack coe cluster template create k8s-cluster-template \
>>                            --image Fedora-Atomic-27 \
>>                            --keypair testpair \
>>                            --external-network physnet2vlan20 \
>>                            --dns-nameserver 192.168.7.233 \
>>                            --flavor 2GB-2vCPU \
>>                            --docker-volume-size 15 \
>>                            --network-driver flannel \
>>                            --coe kubernetes
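>>
>> The cluster itself is then created from that template along these lines (the
>> master/node counts here are illustrative):
>>
>> openstack coe cluster create k8s-cluster \
>>                            --cluster-template k8s-cluster-template \
>>                            --master-count 1 \
>>                            --node-count 2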
>>
>>
>> If I have missed anything, I am happy to provide it.
>>
>> Many thanks in advance for any help or pointers on this.
>>
>> Regards,
>>
>> Tony Pearce
>>
>>
>>
>