[kolla][magnum] Cluster creation failed due to "Waiting for Kubernetes API..."

Zane Bitter zbitter at redhat.com
Wed May 29 20:35:31 UTC 2019


On 20/02/19 2:15 PM, Mark Goddard wrote:
> Hi, I think we've hit this, and John Garbutt has added the following 
> configuration for Kolla Ansible in /etc/kolla/config/heat.conf:
> 
> 	[DEFAULT]
> 	region_name_for_services=RegionOne
> 
> 
> We'll need a patch in kolla ansible to do that without custom config 
> changes.
> Mark
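
For reference, wiring that override in with Kolla Ansible looks roughly like
the sketch below. The inventory path and the --tags value are assumptions
about a typical deployment, not something taken from this thread:

    # Run as a user that can write under /etc/kolla on the deployment host.
    # Place the override where Kolla Ansible merges per-service config,
    # then regenerate the Heat containers' configuration.
    mkdir -p /etc/kolla/config
    cat > /etc/kolla/config/heat.conf <<EOF
    [DEFAULT]
    region_name_for_services = RegionOne
    EOF
    kolla-ansible -i /etc/kolla/multinode reconfigure --tags heat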
> 
> On Wed, 20 Feb 2019 at 11:05, Bharat Kunwar <bharat at stackhpc.com> wrote:
> 
>     Hi Giuseppe,
> 
>     What version of heat are you running?
> 
>     Can you check if you have this patch merged?
>     https://review.openstack.org/579485
> 

This patch caused a regression (in combination with corresponding 
patches to os-collect-config and heat-agents) due to weird things that 
happen in os-apply-config 
(https://bugs.launchpad.net/os-apply-config/+bug/1830967).

Details are here: https://storyboard.openstack.org/#!/story/2005797

I've proposed a fix, and once that merges, the workaround suggested above 
will no longer be needed. (Although setting the region name explicitly is 
a Good Thing to do anyway.)
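
As a point of reference, when the region does get propagated, the [heat]
section of /etc/os-collect-config.conf on the instance ends up with a real
region rather than "null". A minimal sketch, reusing the values Giuseppe
posted below purely as an illustration:

    [heat]
    auth_url = http://10.1.7.201:5000/v3/
    region_name = RegionOne
    resource_name = kube-master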

cheers,
Zane.

>     Bharat
> 
>     Sent from my iPhone
> 
>     On 20 Feb 2019, at 10:38, Giuseppe Sannino <km.giuseppesannino at gmail.com> wrote:
> 
>>     Hi Feilong, Bharat,
>>     thanks for your answer.
>>
>>     @Feilong,
>>     From /etc/kolla/heat-engine/heat.conf I see:
>>     [clients_keystone]
>>     auth_uri = http://10.1.7.201:5000
>>
>>     This should map into auth_url within the k8s master.
>>     Within the k8s master in /etc/os-collect-config.conf  I see:
>>
>>     [heat]
>>     auth_url = http://10.1.7.201:5000/v3/
>>     :
>>     :
>>     resource_name = kube-master
>>     region_name = null
>>
>>
>>     and from /etc/sysconfig/heat-params (among the others):
>>     :
>>     REGION_NAME="RegionOne"
>>     :
>>     AUTH_URL="http://10.1.7.201:5000/v3"
>>
>>     This URL corresponds to the "public" Heat endpoint
>>     openstack endpoint list | grep heat
>>     | 3d5f58c43f6b44f6b54990d6fd9ff55d | RegionOne | heat     | orchestration  | True | internal | http://10.1.7.200:8004/v1/%(tenant_id)s |
>>     | 8c2492cb0ddc48ca94942a4a299a88dc | RegionOne | heat-cfn | cloudformation | True | internal | http://10.1.7.200:8000/v1               |
>>     | b164c4618a784da9ae14da75a6c764a3 | RegionOne | heat     | orchestration  | True | public   | http://10.1.7.201:8004/v1/%(tenant_id)s |
>>     | da203f7d337b4587a0f5fc774c993390 | RegionOne | heat     | orchestration  | True | admin    | http://10.1.7.200:8004/v1/%(tenant_id)s |
>>     | e0d3743e7c604e5c8aa4684df2d1ce53 | RegionOne | heat-cfn | cloudformation | True | public   | http://10.1.7.201:8000/v1               |
>>     | efe0b8418aa24dfca33c243e7eed7e90 | RegionOne | heat-cfn | cloudformation | True | admin    | http://10.1.7.200:8000/v1               |
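
(A quick cross-check of the same data as the catalog sees it: openstackclient
can show the orchestration entry directly. This is a standard command, just
not one anyone ran in this thread:

    openstack catalog show orchestration
)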
>>
>>     Connectivity tests:
>>     [fedora at kube-cluster-fed27-k5di3i7stgks-master-0 ~]$ ping 10.1.7.201
>>     PING 10.1.7.201 (10.1.7.201) 56(84) bytes of data.
>>     64 bytes from 10.1.7.201: icmp_seq=1 ttl=63 time=0.285 ms
>>
>>     [fedora at kube-cluster-fed27-k5di3i7stgks-master-0 ~]$ curl
>>     http://10.1.7.201:5000/v3/
>>     {"version": {"status": "stable", "updated":
>>     "2018-10-15T00:00:00Z", "media-types": [{"base":
>>     "application/json", "type":
>>     "application/vnd.openstack.identity-v3+json"}], "id": "v3.11",
>>     "links": [{"href": "http://10.1.7.201:5000/v3/", "rel": "self"}]}}
>>
>>
>>     Apparently, I can reach that endpoint from within the k8s master.
>>
>>
>>     @Bharat,
>>     that file looks properly configured to me as well.
>>     The problem reported by "systemctl status heat-container-agent" is
>>     this:
>>
>>     Feb 20 09:33:23 kube-cluster-fed27-k5di3i7stgks-master-0.novalocal
>>     runc[2837]: publicURL endpoint for orchestration service in null
>>     region not found
>>     Feb 20 09:33:23 kube-cluster-fed27-k5di3i7stgks-master-0.novalocal
>>     runc[2837]: Source [heat] Unavailable.
>>     Feb 20 09:33:23 kube-cluster-fed27-k5di3i7stgks-master-0.novalocal
>>     runc[2837]: /var/lib/os-collect-config/local-data not found. Skipping
>>     Feb 20 09:33:53 kube-cluster-fed27-k5di3i7stgks-master-0.novalocal
>>     runc[2837]: publicURL endpoint for orchestration service in null
>>     region not found
>>     Feb 20 09:33:53 kube-cluster-fed27-k5di3i7stgks-master-0.novalocal
>>     runc[2837]: Source [heat] Unavailable.
>>     Feb 20 09:33:53 kube-cluster-fed27-k5di3i7stgks-master-0.novalocal
>>     runc[2837]: /var/lib/os-collect-config/local-data not found. Skipping
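
That "publicURL endpoint for orchestration service in null region not found"
line is the agent asking the keystone catalog for a public orchestration
endpoint in a region it thinks is called "null" (because region_name never
got set to a real region), so keystone finds nothing. A rough way to confirm
both halves of that by hand; the grep targets the file quoted above, and the
openstack flags are standard openstackclient options:

    # On the master: what region is the agent actually configured with?
    sudo grep region_name /etc/os-collect-config.conf

    # On the control plane: which public orchestration endpoints exist,
    # and in which region?
    openstack endpoint list --service orchestration --interface public --region RegionOne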
>>
>>
>>     Still no way forward from my side.
>>
>>     /Giuseppe
>>
>>     On Tue, 19 Feb 2019 at 22:16, Bharat Kunwar <bharat at stackhpc.com> wrote:
>>
>>         I have the same problem. Weird thing is
>>         /etc/sysconfig/heat-params has region_name specified in my case!
>>
>>         Sent from my iPhone
>>
>>         On 19 Feb 2019, at 22:00, Feilong Wang <feilong at catalyst.net.nz> wrote:
>>
>>>         Can you talk to the Heat API from your master node?
>>>
>>>
>>>         On 20/02/19 6:43 AM, Giuseppe Sannino wrote:
>>>>         Hi all...again,
>>>>         I managed to get over the previous issue by "not disabling"
>>>>         the TLS in the cluster template.
>>>>         From the cloud-init-output.log I see:
>>>>         Cloud-init v. 17.1 running 'modules:final' at Tue, 19 Feb
>>>>         2019 17:03:53 +0000. Up 38.08 seconds.
>>>>         Cloud-init v. 17.1 finished at Tue, 19 Feb 2019 17:13:22
>>>>         +0000. Datasource DataSourceEc2.  Up 607.13 seconds
>>>>
>>>>         But the cluster creation keeps on failing.
>>>>         From the journalctl -f I see a possible issue:
>>>>         Feb 19 17:42:38
>>>>         kube-cluster-tls-6hezqcq4ien3-master-0.novalocal runc[2723]:
>>>>         publicURL endpoint for orchestration service in null region
>>>>         not found
>>>>         Feb 19 17:42:38
>>>>         kube-cluster-tls-6hezqcq4ien3-master-0.novalocal runc[2723]:
>>>>         Source [heat] Unavailable.
>>>>         Feb 19 17:42:38
>>>>         kube-cluster-tls-6hezqcq4ien3-master-0.novalocal runc[2723]:
>>>>         /var/lib/os-collect-config/local-data not found. Skipping
>>>>
>>>>         anyone familiar with this problem ?
>>>>
>>>>         Thanks as usual.
>>>>         /Giuseppe
>>>>
>>>>         On Tue, 19 Feb 2019 at 17:35, Giuseppe Sannino <km.giuseppesannino at gmail.com> wrote:
>>>>
>>>>             Hi all,
>>>>             I need some help.
>>>>             I deployed an AIO via Kolla on a baremetal node. Here is
>>>>             some information about the deployment:
>>>>             ---------------
>>>>             kolla-ansible: 7.0.1
>>>>             openstack_release: Rocky
>>>>             kolla_base_distro: centos
>>>>             kolla_install_type: source
>>>>             TLS: disabled
>>>>             ---------------
>>>>
>>>>
>>>>             VMs spawn without issue, but the Kubernetes cluster
>>>>             creation never succeeds. It fails due to a timeout.
>>>>
>>>>             I managed to log into the Kubernetes master, and in
>>>>             cloud-init-output.log I can see:
>>>>             + echo 'Waiting for Kubernetes API...'
>>>>             Waiting for Kubernetes API...
>>>>             ++ curl --silent http://127.0.0.1:8080/healthz
>>>>             + '[' ok = '' ']'
>>>>             + sleep 5
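
That trace is just a polling loop in the master's boot scripts; reconstructed
from the set -x output above (not copied verbatim from Magnum), it amounts to
roughly:

    echo 'Waiting for Kubernetes API...'
    until [ "ok" = "$(curl --silent http://127.0.0.1:8080/healthz)" ]; do
        sleep 5
    done

It never exits here because curl gets an empty reply: the apiserver itself is
down, which is what the systemd output below shows.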
>>>>
>>>>
>>>>             Checking via systemctl and journalctl I see:
>>>>             [fedora at kube-clsuter-qamdealetlbi-master-0 log]$
>>>>             systemctl status kube-apiserver
>>>>             ● kube-apiserver.service - kubernetes-apiserver
>>>>                Loaded: loaded
>>>>             (/etc/systemd/system/kube-apiserver.service; enabled;
>>>>             vendor preset: disabled)
>>>>                Active: failed (Result: exit-code) since Tue
>>>>             2019-02-19 15:31:41 UTC; 45min ago
>>>>               Process: 3796 ExecStart=/usr/bin/runc --systemd-cgroup
>>>>             run kube-apiserver (code=exited, status=1/FAILURE)
>>>>              Main PID: 3796 (code=exited, status=1/FAILURE)
>>>>
>>>>             Feb 19 15:31:40
>>>>             kube-clsuter-qamdealetlbi-master-0.novalocal systemd[1]:
>>>>             kube-apiserver.service: Main process exited,
>>>>             code=exited, status=1/FAILURE
>>>>             Feb 19 15:31:40
>>>>             kube-clsuter-qamdealetlbi-master-0.novalocal systemd[1]:
>>>>             kube-apiserver.service: Failed with result 'exit-code'.
>>>>             Feb 19 15:31:41
>>>>             kube-clsuter-qamdealetlbi-master-0.novalocal systemd[1]:
>>>>             kube-apiserver.service: Service RestartSec=100ms
>>>>             expired, scheduling restart.
>>>>             Feb 19 15:31:41
>>>>             kube-clsuter-qamdealetlbi-master-0.novalocal systemd[1]:
>>>>             kube-apiserver.service: Scheduled restart job, restart
>>>>             counter is at 6.
>>>>             Feb 19 15:31:41
>>>>             kube-clsuter-qamdealetlbi-master-0.novalocal systemd[1]:
>>>>             Stopped kubernetes-apiserver.
>>>>             Feb 19 15:31:41
>>>>             kube-clsuter-qamdealetlbi-master-0.novalocal systemd[1]:
>>>>             kube-apiserver.service: Start request repeated too quickly.
>>>>             Feb 19 15:31:41
>>>>             kube-clsuter-qamdealetlbi-master-0.novalocal systemd[1]:
>>>>             kube-apiserver.service: Failed with result 'exit-code'.
>>>>             Feb 19 15:31:41
>>>>             kube-clsuter-qamdealetlbi-master-0.novalocal systemd[1]:
>>>>             Failed to start kubernetes-apiserver.
>>>>
>>>>             [fedora at kube-clsuter-qamdealetlbi-master-0 log]$ sudo
>>>>             journalctl -u kube-apiserver
>>>>             -- Logs begin at Tue 2019-02-19 15:21:36 UTC, end at Tue
>>>>             2019-02-19 16:17:00 UTC. --
>>>>             Feb 19 15:31:33
>>>>             kube-clsuter-qamdealetlbi-master-0.novalocal systemd[1]:
>>>>             Started kubernetes-apiserver.
>>>>             Feb 19 15:31:34
>>>>             kube-clsuter-qamdealetlbi-master-0.novalocal runc[2794]:
>>>>             Flag --insecure-bind-address has been deprecated, This
>>>>             flag will be removed in a future version.
>>>>             Feb 19 15:31:34
>>>>             kube-clsuter-qamdealetlbi-master-0.novalocal runc[2794]:
>>>>             Flag --insecure-port has been deprecated, This flag will
>>>>             be removed in a future version.
>>>>             Feb 19 15:31:35
>>>>             kube-clsuter-qamdealetlbi-master-0.novalocal runc[2794]:
>>>>             Error: error creating self-signed certificates: open
>>>>             /var/run/kubernetes/apiserver.crt: permission denied
>>>>             :
>>>>             :
>>>>             :
>>>>             Feb 19 15:31:35
>>>>             kube-clsuter-qamdealetlbi-master-0.novalocal runc[2794]:
>>>>             error: error creating self-signed certificates: open
>>>>             /var/run/kubernetes/apiserver.crt: permission denied
>>>>             Feb 19 15:31:35
>>>>             kube-clsuter-qamdealetlbi-master-0.novalocal systemd[1]:
>>>>             kube-apiserver.service: Main process exited,
>>>>             code=exited, status=1/FAILURE
>>>>             Feb 19 15:31:35
>>>>             kube-clsuter-qamdealetlbi-master-0.novalocal systemd[1]:
>>>>             kube-apiserver.service: Failed with result 'exit-code'.
>>>>             Feb 19 15:31:35
>>>>             kube-clsuter-qamdealetlbi-master-0.novalocal systemd[1]:
>>>>             kube-apiserver.service: Service RestartSec=100ms
>>>>             expired, scheduling restart.
>>>>             Feb 19 15:31:35
>>>>             kube-clsuter-qamdealetlbi-master-0.novalocal systemd[1]:
>>>>             kube-apiserver.service: Scheduled restart job, restart
>>>>             counter is at 1.
>>>>
>>>>
>>>>             May I ask for help with this?
>>>>
>>>>             Many thanks
>>>>             /Giuseppe
>>>>
>>>         -- 
>>>         Cheers & Best regards,
>>>         Feilong Wang (王飞龙)
>>>         --------------------------------------------------------------------------
>>>         Senior Cloud Software Engineer
>>>         Tel: +64-48032246
>>>         Email: flwang at catalyst.net.nz
>>>         Catalyst IT Limited
>>>         Level 6, Catalyst House, 150 Willis Street, Wellington
>>>         --------------------------------------------------------------------------
>>



