[Magnum] Cluster Create failure
Hi All,

I am trying to create a cluster using Magnum with:

$ magnum cluster-create kubernetes-cluster --cluster-template kubernetes-cluster-template --master-count 1 --node-count 1 --keypair magnum_key

My template looks like this:

+-----------------------+--------------------------------------+
| Property              | Value                                |
+-----------------------+--------------------------------------+
| insecure_registry     | -                                    |
| labels                | {}                                   |
| updated_at            | -                                    |
| floating_ip_enabled   | True                                 |
| fixed_subnet          | -                                    |
| master_flavor_id      | small                                |
| user_id               | b4727cb329c14c388d777d0ce38c8a6b     |
| uuid                  | 24d01aea-c968-42e3-bcaa-a2e756aac5c7 |
| no_proxy              | -                                    |
| https_proxy           | -                                    |
| tls_disabled          | False                                |
| keypair_id            | -                                    |
| hidden                | False                                |
| project_id            | 4c6bc4445c764249921a0a6e40b192dd     |
| public                | False                                |
| http_proxy            | -                                    |
| docker_volume_size    | 4                                    |
| server_type           | vm                                   |
| external_network_id   | 5guknet                              |
| cluster_distro        | fedora-atomic                        |
| image_id              | fedora-atomic-latest                 |
| volume_driver         | -                                    |
| registry_enabled      | False                                |
| docker_storage_driver | devicemapper                         |
| apiserver_port        | -                                    |
| name                  | kubernetes-cluster-template          |
| created_at            | 2019-03-27T12:21:27+00:00            |
| network_driver        | flannel                              |
| fixed_network         | -                                    |
| coe                   | kubernetes                           |
| flavor_id             | small                                |
| master_lb_enabled     | False                                |
| dns_nameserver        | 8.8.8.8                              |
+-----------------------+--------------------------------------+

I am getting the following error:

{"explanation": "The server could not comply with the request since it is either malformed or otherwise incorrect.", "code": 400, "error": {"message": "ResourceTypeUnavailable: : resources.kube_masters<nested_stack>.resources.0<file:///var/lib/magnum/env/lib/python2.7/site-packages/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml>: : HEAT-E99001 Service cinder is not available for resource type Magnum::Optional::Cinder::Volume, reason: cinder volumev3 endpoint is not in service catalog.", "traceback": null, "type": "StackValidationFailed"}, "title": "Bad Request"} log_http_response /var/lib/magnum/env/local/lib/python2.7/site-packages/heatclient/common/http.py:157

2019-03-27 17:28:11.174 145885 ERROR oslo_messaging.rpc.server [req-b4037dc5-2fe1-4533-a6e3-433609c6b22d - - - - -] Exception during message handling: InvalidParameterValue: ERROR: ResourceTypeUnavailable: : resources.kube_masters<nested_stack>.resources.0<file:///var/lib/magnum/env/lib/python2.7/site-packages/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml>: : HEAT-E99001 Service cinder is not available for resource type Magnum::Optional::Cinder::Volume, reason: cinder volumev3 endpoint is not in service catalog.
2019-03-27 17:28:11.174 145885 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2019-03-27 17:28:11.174 145885 ERROR oslo_messaging.rpc.server   File "/var/lib/magnum/env/local/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 166, in _process_incoming
2019-03-27 17:28:11.174 145885 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2019-03-27 17:28:11.174 145885 ERROR oslo_messaging.rpc.server   File "/var/lib/magnum/env/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 265, in dispatch
2019-03-27 17:28:11.174 145885 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2019-03-27 17:28:11.174 145885 ERROR oslo_messaging.rpc.server   File "/var/lib/magnum/env/local/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
2019-03-27 17:28:11.174 145885 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2019-03-27 17:28:11.174 145885 ERROR oslo_messaging.rpc.server   File "/var/lib/magnum/env/local/lib/python2.7/site-packages/magnum/conductor/handlers/cluster_conductor.py", line 80, in cluster_create
2019-03-27 17:28:11.174 145885 ERROR oslo_messaging.rpc.server     raise e
2019-03-27 17:28:11.174 145885 ERROR oslo_messaging.rpc.server InvalidParameterValue: ERROR: ResourceTypeUnavailable: : resources.kube_masters<nested_stack>.resources.0<file:///var/lib/magnum/env/lib/python2.7/site-packages/magnum/drivers/k8s_fedora_atomic_v1/templates/kubemaster.yaml>: : HEAT-E99001 Service cinder is not available for resource type Magnum::Optional::Cinder::Volume, reason: cinder volumev3 endpoint is not in service catalog.
2019-03-27 17:28:11.174 145885 ERROR oslo_messaging.rpc.server

The magnum.conf file is as suggested in https://docs.openstack.org/magnum/pike/install/install-guide-from-source.html

I DO NOT have Cinder in my OpenStack; I believe it is optional. Please suggest how I can resolve this issue.

Kind Regards, Navdeep
Try this: docker_storage_driver=overlay2 and do not specify docker_volume_size.
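For reference, recreating the template with those settings looks roughly like this (a sketch using the values from the template above; the new template name is only an example, and exact flag spellings depend on the client version - the older `magnum cluster-template-create` syntax differs slightly):

$ openstack coe cluster template create kubernetes-cluster-template-overlay2 \
    --coe kubernetes \
    --image fedora-atomic-latest \
    --keypair magnum_key \
    --external-network 5guknet \
    --dns-nameserver 8.8.8.8 \
    --flavor small \
    --master-flavor small \
    --network-driver flannel \
    --docker-storage-driver overlay2
# no --docker-volume-size, so the driver should not need a Cinder volume for Docker storage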
Hi Bharat,

Thank you very much. It worked for me. However, I am now getting a resource error while starting the cluster. I have enough resources available on the hypervisor, but it errors out:

faults | {'0': 'ResourceInError: resources[0].resources.kube-master: Went to status ERROR due to "Message: No valid host was found. , Code: 500"', 'kube_masters': 'ResourceInError: resources.kube_masters.resources[0].resources.kube-master: Went to status ERROR due to "Message: No valid host was found. , Code: 500"', 'kube-master': 'ResourceInError: resources.kube-master: Went to status ERROR due to "Message: No valid host was found. , Code: 500"'} |

Nova Scheduler error:

2019-03-28 11:36:37.686 79522 ERROR nova.scheduler.client.report [req-82c4fb8b-785b-40bf-82fc-ff9d0e6101a0 b4727cb329c14c388d777d0ce38c8a6b 4c6bc4445c764249921a0a6e40b192dd - default default] Failed to retrieve allocation candidates from placement API for filters {'VCPU': 4, 'MEMORY_MB': 4096, 'DISK_GB': 80}. Got 500:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or misconfiguration and was unable to complete your request.</p>
<p>Please contact the server administrator at [no address given] to inform them of the time this error occurred, and the actions you performed just before this error.</p>
<p>More information about this error may be available in the server error log.</p>
<hr>
<address>Apache/2.4.18 (Ubuntu) Server at pluto Port 8778</address>
</body></html>

Please help me resolve this error.

Kind Regards, Navdeep
Hi Bharat,

Following is the output:

$ openstack image show fedora-atomic-latest
+------------------+------------------------------------------------------+
| Field            | Value                                                |
+------------------+------------------------------------------------------+
| checksum         | d7ae2346b11f8f8596be5ce1d11a9a62                     |
| container_format | bare                                                 |
| created_at       | 2019-03-27T10:51:03Z                                 |
| disk_format      | qcow2                                                |
| file             | /v2/images/7e74c270-face-4ead-892a-7029db727a61/file |
| id               | 7e74c270-face-4ead-892a-7029db727a61                 |
| min_disk         | 0                                                    |
| min_ram          | 0                                                    |
| name             | fedora-atomic-latest                                 |
| owner            | 4c6bc4445c764249921a0a6e40b192dd                     |
| properties       | os_distro='fedora-atomic'                            |
| protected        | False                                                |
| schema           | /v2/schemas/image                                    |
| size             | 702683648                                            |
| status           | active                                               |
| tags             |                                                      |
| updated_at       | 2019-03-27T10:51:06Z                                 |
| virtual_size     | None                                                 |
| visibility       | shared                                               |
+------------------+------------------------------------------------------+

-----Original Message-----
From: Bharat Kunwar <bharat@stackhpc.com>
Sent: 28 March 2019 12:35
To: Navdeep Uniyal <navdeep.uniyal@bristol.ac.uk>
Cc: openstack@lists.openstack.org
Subject: Re: [Magnum] Cluster Create failure

Can you do `openstack image show FedoraAtomicImageName` and paste the output here?
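Side note: the os_distro property shown above is what Magnum keys the driver on, so it has to be present on the image. If it were missing, it can be set at upload time, roughly like this (the local file name is only a placeholder):

$ openstack image create fedora-atomic-latest \
    --disk-format qcow2 \
    --container-format bare \
    --property os_distro='fedora-atomic' \
    --file fedora-atomic.qcow2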
Yes, there seems to be some issue with the server creation now. I will check and try resolving that. Thank you

Regards, Navdeep

-----Original Message-----
From: Bharat Kunwar <bharat@stackhpc.com>
Sent: 28 March 2019 12:40
To: Navdeep Uniyal <navdeep.uniyal@bristol.ac.uk>
Cc: openstack@lists.openstack.org
Subject: Re: [Magnum] Cluster Create failure

Can you create a server normally?
your placement service seems to be broken :)
-- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser@vexxhost.com W. http://vexxhost.com
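A couple of quick checks on the placement side (a sketch; the host and port come from the Apache error page above, and the log location depends on how placement was deployed):

$ openstack endpoint list --service placement
$ TOKEN=$(openstack token issue -f value -c id)
$ curl -s -H "X-Auth-Token: $TOKEN" http://pluto:8778/
# a healthy placement API answers with its version document here rather than an Apache 500 page;
# if it still returns 500, the WSGI/Apache error log on the controller usually has the real traceback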
Hi Guys,

I am able to resolve the issue in nova (it was a problem with the oslo.db version - somehow I installed version 4.44 instead of 4.25 for my Pike installation).

However, moving forward, I started my kube cluster and I could see 2 instances running for kube-master and kube-minion, but the deployment failed after that with the following error:

{"message": "The resource was found at <a href=\"http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-cluster-eovgkanhoa4x/384d8725-bca3-4fa4-a9fd-f18687aab8fb/resources?status=FAILED&nested_depth=2\">http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-cluster-eovgkanhoa4x/384d8725-bca3-4fa4-a9fd-f18687aab8fb/resources?status=FAILED&nested_depth=2</a>;\nyou should be redirected automatically.\n\n", "code": "302 Found", "title": "Found"} log_http_response /var/lib/magnum/env/local/lib/python2.7/site-packages/heatclient/common/http.py:157

2019-03-29 12:05:51.225 157681 DEBUG heatclient.common.http [req-76e55dec-9511-4aad-aa52-af9978b40eed - - - - -] curl -g -i -X GET -H 'User-Agent: python-heatclient' -H 'Content-Type: application/json' -H 'X-Auth-Url: http://pluto:5000/v3' -H 'Accept: application/json' -H 'X-Auth-Token: {SHA1}f2c32656c7103ad0b89d83ff9f1b6cebc0a6eee7' http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-clus... nhoa4x/384d8725-bca3-4fa4-a9fd-f18687aab8fb/resources?status=FAILED&nested_depth=2 log_curl_request /var/lib/magnum/env/local/lib/python2.7/site-packages/heatclient/common/http.py:144

2019-03-29 12:05:51.379 157681 DEBUG heatclient.common.http [req-76e55dec-9511-4aad-aa52-af9978b40eed - - - - -] HTTP/1.1 200 OK Content-Type: application/json Content-Length: 4035 X-Openstack-Request-Id: req-942ef8fa-1bba-4573-9022-0d4e135772e0 Date: Fri, 29 Mar 2019 12:05:51 GMT Connection: keep-alive

{"resources": [{"resource_name": "kube_cluster_deploy", "links": [{"href": "http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-clus...", "rel": "self"}, {"href": "http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-clus...", "rel": "stack"}], "logical_resource_id": "kube_cluster_deploy", "creation_time": "2019-03-29T10:40:00Z", "resource_status": "CREATE_FAILED", "updated_time": "2019-03-29T10:40:00Z", "required_by": [], "resource_status_reason": "CREATE aborted (Task create from SoftwareDeployment \"kube_cluster_deploy\" Stack \"kubernetes-cluster-eovgkanhoa4x\" [384d8725-bca3-4fa4-a9fd-f18687aab8fb] Timed out)", "physical_resource_id": "8d715a3f-6ec8-4772-ba4b-1056cd4ab7d3", "resource_type": "OS::Heat::SoftwareDeployment"}, {"resource_name": "kube_minions", "links": [{"href": "http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-clus...", "rel": "self"}, {"href": "http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-clus...", "rel": "stack"}, {"href": "http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-clus...", "rel": "nested"}], "logical_resource_id": "kube_minions", "creation_time": "2019-03-29T10:40:00Z", "resource_status_reason": "CREATE aborted (Task create from ResourceGroup \"kube_minions\" Stack \"kubernetes-cluster-eovgkanhoa4x\" [384d8725-bca3-4fa4-a9fd-f18687aab8fb] Timed out)", "updated_time": "2019-03-29T10:40:00Z", "required_by": [], "resource_status": "CREATE_FAILED", "physical_resource_id": "33700819-0766-4d30-954b-29aace6048cc", "resource_type": "OS::Heat::ResourceGroup"}, {"parent_resource": "kube_minions", "resource_name": "0", "links": [{"href": "http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-clus...", "rel": "self"}, {"href": "http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-clus...", "rel": "stack"}, {"href": "http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-clus...", "rel": "nested"}], "logical_resource_id": "0", "creation_time": "2019-03-29T10:40:59Z", "resource_status_reason": "resources[0]: Stack CREATE cancelled", "updated_time": "2019-03-29T10:40:59Z", "required_by": [], "resource_status": "CREATE_FAILED", "physical_resource_id": "d1a8214c-c5b0-488c-83d6-f0a9cacbe844", "resource_type": "file:///var/lib/magnum/env/lib/python2.7/site-packages/magnum/drivers/k8s_fedora_atomic_v1/templates/kubeminion.yaml"}, {"parent_resource": "0", "resource_name": "minion_wait_condition", "links": [{"href": "http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-clus...", "rel": "self"}, {"href": "http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-clus...", "rel": "stack"}], "logical_resource_id": "minion_wait_condition", "creation_time": "2019-03-29T10:41:01Z", "resource_status": "CREATE_FAILED", "updated_time": "2019-03-29T10:41:01Z", "required_by": [], "resource_status_reason": "CREATE aborted (Task create from HeatWaitCondition \"minion_wait_condition\" Stack \"kubernetes-cluster-eovgkanhoa4x-kube_minions-otcpiw3oye46-0-ftjzf76onzqn\" [d1a8214c-c5b0-488c-83d6-f0a9cacbe844] Timed out)", "physical_resource_id": "", "resource_type": "OS::Heat::WaitCondition"}]}

I am not sure how to debug this. Please advise.

Kind Regards, Navdeep
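To see which nested resource actually failed, the heat client can walk the nested stacks, roughly like this (stack name taken from the output above; option spellings may differ slightly between heatclient versions):

$ openstack stack failures list --long kubernetes-cluster-eovgkanhoa4x
$ openstack stack resource list --nested-depth 2 kubernetes-cluster-eovgkanhoa4x
$ openstack stack resource show kubernetes-cluster-eovgkanhoa4x kube_cluster_deploy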
Dear All,

My Kubernetes Cluster is timing out after 60 mins.

Following is the update I am getting in magnum.log:

{"stack": {"parent": null, "disable_rollback": true, "description": "This template will boot a Kubernetes cluster with one or more minions (as specified by the number_of_minions parameter, which defaults to 1).\n", "parameters": {"magnum_url": "http://10.68.48.4:9511/v1", "kube_tag": "v1.11.6", "http_proxy": "", "cgroup_driver": "cgroupfs", "registry_container": "container", "kubernetes_port": "6443", "calico_kube_controllers_tag": "v1.0.3", "octavia_enabled": "False", "etcd_volume_size": "0", "kube_dashboard_enabled": "True", "master_flavor": "medium", "etcd_tag": "v3.2.7", "kube_version": "v1.11.6", "k8s_keystone_auth_tag": "1.13.0", "kube_service_account_private_key": "******", "keystone_auth_enabled": "True", "cloud_provider_tag": "v0.2.0", "ca_key": "******", "tiller_enabled": "False", "registry_enabled": "False", "verify_ca": "True", "password": "******", "dns_service_ip": "10.254.0.10", "ssh_key_name": "magnum_key", "flannel_tag": "v0.10.0-amd64", "flannel_network_subnetlen": "24", "dns_nameserver": "8.8.8.8", "number_of_masters": "1", "wait_condition_timeout": "6000", "portal_network_cidr": "10.254.0.0/16", "admission_control_list": "NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota", "pods_network_cidr": "10.100.0.0/16", "ingress_controller": "", "external_network": "751ae6e5-71af-4f78-b846-b0e1843093c8", "docker_volume_type": "", "registry_port": "5000", "tls_disabled": "False", "trust_id": "******", "swift_region": "", "influx_grafana_dashboard_enabled": "False", "volume_driver": "", "kubescheduler_options": "", "calico_tag": "v2.6.7", "loadbalancing_protocol": "TCP", "cloud_provider_enabled": "True", "OS::stack_id": "06c05715-ac05-4287-905c-38f1964f09fe", "flannel_cni_tag": "v0.3.0", "prometheus_monitoring": "False", "kubelet_options": "", "fixed_network": "", "kube_dashboard_version": "v1.8.3", "trustee_username": "d7ff417e-85b6-4b9a-94c3-211e7b830a51_4c6bc4445c764249921a0a6e40b192dd", "availability_zone": "", "server_image": "fedora-feduser-atomic", "flannel_network_cidr": "10.100.0.0/16", "cert_manager_api": "False", "minion_flavor": "medium", "kubeproxy_options": "", "calico_cni_tag": "v1.11.2", "cluster_uuid": "d7ff417e-85b6-4b9a-94c3-211e7b830a51", "grafana_admin_passwd": "******", "flannel_backend": "udp", "trustee_domain_id": "ac26210ad4f74217b3abf28a9b5cf56d", "fixed_subnet": "", "https_proxy": "", "username": "admin", "insecure_registry_url": "", "docker_volume_size": "0", "grafana_tag": "5.1.5", "kube_allow_priv": "true", "node_problem_detector_tag": "v0.6.2", "docker_storage_driver": "overlay2", "project_id": "4c6bc4445c764249921a0a6e40b192dd", "registry_chunksize": "5242880", "trustee_user_id": "d1983ea926c34536aabc8d50a85503e8", "container_infra_prefix": "", "number_of_minions": "1", "tiller_tag": "v2.12.3", "auth_url": "http://pluto:5000/v3", "registry_insecure": "True", "tiller_namespace": "magnum-tiller", "prometheus_tag": "v1.8.2", "OS::project_id": "4c6bc4445c764249921a0a6e40b192dd", "kubecontroller_options": "", "fixed_network_cidr": "10.0.0.0/24", "kube_service_account_key": "******", "ingress_controller_role": "ingress", "region_name": "RegionOne", "kubeapi_options": "", "openstack_ca": "******", "trustee_password": "******", "nodes_affinity_policy": "soft-anti-affinity", "minions_to_remove": "", "octavia_ingress_controller_tag": "1.13.2-alpha", "OS::stack_name": "kubernetes-cluster-wwmvqecjiznb", "system_pods_timeout": "5", "system_pods_initial_delay": "30", "dns_cluster_domain": "cluster.local", "calico_ipv4pool": "192.168.0.0/16", "network_driver": "flannel", "monitoring_enabled": "False", "heat_container_agent_tag": "stein-dev", "no_proxy": "", "discovery_url": "https://discovery.etcd.io/b8fe011e8b281615904de97ee05511a7"}, "deletion_time": null, "stack_name": "kubernetes-cluster-wwmvqecjiznb", "stack_user_project_id": "8204d11826fb4253ae7c9063306cb4e1", "tags": null, "creation_time": "2019-04-01T13:19:53Z", "links": [{"href": "http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-clus...", "rel": "self"}], "capabilities": [], "notification_topics": [], "timeout_mins": 60, "stack_status": "CREATE_IN_PROGRESS", "stack_owner": null, "updated_time": null, "id": "06c05715-ac05-4287-905c-38f1964f09fe", "stack_status_reason": "Stack CREATE started", "template_description": "This template will boot a Kubernetes cluster with one or more minions (as specified by the number_of_minions parameter, which defaults to 1).\n"}}

I am not sure how to triage this issue as I cannot see any errors in heat.log either. I can see both the master and minion nodes running, but the task errors out during OS::Heat::SoftwareDeployment in kube_cluster_deploy and OS::Heat::ResourceGroup in kube_minions.

I don't have much experience with Kubernetes clusters either, so please forgive me if I am raising any silly queries.

Kind Regards, Navdeep
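Since it is the SoftwareDeployment and the WaitCondition that time out, one thing worth checking from the controller is whether the nodes ever report back to Heat - a sketch, assuming the heat OSC plugin is installed:

$ openstack software deployment list
$ openstack software deployment output show --all --long <deployment-id>
# a deployment that never leaves IN_PROGRESS usually means the instance could not reach the
# Heat/metadata endpoints to signal completion, which matches a WaitCondition timeout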
Hi Navdeep,

Have you tried logging into the master/worker node and grepping for `fail` inside /var/log/cloud-init.log and /var/log/cloud-init-output.log?

Also, how did you deploy your OpenStack services?

Bharat
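For example (the default login user for the stock fedora-atomic image is normally 'fedora', and the key file name here is only a placeholder):

$ ssh -i magnum_key.pem fedora@<master-floating-ip>
$ sudo grep -i fail /var/log/cloud-init.log
$ sudo grep -i fail /var/log/cloud-init-output.log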
Hi Bharat,

Thank you for your response. I am getting the following errors in my worker VM (the master VM has similar errors):

[feduser@kubernetes-cluster-wwmvqecjiznb-minion-0 ~]$ less /var/log/cloud-init.log | grep fail
2019-03-29 16:20:37,018 - cc_growpart.py[DEBUG]: '/' SKIPPED: device_part_info(/dev/mapper/atomicos-root) failed: /dev/mapper/atomicos-root not a partition
2019-03-29 16:20:37,219 - main.py[DEBUG]: Ran 14 modules with 0 failures
2019-03-29 16:20:38,450 - main.py[DEBUG]: Ran 7 modules with 0 failures
2019-03-29 16:20:39,501 - main.py[DEBUG]: Ran 16 modules with 0 failures
2019-04-01 13:21:07,978 - util.py[WARNING]: failed stage init-local
2019-04-01 13:21:07,978 - util.py[DEBUG]: failed stage init-local
2019-04-01 13:21:09,250 - url_helper.py[DEBUG]: Calling 'http://169.254.169.254/openstack' failed [0/-1s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /openstack (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67d6c50>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:09,252 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [0/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67eb588>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:10,255 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [1/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67eb2b0>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:11,259 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [2/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67ebe10>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:12,264 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [3/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67f35f8>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:13,268 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [4/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67f3e10>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:14,272 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [5/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67fa668>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:16,278 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [7/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67f3550>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:18,283 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [9/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67ebe10>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:36,442 - cc_growpart.py[DEBUG]: '/' SKIPPED: device_part_info(/dev/mapper/atomicos-root) failed: /dev/mapper/atomicos-root not a partition
2019-04-01 13:21:36,609 - main.py[DEBUG]: Ran 14 modules with 0 failures
2019-04-01 13:21:37,847 - main.py[DEBUG]: Ran 7 modules with 0 failures
2019-04-01 13:24:19,548 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3.5/site-packages/cloudinit/config/cc_scripts_user.py'>) failed
2019-04-01 13:24:19,548 - util.py[DEBUG]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3.5/site-packages/cloudinit/config/cc_scripts_user.py'>) failed
    return self._runners.run(name, functor, args, freq, clear_on_fail)
    % (len(failed), len(attempted)))
RuntimeError: Runparts: 2 failures in 11 attempted commands
2019-04-01 13:24:19,614 - main.py[DEBUG]: Ran 16 modules with 1 failures

I cannot see any error in the metadata service logs and I can reach the server from my VM.

I used the OpenStack (Pike) guide to deploy it manually without using any other system. In my setup, I have Nova, Neutron (self-service), Glance, Horizon, Keystone, Heat and Magnum running.

Kind Regards, Navdeep
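Given the "Network is unreachable" errors towards 169.254.169.254, a quick sketch for checking metadata reachability from inside the node, plus the agent on the controller/network node:

$ curl http://169.254.169.254/openstack/latest/meta_data.json   # run inside the VM
$ ip route                                                      # is there a route that covers 169.254.169.254?
# on the controller/network node, confirm the neutron-metadata-agent (and, for self-service
# networks, the l3/dhcp agents that proxy metadata) are running and configured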
Hi Bharat, Adding to my previous email, I did some changes and cloud-init seems to be running fine (it was an issue with the image I was using). However, now I am using the fedora-atomic image with 80GB disk, 4vCPUs and 4GB RAM. I am getting 'no space left errors'. [ 213.443636] cloud-init[1307]: tar: var: Cannot mkdir: No space left on device [ 213.444282] cloud-init[1307]: tar: var/lib/dpkg/info/libssl1.1\:amd64.postinst: Cannot open: No such file or directory [ 213.444696] cloud-init[1307]: tar: var: Cannot mkdir: No space left on device [ 213.445345] cloud-init[1307]: tar: var/lib/dpkg/info/libssl1.1\:amd64.postrm: Cannot open: No such file or directory [ 213.445811] cloud-init[1307]: tar: var: Cannot mkdir: No space left on device [ 213.499407] cloud-init[1307]: tar: var/lib/dpkg/info/libssl1.1\:amd64.shlibs: Cannot open: No such file or directory [ 213.499788] cloud-init[1307]: tar: var: Cannot mkdir: No space left on device [ 213.500409] cloud-init[1307]: tar: var/lib/dpkg/info/libssl1.1\:amd64.symbols: Cannot open: No such file or directory [ 213.500822] cloud-init[1307]: tar: var: Cannot mkdir: No space left on device [ 213.501247] cloud-init[1307]: tar: var/lib/dpkg/info/libssl1.1\:amd64.templates: Cannot open: No such file or directory [ 213.501644] cloud-init[1307]: tar: var: Cannot mkdir: No space left on device [ 213.502302] cloud-init[1307]: tar: var/lib/dpkg/info/libssl1.1\:amd64.triggers: Cannot open: No such file or directory [ 213.502784] cloud-init[1307]: tar: var: Cannot mkdir: No space left on device [ 213.503401] cloud-init[1307]: tar: var/lib/dpkg/info/libtalloc2\:amd64.list: Cannot open: No such file or directory [ 213.503833] cloud-init[1307]: tar: var: Cannot mkdir: No space left on device [ 213.504258] cloud-init[1307]: tar: var/lib/dpkg/info/libtalloc2\:amd64.md5sums: Cannot open: No such file or directory [ 213.504660] cloud-init[1307]: tar: var: Cannot mkdir: No space left on device [ 213.505286] cloud-init[1307]: tar: var/lib/dpkg/info/libtalloc2\:amd64.shlibs: Cannot open: No such file or directory [ 213.505692] cloud-init[1307]: tar: var: Cannot mkdir: No space left on device [ 213.506358] cloud-init[1307]: tar: var/lib/dpkg/info/libtalloc2\:amd64.symbols: Cannot open: No such file or directory [ 213.506831] cloud-init[1307]: tar: var: Cannot mkdir: No space left on device -----Original Message----- From: Navdeep Uniyal Sent: 01 April 2019 15:19 To: 'Bharat Kunwar' <bharat@stackhpc.com> Cc: Mohammed Naser <mnaser@vexxhost.com>; openstack@lists.openstack.org Subject: RE: [Magnum] Cluster Create failure Hi Bharat, Thank you for your response. 
I am getting the following errors in my worker VM (the master VM has similar errors):

[feduser@kubernetes-cluster-wwmvqecjiznb-minion-0 ~]$ less /var/log/cloud-init.log | grep fail
2019-03-29 16:20:37,018 - cc_growpart.py[DEBUG]: '/' SKIPPED: device_part_info(/dev/mapper/atomicos-root) failed: /dev/mapper/atomicos-root not a partition
2019-03-29 16:20:37,219 - main.py[DEBUG]: Ran 14 modules with 0 failures
2019-03-29 16:20:38,450 - main.py[DEBUG]: Ran 7 modules with 0 failures
2019-03-29 16:20:39,501 - main.py[DEBUG]: Ran 16 modules with 0 failures
2019-04-01 13:21:07,978 - util.py[WARNING]: failed stage init-local
2019-04-01 13:21:07,978 - util.py[DEBUG]: failed stage init-local
2019-04-01 13:21:09,250 - url_helper.py[DEBUG]: Calling 'http://169.254.169.254/openstack' failed [0/-1s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /openstack (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67d6c50>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:09,252 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [0/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67eb588>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:10,255 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [1/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67eb2b0>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:11,259 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [2/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67ebe10>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:12,264 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [3/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67f35f8>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:13,268 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [4/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67f3e10>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:14,272 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [5/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67fa668>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:16,278 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [7/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67f3550>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:18,283 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [9/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67ebe10>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:36,442 - cc_growpart.py[DEBUG]: '/' SKIPPED: device_part_info(/dev/mapper/atomicos-root) failed: /dev/mapper/atomicos-root not a partition
2019-04-01 13:21:36,609 - main.py[DEBUG]: Ran 14 modules with 0 failures
2019-04-01 13:21:37,847 - main.py[DEBUG]: Ran 7 modules with 0 failures
2019-04-01 13:24:19,548 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3.5/site-packages/cloudinit/config/cc_scripts_user.py'>) failed
2019-04-01 13:24:19,548 - util.py[DEBUG]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3.5/site-packages/cloudinit/config/cc_scripts_user.py'>) failed
    return self._runners.run(name, functor, args, freq, clear_on_fail)
    % (len(failed), len(attempted)))
RuntimeError: Runparts: 2 failures in 11 attempted commands
2019-04-01 13:24:19,614 - main.py[DEBUG]: Ran 16 modules with 1 failures

I cannot see any errors in the metadata service logs, and I can reach the server from my VM. I deployed OpenStack (Pike) manually following the install guide, without using any other deployment system. In my setup I have Nova, Neutron (self-service), Glance, Horizon, Keystone, Heat and Magnum running.

Kind Regards,
Navdeep

-----Original Message-----
From: Bharat Kunwar <bharat@stackhpc.com>
Sent: 01 April 2019 14:59
To: Navdeep Uniyal <navdeep.uniyal@bristol.ac.uk>
Cc: Mohammed Naser <mnaser@vexxhost.com>; openstack@lists.openstack.org
Subject: Re: [Magnum] Cluster Create failure

Hi Navdeep,

Have you tried logging into the master/worker node and grepping for `fail` inside /var/log/cloud-init.log and /var/log/cloud-init-output.log? Also, how did you deploy your OpenStack services?

Bharat
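(For reference, the '[Errno 101] Network is unreachable' warnings against 169.254.169.254 above mean the instance had no route to the metadata service when cloud-init's early init-local stage ran. A minimal sketch of checks, assuming a self-service network served by the Neutron DHCP agent; the option names are the standard ones from dhcp_agent.ini:

    # from inside the VM, once it is reachable:
    ip route                              # expect a host route to 169.254.169.254
    curl -s http://169.254.169.254/openstack/latest/meta_data.json
    # on the controller/network node:
    grep -E 'enable_isolated_metadata|force_metadata' /etc/neutron/dhcp_agent.ini
    openstack network agent list          # confirm DHCP and metadata agents are alive
)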
On 1 Apr 2019, at 14:54, Navdeep Uniyal <navdeep.uniyal@bristol.ac.uk> wrote:
Dear All,
My Kubernetes Cluster is timing out after 60 mins.
Following is the update I am getting in magnum.log:
{"stack": {"parent": null, "disable_rollback": true, "description": "This template will boot a Kubernetes cluster with one or more minions (as specified by the number_of_minions parameter, which defaults to 1).\n", "parameters": {"magnum_url": "http://10.68.48.4:9511/v1", "kube_tag": "v1.11.6", "http_proxy": "", "cgroup_driver": "cgroupfs", "registry_container": "container", "kubernetes_port": "6443", "calico_kube_controllers_tag": "v1.0.3", "octavia_enabled": "False", "etcd_volume_size": "0", "kube_dashboard_enabled": "True", "master_flavor": "medium", "etcd_tag": "v3.2.7", "kube_version": "v1.11.6", "k8s_keystone_auth_tag": "1.13.0", "kube_service_account_private_key": "******", "keystone_auth_enabled": "True", "cloud_provider_tag": "v0.2.0", "ca_key": "******", "tiller_enabled": "False", "registry_enabled": "False", "verify_ca": "True", "password": "******", "dns_service_ip": "10.254.0.10", "ssh_key_name": "magnum_key", "flannel_tag": "v0.10.0-amd64", "flannel_network_subnetlen": "24", "dns_nameserver": "8.8.8.8", "number_of_masters": "1", "wait_condition_timeout": "6000", "portal_network_cidr": "10.254.0.0/16", "admission_control_list": "NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota", "pods_network_cidr": "10.100.0.0/16", "ingress_controller": "", "external_network": "751ae6e5-71af-4f78-b846-b0e1843093c8", "docker_volume_type": "", "registry_port": "5000", "tls_disabled": "False", "trust_id": "******", "swift_region": "", "influx_grafana_dashboard_enabled": "False", "volume_driver": "", "kubescheduler_options": "", "calico_tag": "v2.6.7", "loadbalancing_protocol": "TCP", "cloud_provider_enabled": "True", "OS::stack_id": "06c05715-ac05-4287-905c-38f1964f09fe", "flannel_cni_tag": "v0.3.0", "prometheus_monitoring": "False", "kubelet_options": "", "fixed_network": "", "kube_dashboard_version": "v1.8.3", "trustee_username": "d7ff417e-85b6-4b9a-94c3-211e7b830a51_4c6bc4445c764249921a0a6e40b192dd", "availability_zone": "", "server_image": "fedora-feduser-atomic", "flannel_network_cidr": "10.100.0.0/16", "cert_manager_api": "False", "minion_flavor": "medium", "kubeproxy_options": "", "calico_cni_tag": "v1.11.2", "cluster_uuid": "d7ff417e-85b6-4b9a-94c3-211e7b830a51", "grafana_admin_passwd": "******", "flannel_backend": "udp", "trustee_domain_id": "ac26210ad4f74217b3abf28a9b5cf56d", "fixed_subnet": "", "https_proxy": "", "username": "admin", "insecure_registry_url": "", "docker_volume_size": "0", "grafana_tag": "5.1.5", "kube_allow_priv": "true", "node_problem_detector_tag": "v0.6.2", "docker_storage_driver": "overlay2", "project_id": "4c6bc4445c764249921a0a6e40b192dd", "registry_chunksize": "5242880", "trustee_user_id": "d1983ea926c34536aabc8d50a85503e8", "container_infra_prefix": "", "number_of_minions": "1", "tiller_tag": "v2.12.3", "auth_url": "http://pluto:5000/v3", "registry_insecure": "True", "tiller_namespace": "magnum-tiller", "prometheus_tag": "v1.8.2", "OS::project_id": "4c6bc4445c764249921a0a6e40b192dd", "kubecontroller_options": "", "fixed_network_cidr": "10.0.0.0/24", "kube_service_account_key": "******", "ingress_controller_role": "ingress", "region_name": "RegionOne", "kubeapi_options": "", "openstack_ca": "******", "trustee_password": "******", "nodes_affinity_policy": "soft-anti-affinity", "minions_to_remove": "", "octavia_ingress_controller_tag": "1.13.2-alpha", "OS::stack_name": "kubernetes-cluster-wwmvqecjiznb", "system_pods_timeout": "5", 
"system_pods_initial_delay": "30", "dns_cluster_domain": "cluster.local", "calico_ipv4pool": "192.168.0.0/16", "network_driver": "flannel", "monitoring_enabled": "False", "heat_container_agent_tag": "stein-dev", "no_proxy": "", "discovery_url": "https://discovery.etcd.io/b8fe011e8b281615904de97ee05511a7"}, "deletion_time": null, "stack_name": "kubernetes-cluster-wwmvqecjiznb", "stack_user_project_id": "8204d11826fb4253ae7c9063306cb4e1", "tags": null, "creation_time": "2019-04-01T13:19:53Z", "links": [{"href": "http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-clus...", "rel": "self"}], "capabilities": [], "notification_topics": [], "timeout_mins": 60, "stack_status": "CREATE_IN_PROGRESS", "stack_owner": null, "updated_time": null, "id": "06c05715-ac05-4287-905c-38f1964f09fe", "stack_status_reason": "Stack CREATE started", "template_description": "This template will boot a Kubernetes cluster with one or more minions (as specified by the number_of_minions parameter, which defaults to 1).\n"}}
I am not sure how to triage this issue, as I cannot see any errors in heat.log either. I can even see both the master and minion nodes running, but the task errors out during OS::Heat::SoftwareDeployment in kube_cluster_deploy and OS::Heat::ResourceGroup in kube_minions.
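(For reference, a timeout in OS::Heat::SoftwareDeployment can usually be narrowed down by walking the nested stacks and then reading the deployment logs on the node itself. A minimal sketch, assuming the python-heatclient OSC plugin and the stack name from the log above:

    openstack stack failures list kubernetes-cluster-wwmvqecjiznb
    openstack stack resource list --nested-depth 2 kubernetes-cluster-wwmvqecjiznb
    # then, on the master/minion node via SSH:
    sudo tail -n 100 /var/log/cloud-init-output.log
    sudo journalctl -u cloud-final --no-pager | tail -n 100
)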
I don't have much experience with Kubernetes clusters either, so please forgive me if I am raising any silly queries.
Kind Regards, Navdeep
On Thu, 28 Mar 2019, Navdeep Uniyal wrote:
Hi Bharat,
Thank you very much, that worked for me. However, I am now getting a resource error while starting the cluster. I have enough resources available on the hypervisor, but it errors out:
faults | {'0': 'ResourceInError: resources[0].resources.kube-master: Went to status ERROR due to "Message: No valid host was found. , Code: 500"', 'kube_masters': 'ResourceInError: resources.kube_masters.resources[0].resources.kube-master: Went to status ERROR due to "Message: No valid host was found. , Code: 500"', 'kube-master': 'ResourceInError: resources.kube-master: Went to status ERROR due to "Message: No valid host was found. , Code: 500"'} |
Nova Scheduler Error:
2019-03-28 11:36:37.686 79522 ERROR nova.scheduler.client.report [req-82c4fb8b-785b-40bf-82fc-ff9d0e6101a0 b4727cb329c14c388d777d0ce38c8a6b 4c6bc4445c764249921a0a6e40b192dd - default default] Failed to retrieve allocation candidates from placement API for filters {'VCPU': 4, 'MEMORY_MB': 4096, 'DISK_GB': 80}. Got 500: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
You'll want to look in the logs for your placement service (perhaps nova-placement-api.log) or the apache2 general error.log to find out what's causing this 500. Until that's resolved you won't be able to land any VMs.

--
Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent
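(Following up on that pointer, a minimal sketch of where to look, assuming placement was deployed under Apache as in the Pike install guide; log paths vary by distro:

    openstack endpoint list --service placement   # confirm the endpoint is registered
    sudo tail -n 100 /var/log/apache2/error.log
    sudo tail -n 100 /var/log/nova/nova-placement-api.log
    nova-status upgrade check                     # flags incomplete placement setup
)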
participants (4):
- Bharat Kunwar
- Chris Dent
- Mohammed Naser
- Navdeep Uniyal