[Magnum] Cluster Create failure

Navdeep Uniyal navdeep.uniyal at bristol.ac.uk
Mon Apr 1 13:54:02 UTC 2019


Dear All,

My Kubernetes cluster create is timing out after 60 minutes.

Following is the update I am getting in magnum.log:

{"stack": {"parent": null, "disable_rollback": true, "description": "This template will boot a Kubernetes cluster with one or more minions (as specified by the number_of_minions parameter, which defaults to 1).\n", "parameters": {"magnum_url": "http://10.68.48.4:9511/v1", "kube_tag": "v1.11.6", "http_proxy": "", "cgroup_driver": "cgroupfs", "registry_container": "container", "kubernetes_port": "6443", "calico_kube_controllers_tag": "v1.0.3", "octavia_enabled": "False", "etcd_volume_size": "0", "kube_dashboard_enabled": "True", "master_flavor": "medium", "etcd_tag": "v3.2.7", "kube_version": "v1.11.6", "k8s_keystone_auth_tag": "1.13.0", "kube_service_account_private_key": "******", "keystone_auth_enabled": "True", "cloud_provider_tag": "v0.2.0", "ca_key": "******", "tiller_enabled": "False", "registry_enabled": "False", "verify_ca": "True", "password": "******", "dns_service_ip": "10.254.0.10", "ssh_key_name": "magnum_key", "flannel_tag": "v0.10.0-amd64", "flannel_network_subnetlen": "24", "dns_nameserver": "8.8.8.8", "number_of_masters": "1", "wait_condition_timeout": "6000", "portal_network_cidr": "10.254.0.0/16", "admission_control_list": "NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota", "pods_network_cidr": "10.100.0.0/16", "ingress_controller": "", "external_network": "751ae6e5-71af-4f78-b846-b0e1843093c8", "docker_volume_type": "", "registry_port": "5000", "tls_disabled": "False", "trust_id": "******", "swift_region": "", "influx_grafana_dashboard_enabled": "False", "volume_driver": "", "kubescheduler_options": "", "calico_tag": "v2.6.7", "loadbalancing_protocol": "TCP", "cloud_provider_enabled": "True", "OS::stack_id": "06c05715-ac05-4287-905c-38f1964f09fe", "flannel_cni_tag": "v0.3.0", "prometheus_monitoring": "False", "kubelet_options": "", "fixed_network": "", "kube_dashboard_version": "v1.8.3", "trustee_username": 
"d7ff417e-85b6-4b9a-94c3-211e7b830a51_4c6bc4445c764249921a0a6e40b192dd", "availability_zone": "", "server_image": "fedora-feduser-atomic", "flannel_network_cidr": "10.100.0.0/16", "cert_manager_api": "False", "minion_flavor": "medium", "kubeproxy_options": "", "calico_cni_tag": "v1.11.2", "cluster_uuid": "d7ff417e-85b6-4b9a-94c3-211e7b830a51", "grafana_admin_passwd": "******", "flannel_backend": "udp", "trustee_domain_id": "ac26210ad4f74217b3abf28a9b5cf56d", "fixed_subnet": "", "https_proxy": "", "username": "admin", "insecure_registry_url": "", "docker_volume_size": "0", "grafana_tag": "5.1.5", "kube_allow_priv": "true", "node_problem_detector_tag": "v0.6.2", "docker_storage_driver": "overlay2", "project_id": "4c6bc4445c764249921a0a6e40b192dd", "registry_chunksize": "5242880", "trustee_user_id": "d1983ea926c34536aabc8d50a85503e8", "container_infra_prefix": "", "number_of_minions": "1", "tiller_tag": "v2.12.3", "auth_url": "http://pluto:5000/v3", "registry_insecure": "True", "tiller_namespace": "magnum-tiller", "prometheus_tag": "v1.8.2", "OS::project_id": "4c6bc4445c764249921a0a6e40b192dd", "kubecontroller_options": "", "fixed_network_cidr": "10.0.0.0/24", "kube_service_account_key": "******", "ingress_controller_role": "ingress", "region_name": "RegionOne", "kubeapi_options": "", "openstack_ca": "******", "trustee_password": "******", "nodes_affinity_policy": "soft-anti-affinity", "minions_to_remove": "", "octavia_ingress_controller_tag": "1.13.2-alpha", "OS::stack_name": "kubernetes-cluster-wwmvqecjiznb", "system_pods_timeout": "5", "system_pods_initial_delay": "30", "dns_cluster_domain": "cluster.local", "calico_ipv4pool": "192.168.0.0/16", "network_driver": "flannel", "monitoring_enabled": "False", "heat_container_agent_tag": "stein-dev", "no_proxy": "", "discovery_url": "https://discovery.etcd.io/b8fe011e8b281615904de97ee05511a7"}, "deletion_time": null, "stack_name": "kubernetes-cluster-wwmvqecjiznb", "stack_user_project_id": 
"8204d11826fb4253ae7c9063306cb4e1", "tags": null, "creation_time": "2019-04-01T13:19:53Z", "links": [{"href": "http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-cluster-wwmvqecjiznb/06c05715-ac05-4287-905c-38f1964f09fe", "rel": "self"}], "capabilities": [], "notification_topics": [], "timeout_mins": 60, "stack_status": "CREATE_IN_PROGRESS", "stack_owner": null, "updated_time": null, "id": "06c05715-ac05-4287-905c-38f1964f09fe", "stack_status_reason": "Stack CREATE started", "template_description": "This template will boot a Kubernetes cluster with one or more minions (as specified by the number_of_minions parameter, which defaults to 1).\n"}}
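If the nodes simply need more time to bootstrap, the stack timeout visible above ("timeout_mins": 60) comes from the cluster create call and can be raised. A sketch, assuming the standard Magnum CLI; the template name is a placeholder:

```shell
# Raise the Heat stack timeout from the default 60 minutes to 120.
openstack coe cluster create \
    --cluster-template k8s-atomic-template \
    --master-count 1 --node-count 1 \
    --timeout 120 \
    kubernetes-cluster
```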

I am not sure how to triage this issue, as I cannot see any errors in heat.log either.
I can even see both the master and minion nodes running, but the task errors out during OS::Heat::SoftwareDeployment in kube_cluster_deploy and OS::Heat::ResourceGroup in kube_minions.
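The usual next step here is to drill into the nested stacks for the failure reason, then look at the node itself to see whether the Heat agent ever ran the deployment scripts. A sketch, assuming the heatclient OSC plugin is installed; the stack name is taken from the log above and the node address is a placeholder:

```shell
# List only the failed resources, recursing into nested stacks.
openstack stack resource list --nested-depth 2 --filter status=FAILED \
    kubernetes-cluster-wwmvqecjiznb

# Show the status reason for one failing resource.
openstack stack resource show kubernetes-cluster-wwmvqecjiznb kube_cluster_deploy

# On the master node itself (fedora user on the Atomic images Magnum uses),
# check whether the heat agent is running and what it logged.
ssh -i ~/.ssh/magnum_key fedora@<master-floating-ip> \
    'sudo journalctl -u heat-container-agent --no-pager | tail -n 50'
```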

I don't have much experience with Kubernetes clusters either, so please forgive me if I am raising any silly queries.

Kind Regards,
Navdeep


-----Original Message-----
From: Navdeep Uniyal <navdeep.uniyal at bristol.ac.uk> 
Sent: 29 March 2019 12:16
To: Mohammed Naser <mnaser at vexxhost.com>; Bharat Kunwar <bharat at stackhpc.com>
Cc: openstack at lists.openstack.org
Subject: RE: [Magnum] Cluster Create failure

Hi Guys,

I was able to resolve the issue in Nova (it was a problem with the oslo.db version: I had somehow installed version 4.44 instead of 4.25 for my Pike installation). Moving forward, I started my Kubernetes cluster and could see two instances running, one for the kube-master and one for the kube-minion. However, the deployment failed after that with the following error:

{"message": "The resource was found at <a href=\"http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-cluster-eovgkanhoa4x/384d8725-bca3-4fa4-a9fd-f18687aab8fb/resources?status=FAILED&nested_depth=2\">http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-cluster-eovgkanhoa4x/384d8725-bca3-4fa4-a9fd-f18687aab8fb/resources?status=FAILED&nested_depth=2</a>;\nyou should be redirected automatically.\n\n", "code": "302 Found", "title": "Found"}  log_http_response /var/lib/magnum/env/local/lib/python2.7/site-packages/heatclient/common/http.py:157
2019-03-29 12:05:51.225 157681 DEBUG heatclient.common.http [req-76e55dec-9511-4aad-aa52-af9978b40eed - - - - -] curl -g -i -X GET -H 'User-Agent: python-heatclient' -H 'Content-Type: application/json' -H 'X-Auth-Url: http://pluto:5000/v3' -H 'Accept: application/json' -H 'X-Auth-Token: {SHA1}f2c32656c7103ad0b89d83ff9f1b6cebc0a6eee7' http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-cluster-eovgkanhoa4x/384d8725-bca3-4fa4-a9fd-f18687aab8fb/resources?status=FAILED&nested_depth=2 log_curl_request /var/lib/magnum/env/local/lib/python2.7/site-packages/heatclient/common/http.py:144
2019-03-29 12:05:51.379 157681 DEBUG heatclient.common.http [req-76e55dec-9511-4aad-aa52-af9978b40eed - - - - -]
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 4035
X-Openstack-Request-Id: req-942ef8fa-1bba-4573-9022-0d4e135772e0
Date: Fri, 29 Mar 2019 12:05:51 GMT
Connection: keep-alive

{"resources": [{"resource_name": "kube_cluster_deploy", "links": [{"href": "http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-cluster-eovgkanhoa4x/384d8725-bca3-4fa4-a9fd-f18687aab8fb/resources/kube_cluster_deploy", "rel": "self"}, {"href": "http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-cluster-eovgkanhoa4x/384d8725-bca3-4fa4-a9fd-f18687aab8fb", "rel": "stack"}], "logical_resource_id": "kube_cluster_deploy", "creation_time": "2019-03-29T10:40:00Z", "resource_status": "CREATE_FAILED", "updated_time": "2019-03-29T10:40:00Z", "required_by": [], "resource_status_reason": "CREATE aborted (Task create from SoftwareDeployment \"kube_cluster_deploy\" Stack \"kubernetes-cluster-eovgkanhoa4x\" [384d8725-bca3-4fa4-a9fd-f18687aab8fb] Timed out)", "physical_resource_id": "8d715a3f-6ec8-4772-ba4b-1056cd4ab7d3", "resource_type": "OS::Heat::SoftwareDeployment"}, {"resource_name": "kube_minions", "links": [{"href": "http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-cluster-eovgkanhoa4x/384d8725-bca3-4fa4-a9fd-f18687aab8fb/resources/kube_minions", "rel": "self"}, {"href": "http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-cluster-eovgkanhoa4x/384d8725-bca3-4fa4-a9fd-f18687aab8fb", "rel": "stack"}, {"href": "http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-cluster-eovgkanhoa4x-kube_minions-otcpiw3oye46/33700819-0766-4d30-954b-29aace6048cc", "rel": "nested"}], "logical_resource_id": "kube_minions", "creation_time": "2019-03-29T10:40:00Z", "resource_status_reason": "CREATE aborted (Task create from ResourceGroup \"kube_minions\" Stack \"kubernetes-cluster-eovgkanhoa4x\" [384d8725-bca3-4fa4-a9fd-f18687aab8fb] Timed out)", "updated_time": "2019-03-29T10:40:00Z", "required_by": [], "resource_status": "CREATE_FAILED", "physical_resource_id": "33700819-0766-4d30-954b-29aace6048cc", "resource_type": "OS::Heat::ResourceGroup"}, {"parent_resource": "kube_minions", "resource_name": 
"0", "links": [{"href": "http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-cluster-eovgkanhoa4x-kube_minions-otcpiw3oye46/33700819-0766-4d30-954b-29aace6048cc/resources/0", "rel": "self"}, {"href": "http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-cluster-eovgkanhoa4x-kube_minions-otcpiw3oye46/33700819-0766-4d30-954b-29aace6048cc", "rel": "stack"}, {"href": "http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-cluster-eovgkanhoa4x-kube_minions-otcpiw3oye46-0-ftjzf76onzqn/d1a8214c-c5b0-488c-83d6-f0a9cacbe844", "rel": "nested"}], "logical_resource_id": "0", "creation_time": "2019-03-29T10:40:59Z", "resource_status_reason": "resources[0]: Stack CREATE cancelled", "updated_time": "2019-03-29T10:40:59Z", "required_by": [], "resource_status": "CREATE_FAILED", "physical_resource_id": "d1a8214c-c5b0-488c-83d6-f0a9cacbe844", "resource_type": "file:///var/lib/magnum/env/lib/python2.7/site-packages/magnum/drivers/k8s_fedora_atomic_v1/templates/kubeminion.yaml"}, {"parent_resource": "0", "resource_name": "minion_wait_condition", "links": [{"href": "http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-cluster-eovgkanhoa4x-kube_minions-otcpiw3oye46-0-ftjzf76onzqn/d1a8214c-c5b0-488c-83d6-f0a9cacbe844/resources/minion_wait_condition", "rel": "self"}, {"href": "http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-cluster-eovgkanhoa4x-kube_minions-otcpiw3oye46-0-ftjzf76onzqn/d1a8214c-c5b0-488c-83d6-f0a9cacbe844", "rel": "stack"}], "logical_resource_id": "minion_wait_condition", "creation_time": "2019-03-29T10:41:01Z", "resource_status": "CREATE_FAILED", "updated_time": "2019-03-29T10:41:01Z", "required_by": [], "resource_status_reason": "CREATE aborted (Task create from HeatWaitCondition \"minion_wait_condition\" Stack \"kubernetes-cluster-eovgkanhoa4x-kube_minions-otcpiw3oye46-0-ftjzf76onzqn\" [d1a8214c-c5b0-488c-83d6-f0a9cacbe844] Timed out)", "physical_resource_id": "", 
"resource_type": "OS::Heat::WaitCondition"}]}
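Responses like the one above are hard to eyeball; a small sketch (not from the original thread) of filtering a Heat resource listing down to the failed resources and their status reasons, using a trimmed-down sample payload in the same shape:

```python
import json

def failed_resources(payload):
    """Return (name, type, reason) for every CREATE_FAILED resource."""
    return [
        (r["resource_name"], r["resource_type"], r["resource_status_reason"])
        for r in json.loads(payload)["resources"]
        if r["resource_status"] == "CREATE_FAILED"
    ]

# Hypothetical sample in the same shape as the heatclient response above.
sample = json.dumps({"resources": [
    {"resource_name": "kube_cluster_deploy",
     "resource_type": "OS::Heat::SoftwareDeployment",
     "resource_status": "CREATE_FAILED",
     "resource_status_reason": "CREATE aborted (Timed out)"},
    {"resource_name": "kube_masters",
     "resource_type": "OS::Heat::ResourceGroup",
     "resource_status": "CREATE_COMPLETE",
     "resource_status_reason": "state changed"},
]})

for name, rtype, reason in failed_resources(sample):
    print(f"{name} ({rtype}): {reason}")
```

Run against the real payload, this would have surfaced the same two timed-out resources (kube_cluster_deploy and kube_minions) without wading through the raw JSON.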

I am not sure how to debug this. Please advise.

Kind Regards,
Navdeep

-----Original Message-----
From: Mohammed Naser <mnaser at vexxhost.com>
Sent: 28 March 2019 13:27
To: Navdeep Uniyal <navdeep.uniyal at bristol.ac.uk>
Cc: Bharat Kunwar <bharat at stackhpc.com>; openstack at lists.openstack.org
Subject: Re: [Magnum] Cluster Create failure

your placement service seems to be broken :)

On Thu, Mar 28, 2019 at 9:10 AM Navdeep Uniyal <navdeep.uniyal at bristol.ac.uk> wrote:
>
> Yes, there seems to be some issue with the server creation now.
> I will check and try resolving that. Thank you
>
> Regards,
> Navdeep
>
> -----Original Message-----
> From: Bharat Kunwar <bharat at stackhpc.com>
> Sent: 28 March 2019 12:40
> To: Navdeep Uniyal <navdeep.uniyal at bristol.ac.uk>
> Cc: openstack at lists.openstack.org
> Subject: Re: [Magnum] Cluster Create failure
>
> Can you create a server normally?
>


--
Mohammed Naser — vexxhost
-----------------------------------------------------
D. 514-316-8872
D. 800-910-1726 ext. 200
E. mnaser at vexxhost.com
W. http://vexxhost.com