[Magnum] Cluster Create failure

Bharat Kunwar bharat at stackhpc.com
Mon Apr 1 13:58:48 UTC 2019


Hi Navdeep,

Have you tried logging into the master/worker node and grepping for `fail` inside /var/log/cloud-init.log and /var/log/cloud-init-output.log? Also, how did you deploy your OpenStack services?
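For reference, a rough sketch of that kind of triage (the stack name is taken from your magnum.log output below; the `fedora` login user is the usual default for Fedora Atomic images, and the master's floating IP is a placeholder you will need to fill in):

```shell
# From the controller: list failed resources of the cluster's Heat stack,
# recursing into nested stacks to find where the deployment actually stops.
openstack stack failures list kubernetes-cluster-wwmvqecjiznb
openstack stack resource list -n 5 kubernetes-cluster-wwmvqecjiznb

# On the master node itself: search the cloud-init logs for failures.
ssh fedora@<master-floating-ip> \
    'grep -i fail /var/log/cloud-init.log /var/log/cloud-init-output.log'
```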

Bharat

> On 1 Apr 2019, at 14:54, Navdeep Uniyal <navdeep.uniyal at bristol.ac.uk> wrote:
> 
> Dear All,
> 
> My Kubernetes Cluster is timing out after 60 mins.
> 
> Following is the update I am getting in magnum.log:
> 
> {"stack": {"parent": null, "disable_rollback": true, "description": "This template will boot a Kubernetes cluster with one or more minions (as specified by the number_of_minions parameter, which defaults to 1).\n", "parameters": {"magnum_url": "http://10.68.48.4:9511/v1", "kube_tag": "v1.11.6", "http_proxy": "", "cgroup_driver": "cgroupfs", "registry_container": "container", "kubernetes_port": "6443", "calico_kube_controllers_tag": "v1.0.3", "octavia_enabled": "False", "etcd_volume_size": "0", "kube_dashboard_enabled": "True", "master_flavor": "medium", "etcd_tag": "v3.2.7", "kube_version": "v1.11.6", "k8s_keystone_auth_tag": "1.13.0", "kube_service_account_private_key": "******", "keystone_auth_enabled": "True", "cloud_provider_tag": "v0.2.0", "ca_key": "******", "tiller_enabled": "False", "registry_enabled": "False", "verify_ca": "True", "password": "******", "dns_service_ip": "10.254.0.10", "ssh_key_name": "magnum_key", "flannel_tag": "v0.10.0-amd64", "flannel_network_subnetlen": "24", "dns_nameserver": "8.8.8.8", "number_of_masters": "1", "wait_condition_timeout": "6000", "portal_network_cidr": "10.254.0.0/16", "admission_control_list": "NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota", "pods_network_cidr": "10.100.0.0/16", "ingress_controller": "", "external_network": "751ae6e5-71af-4f78-b846-b0e1843093c8", "docker_volume_type": "", "registry_port": "5000", "tls_disabled": "False", "trust_id": "******", "swift_region": "", "influx_grafana_dashboard_enabled": "False", "volume_driver": "", "kubescheduler_options": "", "calico_tag": "v2.6.7", "loadbalancing_protocol": "TCP", "cloud_provider_enabled": "True", "OS::stack_id": "06c05715-ac05-4287-905c-38f1964f09fe", "flannel_cni_tag": "v0.3.0", "prometheus_monitoring": "False", "kubelet_options": "", "fixed_network": "", "kube_dashboard_version": "v1.8.3", "trustee_username": 
"d7ff417e-85b6-4b9a-94c3-211e7b830a51_4c6bc4445c764249921a0a6e40b192dd", "availability_zone": "", "server_image": "fedora-feduser-atomic", "flannel_network_cidr": "10.100.0.0/16", "cert_manager_api": "False", "minion_flavor": "medium", "kubeproxy_options": "", "calico_cni_tag": "v1.11.2", "cluster_uuid": "d7ff417e-85b6-4b9a-94c3-211e7b830a51", "grafana_admin_passwd": "******", "flannel_backend": "udp", "trustee_domain_id": "ac26210ad4f74217b3abf28a9b5cf56d", "fixed_subnet": "", "https_proxy": "", "username": "admin", "insecure_registry_url": "", "docker_volume_size": "0", "grafana_tag": "5.1.5", "kube_allow_priv": "true", "node_problem_detector_tag": "v0.6.2", "docker_storage_driver": "overlay2", "project_id": "4c6bc4445c764249921a0a6e40b192dd", "registry_chunksize": "5242880", "trustee_user_id": "d1983ea926c34536aabc8d50a85503e8", "container_infra_prefix": "", "number_of_minions": "1", "tiller_tag": "v2.12.3", "auth_url": "http://pluto:5000/v3", "registry_insecure": "True", "tiller_namespace": "magnum-tiller", "prometheus_tag": "v1.8.2", "OS::project_id": "4c6bc4445c764249921a0a6e40b192dd", "kubecontroller_options": "", "fixed_network_cidr": "10.0.0.0/24", "kube_service_account_key": "******", "ingress_controller_role": "ingress", "region_name": "RegionOne", "kubeapi_options": "", "openstack_ca": "******", "trustee_password": "******", "nodes_affinity_policy": "soft-anti-affinity", "minions_to_remove": "", "octavia_ingress_controller_tag": "1.13.2-alpha", "OS::stack_name": "kubernetes-cluster-wwmvqecjiznb", "system_pods_timeout": "5", "system_pods_initial_delay": "30", "dns_cluster_domain": "cluster.local", "calico_ipv4pool": "192.168.0.0/16", "network_driver": "flannel", "monitoring_enabled": "False", "heat_container_agent_tag": "stein-dev", "no_proxy": "", "discovery_url": "https://discovery.etcd.io/b8fe011e8b281615904de97ee05511a7"}, "deletion_time": null, "stack_name": "kubernetes-cluster-wwmvqecjiznb", "stack_user_project_id": 
"8204d11826fb4253ae7c9063306cb4e1", "tags": null, "creation_time": "2019-04-01T13:19:53Z", "links": [{"href": "http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-cluster-wwmvqecjiznb/06c05715-ac05-4287-905c-38f1964f09fe", "rel": "self"}], "capabilities": [], "notification_topics": [], "timeout_mins": 60, "stack_status": "CREATE_IN_PROGRESS", "stack_owner": null, "updated_time": null, "id": "06c05715-ac05-4287-905c-38f1964f09fe", "stack_status_reason": "Stack CREATE started", "template_description": "This template will boot a Kubernetes cluster with one or more minions (as specified by the number_of_minions parameter, which defaults to 1).\n"}}
> 
> I am not sure how to triage this issue, as I cannot see any errors in heat.log either.
> I can see both the master and minion nodes running, but the stack errors out during OS::Heat::SoftwareDeployment in kube_cluster_deploy and OS::Heat::ResourceGroup in kube_minions.
> 
> I don't have much experience with Kubernetes clusters either, so please forgive me if I am raising any silly queries.
> 
> Kind Regards,
> Navdeep
> 




More information about the openstack-discuss mailing list