[Magnum] Cluster Create failure
Navdeep Uniyal
navdeep.uniyal at bristol.ac.uk
Mon Apr 1 16:14:35 UTC 2019
Hi Bharat,
Further to my previous email, I made some changes and cloud-init now seems to be running fine (it was an issue with the image I was using).
However, now that I am using the fedora-atomic image with an 80 GB disk, 4 vCPUs and 4 GB RAM, I am getting 'No space left on device' errors (log excerpt below; the disk checks I plan to run follow it):
[ 213.443636] cloud-init[1307]: tar: var: Cannot mkdir: No space left on device
[ 213.444282] cloud-init[1307]: tar: var/lib/dpkg/info/libssl1.1\:amd64.postinst: Cannot open: No such file or directory
[ 213.444696] cloud-init[1307]: tar: var: Cannot mkdir: No space left on device
[ 213.445345] cloud-init[1307]: tar: var/lib/dpkg/info/libssl1.1\:amd64.postrm: Cannot open: No such file or directory
[ 213.445811] cloud-init[1307]: tar: var: Cannot mkdir: No space left on device
[ 213.499407] cloud-init[1307]: tar: var/lib/dpkg/info/libssl1.1\:amd64.shlibs: Cannot open: No such file or directory
[ 213.499788] cloud-init[1307]: tar: var: Cannot mkdir: No space left on device
[ 213.500409] cloud-init[1307]: tar: var/lib/dpkg/info/libssl1.1\:amd64.symbols: Cannot open: No such file or directory
[ 213.500822] cloud-init[1307]: tar: var: Cannot mkdir: No space left on device
[ 213.501247] cloud-init[1307]: tar: var/lib/dpkg/info/libssl1.1\:amd64.templates: Cannot open: No such file or directory
[ 213.501644] cloud-init[1307]: tar: var: Cannot mkdir: No space left on device
[ 213.502302] cloud-init[1307]: tar: var/lib/dpkg/info/libssl1.1\:amd64.triggers: Cannot open: No such file or directory
[ 213.502784] cloud-init[1307]: tar: var: Cannot mkdir: No space left on device
[ 213.503401] cloud-init[1307]: tar: var/lib/dpkg/info/libtalloc2\:amd64.list: Cannot open: No such file or directory
[ 213.503833] cloud-init[1307]: tar: var: Cannot mkdir: No space left on device
[ 213.504258] cloud-init[1307]: tar: var/lib/dpkg/info/libtalloc2\:amd64.md5sums: Cannot open: No such file or directory
[ 213.504660] cloud-init[1307]: tar: var: Cannot mkdir: No space left on device
[ 213.505286] cloud-init[1307]: tar: var/lib/dpkg/info/libtalloc2\:amd64.shlibs: Cannot open: No such file or directory
[ 213.505692] cloud-init[1307]: tar: var: Cannot mkdir: No space left on device
[ 213.506358] cloud-init[1307]: tar: var/lib/dpkg/info/libtalloc2\:amd64.symbols: Cannot open: No such file or directory
[ 213.506831] cloud-init[1307]: tar: var: Cannot mkdir: No space left on device
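For reference, these are the checks I plan to run on the node to see where the space is going (a rough sketch, assuming the standard Fedora Atomic layout with an 'atomicos' LVM volume group; the exact volume names are an assumption on my part):

# Overall usage of the root filesystem (on Atomic this is a small LVM logical volume by default)
df -h /

# Check whether the atomicos volume group still has free extents the root LV could grow into
sudo vgs atomicos
sudo lvs atomicos

# If the VG has free space, the root LV and its filesystem can be extended in one step, e.g.:
# sudo lvextend -r -l +100%FREE /dev/mapper/atomicos-root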
-----Original Message-----
From: Navdeep Uniyal
Sent: 01 April 2019 15:19
To: 'Bharat Kunwar' <bharat at stackhpc.com>
Cc: Mohammed Naser <mnaser at vexxhost.com>; openstack at lists.openstack.org
Subject: RE: [Magnum] Cluster Create failure
Hi Bharat,
Thank you for your response.
I am getting the following errors in my worker VM (the master VM has similar errors):
[feduser at kubernetes-cluster-wwmvqecjiznb-minion-0 ~]$ less /var/log/cloud-init.log | grep fail
2019-03-29 16:20:37,018 - cc_growpart.py[DEBUG]: '/' SKIPPED: device_part_info(/dev/mapper/atomicos-root) failed: /dev/mapper/atomicos-root not a partition
2019-03-29 16:20:37,219 - main.py[DEBUG]: Ran 14 modules with 0 failures
2019-03-29 16:20:38,450 - main.py[DEBUG]: Ran 7 modules with 0 failures
2019-03-29 16:20:39,501 - main.py[DEBUG]: Ran 16 modules with 0 failures
2019-04-01 13:21:07,978 - util.py[WARNING]: failed stage init-local
2019-04-01 13:21:07,978 - util.py[DEBUG]: failed stage init-local
2019-04-01 13:21:09,250 - url_helper.py[DEBUG]: Calling 'http://169.254.169.254/openstack' failed [0/-1s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /openstack (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67d6c50>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:09,252 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [0/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67eb588>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:10,255 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [1/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67eb2b0>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:11,259 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [2/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67ebe10>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:12,264 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [3/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67f35f8>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:13,268 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [4/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67f3e10>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:14,272 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [5/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67fa668>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:16,278 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [7/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67f3550>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:18,283 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [9/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f6ea67ebe10>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
2019-04-01 13:21:36,442 - cc_growpart.py[DEBUG]: '/' SKIPPED: device_part_info(/dev/mapper/atomicos-root) failed: /dev/mapper/atomicos-root not a partition
2019-04-01 13:21:36,609 - main.py[DEBUG]: Ran 14 modules with 0 failures
2019-04-01 13:21:37,847 - main.py[DEBUG]: Ran 7 modules with 0 failures
2019-04-01 13:24:19,548 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3.5/site-packages/cloudinit/config/cc_scripts_user.py'>) failed
2019-04-01 13:24:19,548 - util.py[DEBUG]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3.5/site-packages/cloudinit/config/cc_scripts_user.py'>) failed
return self._runners.run(name, functor, args, freq, clear_on_fail)
% (len(failed), len(attempted)))
RuntimeError: Runparts: 2 failures in 11 attempted commands
2019-04-01 13:24:19,614 - main.py[DEBUG]: Ran 16 modules with 1 failures
I cannot see any errors in the metadata service logs, and I can reach the metadata server from my VM (checks below).
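For completeness, these are the checks I ran (a rough sketch; the agent and log file names are assumptions based on a standard self-service Neutron deployment):

# From inside the VM: the metadata service should answer on the link-local address
curl -s http://169.254.169.254/openstack
ip route    # a route covering 169.254.169.254 should be present

# On the controller/network node: confirm the metadata agent is alive
openstack network agent list
systemctl status neutron-metadata-agent
tail -n 50 /var/log/neutron/metadata-agent.log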
I used the OpenStack (Pike) installation guide to deploy everything manually, without any deployment tooling. In my setup I have Nova, Neutron (self-service), Glance, Horizon, Keystone, Heat and Magnum running.
Kind Regards,
Navdeep
-----Original Message-----
From: Bharat Kunwar <bharat at stackhpc.com>
Sent: 01 April 2019 14:59
To: Navdeep Uniyal <navdeep.uniyal at bristol.ac.uk>
Cc: Mohammed Naser <mnaser at vexxhost.com>; openstack at lists.openstack.org
Subject: Re: [Magnum] Cluster Create failure
Hi Navdeep,
Have you tried logging into the master/worker node and grepping for `fail` inside /var/log/cloud-init.log and /var/log/cloud-init-output.log? Also, how did you deploy your OpenStack services?
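Something along these lines should surface the failing module (just a sketch):

grep -i fail /var/log/cloud-init.log /var/log/cloud-init-output.log
sudo journalctl -u cloud-final --no-pager | grep -i -B1 -A3 fail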
Bharat
> On 1 Apr 2019, at 14:54, Navdeep Uniyal <navdeep.uniyal at bristol.ac.uk> wrote:
>
> Dear All,
>
> My Kubernetes cluster creation is timing out after 60 minutes.
>
> The following is the update I am getting in magnum.log:
>
> {"stack": {"parent": null, "disable_rollback": true, "description": "This template will boot a Kubernetes cluster with one or more minions (as specified by the number_of_minions parameter, which defaults to 1).\n", "parameters": {"magnum_url": "http://10.68.48.4:9511/v1", "kube_tag": "v1.11.6", "http_proxy": "", "cgroup_driver": "cgroupfs", "registry_container": "container", "kubernetes_port": "6443", "calico_kube_controllers_tag": "v1.0.3", "octavia_enabled": "False", "etcd_volume_size": "0", "kube_dashboard_enabled": "True", "master_flavor": "medium", "etcd_tag": "v3.2.7", "kube_version": "v1.11.6", "k8s_keystone_auth_tag": "1.13.0", "kube_service_account_private_key": "******", "keystone_auth_enabled": "True", "cloud_provider_tag": "v0.2.0", "ca_key": "******", "tiller_enabled": "False", "registry_enabled": "False", "verify_ca": "True", "password": "******", "dns_service_ip": "10.254.0.10", "ssh_key_name": "magnum_key", "flannel_tag": "v0.10.0-amd64", "flannel_network_subnetlen": "24", "dns_nameserver": "8.8.8.8", "number_of_masters": "1", "wait_condition_timeout": "6000", "portal_network_cidr": "10.254.0.0/16", "admission_control_list": "NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota", "pods_network_cidr": "10.100.0.0/16", "ingress_controller": "", "external_network": "751ae6e5-71af-4f78-b846-b0e1843093c8", "docker_volume_type": "", "registry_port": "5000", "tls_disabled": "False", "trust_id": "******", "swift_region": "", "influx_grafana_dashboard_enabled": "False", "volume_driver": "", "kubescheduler_options": "", "calico_tag": "v2.6.7", "loadbalancing_protocol": "TCP", "cloud_provider_enabled": "True", "OS::stack_id": "06c05715-ac05-4287-905c-38f1964f09fe", "flannel_cni_tag": "v0.3.0", "prometheus_monitoring": "False", "kubelet_options": "", "fixed_network": "", "kube_dashboard_version": "v1.8.3", "trustee_username": "d7ff417e-85b6-4b9a-94c3-211e7b830a51_4c6bc4445c764249921a0a6e40b192dd", "availability_zone": "", "server_image": "fedora-feduser-atomic", "flannel_network_cidr": "10.100.0.0/16", "cert_manager_api": "False", "minion_flavor": "medium", "kubeproxy_options": "", "calico_cni_tag": "v1.11.2", "cluster_uuid": "d7ff417e-85b6-4b9a-94c3-211e7b830a51", "grafana_admin_passwd": "******", "flannel_backend": "udp", "trustee_domain_id": "ac26210ad4f74217b3abf28a9b5cf56d", "fixed_subnet": "", "https_proxy": "", "username": "admin", "insecure_registry_url": "", "docker_volume_size": "0", "grafana_tag": "5.1.5", "kube_allow_priv": "true", "node_problem_detector_tag": "v0.6.2", "docker_storage_driver": "overlay2", "project_id": "4c6bc4445c764249921a0a6e40b192dd", "registry_chunksize": "5242880", "trustee_user_id": "d1983ea926c34536aabc8d50a85503e8", "container_infra_prefix": "", "number_of_minions": "1", "tiller_tag": "v2.12.3", "auth_url": "http://pluto:5000/v3", "registry_insecure": "True", "tiller_namespace": "magnum-tiller", "prometheus_tag": "v1.8.2", "OS::project_id": "4c6bc4445c764249921a0a6e40b192dd", "kubecontroller_options": "", "fixed_network_cidr": "10.0.0.0/24", "kube_service_account_key": "******", "ingress_controller_role": "ingress", "region_name": "RegionOne", "kubeapi_options": "", "openstack_ca": "******", "trustee_password": "******", "nodes_affinity_policy": "soft-anti-affinity", "minions_to_remove": "", "octavia_ingress_controller_tag": "1.13.2-alpha", "OS::stack_name": "kubernetes-cluster-wwmvqecjiznb", "system_pods_timeout": "5", 
"system_pods_initial_delay": "30", "dns_cluster_domain": "cluster.local", "calico_ipv4pool": "192.168.0.0/16", "network_driver": "flannel", "monitoring_enabled": "False", "heat_container_agent_tag": "stein-dev", "no_proxy": "", "discovery_url": "https://discovery.etcd.io/b8fe011e8b281615904de97ee05511a7"}, "deletion_time": null, "stack_name": "kubernetes-cluster-wwmvqecjiznb", "stack_user_project_id": "8204d11826fb4253ae7c9063306cb4e1", "tags": null, "creation_time": "2019-04-01T13:19:53Z", "links": [{"href": "http://pluto:8004/v1/4c6bc4445c764249921a0a6e40b192dd/stacks/kubernetes-cluster-wwmvqecjiznb/06c05715-ac05-4287-905c-38f1964f09fe", "rel": "self"}], "capabilities": [], "notification_topics": [], "timeout_mins": 60, "stack_status": "CREATE_IN_PROGRESS", "stack_owner": null, "updated_time": null, "id": "06c05715-ac05-4287-905c-38f1964f09fe", "stack_status_reason": "Stack CREATE started", "template_description": "This template will boot a Kubernetes cluster with one or more minions (as specified by the number_of_minions parameter, which defaults to 1).\n"}}
>
> I am not sure how to triage this issue, as I cannot see any errors in heat.log either.
> I can see both the master and minion nodes running, but the task errors out during OS::Heat::SoftwareDeployment in kube_cluster_deploy and OS::Heat::ResourceGroup in kube_minions.
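> So far I have only been inspecting the stack with roughly the following (a sketch; the stack name is taken from the log above):
>
> openstack coe cluster list
> openstack stack failures list --long kubernetes-cluster-wwmvqecjiznb
> openstack stack resource list --nested-depth 2 kubernetes-cluster-wwmvqecjiznb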
>
> I don't have much experience with Kubernetes clusters either, so please forgive me if I am raising any silly queries.
>
> Kind Regards,
> Navdeep
>