[openstack-ansible][magnum-cluster-api]
I tried magnum-cluster-api on openstack-ansible with the Antelope version. I have finished creating the k8s control plane containers and then ran the magnum-cluster-api deployment, but it fails because the role openstack.osa.lxc_container_setup is not available. Does anyone know how to solve it?

```
root@aio1:/opt/openstack-ansible# openstack-ansible osa_ops.mcapi_vexxhost.k8s_install
Variable files: "-e @/etc/openstack_deploy/user_secrets.yml -e @/etc/openstack_deploy/user_variables.yml -e @/etc/openstack_deploy/user_variables_octavia.yml "
[WARNING]: Unable to parse /etc/openstack_deploy/inventory.ini as an inventory source
[WARNING]: running playbook inside collection osa_ops.mcapi_vexxhost
ERROR! the role 'openstack.osa.lxc_container_setup' was not found in osa_ops.mcapi_vexxhost:ansible.legacy:/etc/ansible/ansible_collections/osa_ops/mcapi_vexxhost/playbooks/roles:/etc/ansible/roles:/etc/ansible/roles/ceph-ansible/roles:/etc/ansible/ansible_collections/osa_ops/mcapi_vexxhost/playbooks

The error appears to be in '/etc/ansible/ansible_collections/osa_ops/mcapi_vexxhost/playbooks/k8s_install.yml': line 36, column 15, but may be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

      - import_role:
          name: openstack.osa.lxc_container_setup
            ^ here

EXIT NOTICE [Playbook execution failure] **************************************
===============================================================================
```
Hey,

This role is available in our collection on current master [1]. In previous versions (like Antelope) this was a shared playbook [2]. Eventually, CAPI driver support was implemented and tested only with 2024.1; it can work with 2023.1, but only with a bunch of adjustments.

Hope this helps.

[1] https://opendev.org/openstack/openstack-ansible-plugins/src/branch/master/ro...
[2] https://opendev.org/openstack/openstack-ansible/src/branch/stable/2023.1/pla...
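A minimal workaround sketch for making the missing role resolvable on an Antelope deploy host, assuming the openstack.osa collection is deployed at /etc/ansible/ansible_collections/openstack/osa and that the role lives under roles/ on the master branch of openstack-ansible-plugins (both are assumptions; backporting the relevant patches, as the follow-up below does, is the cleaner fix):

```
# Assumption: the openstack.osa collection is deployed at this path on the
# deploy host; adjust if your layout differs.
OSA_COLLECTION=/etc/ansible/ansible_collections/openstack/osa

# Fetch the master branch of openstack-ansible-plugins, which is assumed to
# carry the role, and copy only lxc_container_setup into the deployed collection.
git clone https://opendev.org/openstack/openstack-ansible-plugins /tmp/osa-plugins
cp -r /tmp/osa-plugins/roles/lxc_container_setup "${OSA_COLLECTION}/roles/"
```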
I already backported some patches and ran the magnum-cluster-api installation, but now I am stuck at this step:

```
TASK [vexxhost.containers.containerd : Create containerd config file] **************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ansible.errors.AnsibleFilterError: Failed to import docker-image-py module, ensure it is installed on the controller
fatal: [aio1_k8s_container-77e91d86]: FAILED! => {"changed": false, "msg": "AnsibleFilterError: Failed to import docker-image-py module, ensure it is installed on the controller"}

RUNNING HANDLER [vexxhost.containers.containerd : Restart containerd] **************************************************************************************
fatal: [aio1_k8s_container-77e91d86]: FAILED! => {"changed": false, "msg": "Unable to start service containerd: Job for containerd.service failed because the control process exited with error code.\nSee \"systemctl status containerd.service\" and \"journalctl -xeu containerd.service\" for details.\n"}
```

I tried installing the docker-image-py module manually in the k8s_container, but that did not solve it; the error stays the same.
I'm not 100% sure, but I think it might be needed on the Ansible deploy host. I guess I would try installing it via /opt/ansible-runtime/bin/pip3 install docker-image-py.

In master we have implemented the ability to define extra requirements to be installed on the controller by creating the file /etc/openstack_deploy/user-ansible-venv-requirements.txt [1].

[1] https://opendev.org/openstack/openstack-ansible/commit/5b57f10eeccca16771de7...
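A quick sketch of both suggestions on the deploy host (the requirements-file mechanism only exists on master, per the note above):

```
# Install the missing Python module into the Ansible runtime venv on the
# deploy host (not inside the k8s container).
/opt/ansible-runtime/bin/pip3 install docker-image-py

# On master, the same can be expressed declaratively so bootstrap-ansible
# installs it for you.
echo 'docker-image-py' >> /etc/openstack_deploy/user-ansible-venv-requirements.txt
```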
Yes, it works; the installation continues, but then I hit a new error:

```
TASK [vexxhost.kubernetes.kubernetes : Initialize cluster] ***************************************************************************************************************
fatal: [aio1_k8s_container-77e91d86]: FAILED! => {"changed": true, "cmd": "kubeadm init --config /etc/kubernetes/kubeadm.yaml --upload-certs --ignore-preflight-errors=DirAvailable--etc-kubernetes-manifests\n", "delta": "0:02:16.422277", "end": "2024-04-19 17:37:37.723866", "msg": "non-zero return code", "rc": 1, "start": "2024-04-19 17:35:21.301589"}

stderr:
W0419 17:35:21.339245 1813 initconfiguration.go:336] [config] WARNING: Ignored YAML document with GroupVersionKind kubeadm.k8s.io/v1beta3, Kind=JoinConfiguration
W0419 17:35:30.657363 1813 checks.go:835] detected that the sandbox image "registry.k8s.io/pause:3.8" of the container runtime is inconsistent with that used by kubeadm. It is recommended that using "registry.k8s.io/pause:3.9" as the CRI sandbox image.
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

stdout:
[init] Using Kubernetes version: v1.28.4
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [aio1-k8s-container-77e91d86 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.29.236.157 172.29.236.101]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [aio1-k8s-container-77e91d86 localhost] and IPs [172.29.236.157 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [aio1-k8s-container-77e91d86 localhost] and IPs [172.29.236.157 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
	- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
	Once you have found the failing container, you can inspect its logs with:
	- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
```

Then I manually changed cgroupDriver from systemd to cgroupfs; after that the installation continued and got stuck again with this error:

```
TASK [vexxhost.kubernetes.kubernetes : Allow workload on control plane node] *********************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='172.29.236.101', port=6443): Max retries exceeded with url: /version (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1007)')))
```

If I check systemctl status kubelet, these are the logs:

```
kubelet.go:1511] "Failed to start ContainerManager" err="[open /proc/sys/kernel/panic: read-only file system, open /proc/sys/kernel/panic_on_oops: read-only file system, open /proc/sys/vm/overcommit_memory: read-only file system]"
```

Is this because the LXC config is still not rw, so this happens?
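For context, a minimal sketch of the manual cgroupDriver change described above, assuming the kubelet configuration kubeadm wrote is /var/lib/kubelet/config.yaml as shown in the log (the exact file, and whether switching drivers is the right fix, are assumptions; setting cgroupDriver in the KubeletConfiguration section of /etc/kubernetes/kubeadm.yaml before kubeadm init would be the equivalent up-front change):

```
# Assumption: kubelet's generated config lives here and currently declares the
# systemd driver. Switch it to cgroupfs inside the k8s container, then restart.
sed -i 's/^cgroupDriver: systemd$/cgroupDriver: cgroupfs/' /var/lib/kubelet/config.yaml
systemctl restart kubelet
```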
Frankly speaking, I'm really not aware of the Vexxhost collection at all. But your point about the LXC rw issue reminded me of the extra config we use in CI to make LXC bind mount some required paths rw, here: https://opendev.org/openstack/openstack-ansible-ops/src/branch/master/mcapi_...

But I really have no idea if that is related to the issue or not.
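As an illustration only (the truncated link above points at the actual CI config, which may differ), a sketch of how the paths kubelet complains about could be made writable for an LXC container, assuming plain LXC configuration on the host and using the container name from the logs above:

```
# Assumption: the k8s container's LXC config lives at the default path on the
# host, and these are standard LXC options rather than the referenced CI
# settings. Remount /proc, /sys and cgroups read-write so kubelet can set the
# sysctls it needs (kernel.panic, kernel.panic_on_oops, vm.overcommit_memory).
cat >> /var/lib/lxc/aio1_k8s_container-77e91d86/config <<'EOF'
lxc.apparmor.profile = unconfined
lxc.mount.auto = proc:rw sys:rw cgroup:rw
EOF

# Restart the container so the new mount options take effect.
lxc-stop -n aio1_k8s_container-77e91d86
lxc-start -n aio1_k8s_container-77e91d86 -d
```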
participants (2):
- Dmitriy Rabotyagov
- pahrialtkj@gmail.com