Hi,

We’re facing an issue when deploying OpenStack 2024.1 with OpenStack Helm and OVS: the health probes fail for neutron-l3-agent, neutron-dhcp-agent and nova-compute, while the internal RPC communication between the Nova and Neutron components looks fine.

Neutron:

openstack network agent list
+--------------------------------------+--------------------+--------------------------------+-------------------+-------+-------+---------------------------+
| ID                                   | Agent Type         | Host                           | Availability Zone | Alive | State | Binary                    |
+--------------------------------------+--------------------+--------------------------------+-------------------+-------+-------+---------------------------+
| 433c2b07-5c82-4545-991d-6d0de6044ae9 | DHCP agent         | fig-virt-intdev-alberto-node-0 | nova              | :-)   | UP    | neutron-dhcp-agent        |
| 56c050fa-e58e-46a6-8191-0d0d481de246 | Open vSwitch agent | fig-virt-intdev-alberto-node-0 | None              | :-)   | UP    | neutron-openvswitch-agent |
| 5ea28388-3fac-4ebb-bce4-6b210f8a3766 | Metadata agent     | fig-virt-intdev-alberto-node-0 | None              | :-)   | UP    | neutron-metadata-agent    |
| 6a4506ca-6fe4-489a-b370-6e426db40d50 | DHCP agent         | fig-virt-intdev-alberto-node-1 | nova              | :-)   | UP    | neutron-dhcp-agent        |
| 772a1d3a-b432-4ac3-8dfe-d3affc6f226d | Metadata agent     | fig-virt-intdev-alberto-node-1 | None              | :-)   | UP    | neutron-metadata-agent    |
| 7bd8360c-468a-4473-a470-abfb1ab435da | Open vSwitch agent | fig-virt-intdev-alberto-node-1 | None              | :-)   | UP    | neutron-openvswitch-agent |
| 9d362a63-5740-44ed-ac77-4548eee59205 | L3 agent           | fig-virt-intdev-alberto-node-0 | nova              | :-)   | UP    | neutron-l3-agent          |
| a2cc9dc4-1600-4c5d-a250-2e43cf062cf3 | L3 agent           | fig-virt-intdev-alberto-node-1 | nova              | :-)   | UP    | neutron-l3-agent          |
+--------------------------------------+--------------------+--------------------------------+-------------------+-------+-------+---------------------------+

However, the neutron-l3 and neutron-dhcp pods are not ready:

k -n openstack get pods -l application=neutron | grep 'l3\|dhcp'
neutron-dhcp-agent-default-fw5rm   0/1     Running   0          127m
neutron-dhcp-agent-default-vh7c4   0/1     Running   0          127m
neutron-l3-agent-default-4h5tp     0/1     Running   0          24m
neutron-l3-agent-default-bdngv     0/1     Running   0          24m

They are not ready because the readiness probes fail:

Events:
  Type     Reason     Age                     From     Message
  ----     ------     ----                    ----     -------
  Warning  Unhealthy  4m38s (x119 over 127m)  kubelet  Readiness probe failed: Health probe timed out. Agent is down or response timed out

When I run the health probe manually inside the container, it fails with a timeout after 60 seconds (RPC_PROBE_TIMEOUT=60):

neutron@fig-virt-intdev-alberto-node-1:/$ python /tmp/health-probe.py --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/dhcp_agent.ini --agent-queue-name dhcp_agent --use-fqdn
Health probe timed out. Agent is down or response timed out

The situation for Nova is quite similar. The compute services are reported as up:

openstack compute service list
+--------------------------------------+----------------+---------------------------------+----------+---------+-------+----------------------------+
| ID                                   | Binary         | Host                            | Zone     | Status  | State | Updated At                 |
+--------------------------------------+----------------+---------------------------------+----------+---------+-------+----------------------------+
| eaf4684c-88a4-47dd-b340-b7dbebf148ba | nova-conductor | nova-conductor-679d57f977-zbq92 | internal | enabled | up    | 2025-03-27T13:11:33.000000 |
| 9164d63f-608c-44cc-8a92-2f2bac07c622 | nova-scheduler | nova-scheduler-845f87b9b5-vq9p5 | internal | enabled | up    | 2025-03-27T13:11:36.000000 |
| 464cd394-8b26-49ad-8c82-f014ca00ec7e | nova-compute   | fig-virt-intdev-alberto-node-1  | nova     | enabled | up    | 2025-03-27T13:11:33.000000 |
| aa7e5f6d-2533-4953-bb8d-9f9bfb7fc6bd | nova-compute   | fig-virt-intdev-alberto-node-0  | nova     | enabled | up    | 2025-03-27T13:11:33.000000 |
+--------------------------------------+----------------+---------------------------------+----------+---------+-------+----------------------------+

And the nova-compute pods are not ready because the health probes fail:

k -n openstack get pods -l component=compute
NAME                         READY   STATUS    RESTARTS      AGE
nova-compute-default-t8w7x   1/2     Running   1 (70m ago)   3h31m
nova-compute-default-vz7nf   1/2     Running   1 (70m ago)   3h31m

To make sure that OpenStack RPC is working, we removed the health probes from nova-compute; after that, nova-cell-setup ran and the hypervisors became ready. We have successfully launched several VMs in this environment (nova-compute): they get their IP addresses from the DHCP agent, floating IPs can be attached (l3-agent), and OpenStack behaves normally.

So I’m confused about why the health probes fail. Any idea about what could be wrong, or what to check, is welcome.

Thanks!
Alberto

Note: This is a Kubernetes cluster running on two VMs (on OpenStack). This virtual environment has been redeployed several times with some modifications, and the issue is fully reproducible: the probes for nova-compute, neutron-l3-agent and neutron-dhcp-agent always fail.
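For reference, below is a simplified model of what the RPC health probe is assumed to do. It is not a copy of /tmp/health-probe.py, and every detail in it is an assumption. The probe is assumed to send an RPC call for a method the agent does not implement to a host-specific queue named "<agent-queue-name>.<server>" (the oslo.messaging topic/server naming, also assumed here), where <server> is the FQDN when --use-fqdn is passed and the short hostname otherwise. A reply such as "Endpoint does not support RPC method" would mean the agent consumed the message and is healthy. No reply before RPC_PROBE_TIMEOUT produces the "Health probe timed out" failure. The exception classes and helper names (MessagingTimeout, RemoteError, probe_server, probe_queue, interpret) are hypothetical local stand-ins, so the sketch runs with the standard library only.

```python
import socket
from typing import Optional


# Hypothetical stand-ins for the oslo.messaging exceptions the real probe is
# assumed to catch; defined locally so this sketch runs with the stdlib only.
class MessagingTimeout(Exception):
    """No reply arrived before the RPC timeout (e.g. RPC_PROBE_TIMEOUT=60)."""


class RemoteError(Exception):
    """The remote endpoint received the call and answered with an error."""


# Assumed markers of a reply from a live agent that simply rejected the method.
ALIVE_MARKERS = (
    "Endpoint does not support RPC method",
    "Endpoint does not support RPC version",
)


def probe_server(use_fqdn: bool) -> str:
    """Server name the probe is assumed to target (mirrors --use-fqdn)."""
    return socket.getfqdn() if use_fqdn else socket.gethostname()


def probe_queue(agent_queue_name: str, server: str) -> str:
    """Host-specific queue, assuming the '<topic>.<server>' naming convention."""
    return f"{agent_queue_name}.{server}"


def interpret(error: Optional[Exception]) -> int:
    """Map the RPC call outcome to an exit code: 0 = healthy, 1 = not ready."""
    if error is None:
        # The call returned normally, so something consumed and answered it.
        return 0
    if isinstance(error, RemoteError):
        # A reply means the message reached an agent; only the expected
        # "unsupported method/version" replies are treated as healthy here.
        return 0 if any(m in str(error) for m in ALIVE_MARKERS) else 1
    if isinstance(error, MessagingTimeout):
        # Nothing consumed the message in time: the failure reported above.
        return 1
    # Anything else is treated as a failure in this simplified model.
    return 1
```

Under the naming assumption above, one thing to compare is probe_queue("dhcp_agent", probe_server(True)) and probe_queue("dhcp_agent", probe_server(False)), printed from inside the agent pod, against the Host column of openstack network agent list (fig-virt-intdev-alberto-node-0 / -1). If the probe's server name differs from the host the agent registered with, the call would go to a queue nobody consumes and would time out even though the agent itself is alive.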