I’ve noticed that the Nova agents are behaving similarly (some agents are reported as down even though they aren't). Is the restart procedure the same as for Neutron? Maybe nova-api should be stopped last and started first?
Also, rabbitmqctl cluster_status shows the cluster is OK.
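To be explicit about the order I have in mind, a rough sketch assuming the
usual kolla-ansible container names (nova_api, nova_conductor and
nova_scheduler on the controllers, nova_compute on the hypervisors):

  # stop the agents first, nova-api last
  docker stop nova_compute                              # on each compute node
  docker stop nova_scheduler nova_conductor nova_api    # on each controller

  # start in reverse: nova-api first, then the agents
  docker start nova_api nova_conductor nova_scheduler   # on each controller
  docker start nova_compute                             # on each compute node
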
On Wed, Mar 19, 2025 at 18:39, Eugen Block <
eblock@nde.ag> wrote:
I would probably start with restarting the neutron agents. I’ve come
across comparable situations a couple of times, or at least it seems
that way. Usually I stop all agents, neutron-server last, then start
neutron-server first, followed by the other agents.
But if you had a wider outage, restarting rabbitmq might be necessary
as well. Check rabbitmqctl cluster_status and its logs to determine if
rabbitmq is okay.
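To make that order concrete, a minimal sketch assuming kolla-ansible
container names (neutron_server on the controllers, the agents wherever
they run):

  # stop the agents first, neutron-server last
  docker stop neutron_openvswitch_agent neutron_l3_agent \
              neutron_dhcp_agent neutron_metadata_agent
  docker stop neutron_server

  # start neutron-server first, then the agents
  docker start neutron_server
  docker start neutron_openvswitch_agent neutron_l3_agent \
               neutron_dhcp_agent neutron_metadata_agent

  # sanity-check rabbitmq itself
  docker exec rabbitmq rabbitmqctl cluster_status
  docker logs --tail 100 rabbitmq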
Quoting Winicius Allan <winiciusab12@gmail.com>:
> Digging into the logs, I found these entries:
>
> 2025-03-19 20:43:37.834 26 WARNING neutron.plugins.ml2.drivers.mech_agent
> [req-f4e98255-be23-4d7a-9d4a-c680b1320bdf
> req-c4a562cf-817f-4d7d-be1c-907e47c4c940 aa8b08700bdd4acc99a8e4a33180f764
> e8866ffd910d4a08bdb347aedd80cdf1 - - default default] Refusing to bind port
> 6961fd23-9edd-4dc7-99ad-88213965c796 to dead agent: {'id':
> 'ab9612a8-0a92-4545-a088-9ea0dd1e527b', 'agent_type': 'Open vSwitch agent',
> 'binary': 'neutron-openvswitch-agent', 'topic': 'N/A', 'host': xx,
> 'admin_state_up': True, 'created_at': datetime.datetime(2024, 6, 12, 17, 4,
> 25), 'started_at': datetime.datetime(2025, 3, 15, 21, 1, 4),
> 'heartbeat_timestamp': datetime.datetime(2025, 3, 19, 20, 41, 53),
> 'description': None, 'resources_synced': None, 'availability_zone': None,
> 'alive': False,
>
> ERROR neutron.plugins.ml2.managers
> [req-f4e98255-be23-4d7a-9d4a-c680b1320bdf
> req-c4a562cf-817f-4d7d-be1c-907e47c4c940 aa8b08700bdd4acc99a8e4a33180f764
> e8866ffd910d4a08bdb347aedd80cdf1 - - default default] Failed to bind port
> 6961fd23-9edd-4dc7-99ad-88213965c796 on host lsd-srv-115 for vnic_type
> normal using segments
>
> and that led me here[1]. I think it is a problem with RabbitMQ, which
> makes Neutron see the OVS agent as not alive even though the container
> reports a "healthy" status.
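>
> For reference, agent liveness is derived from heartbeats that travel
> over RabbitMQ: neutron-server marks an agent dead when no report has
> arrived within agent_down_time. A sketch of the options involved (the
> values shown are, as far as I know, the defaults):
>
>   # neutron.conf on the controllers
>   [DEFAULT]
>   agent_down_time = 75
>
>   # agent-side configuration, e.g. openvswitch_agent.ini
>   [agent]
>   report_interval = 30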
>
> When I run "openstack network agent list" the output is inconsistent:
> one run shows some agents as not alive, and the next run shows a
> different set of agents as alive/not alive. Is a rolling restart of
> RabbitMQ the way to go? Has anyone faced this problem before?
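>
> In case it helps to see what I mean by inconsistent, I am sampling the
> agent list roughly like this (plain shell, nothing deployment-specific):
>
>   # print the agent list every 10 seconds and watch the Alive column flap
>   while true; do
>     date
>     openstack network agent list -c ID -c Host -c Binary -c Alive -c State
>     sleep 10
>   done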
>
> [1]
> https://github.com/openstack/neutron/blob/unmaintained/zed/neutron/plugins/ml2/managers.py#L819
>
> Regards.
>
> On Wed, Mar 19, 2025 at 10:08, Winicius Allan <winiciusab12@gmail.com>
> wrote:
>
>> Hello stackers!
>>
>> release: zed
>> deploy-tool: kolla-ansible
>>
>> After an outage, all the load balancers in my cluster went to
>> provisioning status ERROR because the o-hm0 interface was unavailable on
>> the controller nodes. I recreated the interfaces and triggered a failover
>> on the load balancers. The octavia-worker logs show that the failover
>> completed successfully, but the provisioning status remains ERROR.
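>>
>> For completeness, this is roughly what I ran per load balancer
>> (<lb-id> is a placeholder):
>>
>>   # check the current state
>>   openstack loadbalancer show <lb-id> -c provisioning_status -c operating_status
>>
>>   # trigger a failover so a new amphora gets built
>>   openstack loadbalancer failover <lb-id>
>>
>>   # watch the amphorae that belong to it
>>   openstack loadbalancer amphora list --loadbalancer <lb-id>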
>> Looking into nova-compute logs, I see that an exception was raised
>>
>> os_vif.exception.NetworkInterfaceNotFound: Network interface
>> qvob8ac0f5f-46 not found
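>>
>> To double-check on the compute node whether that interface really is
>> missing (on kolla the ovs-vsctl call goes through the
>> openvswitch_vswitchd container):
>>
>>   ip link show qvob8ac0f5f-46
>>   docker exec openvswitch_vswitchd ovs-vsctl list-ports br-int | grep b8ac0f5f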
>>
>> The instance id for that interface matches the new amphora id. I'll
>> attach the nova-compute logs here[1]. In the neutron-server logs there is
>> no ERROR or WARNING that I can find; the only suspicious entry is
>>
>> Port b8ac0f5f-4613-4b95-9690-7cf59f739fd0 cannot update to ACTIVE because
>> it is not bound. _port_provisioned
>> /var/lib/kolla/venv/lib/python3.10/site-packages/neutron/plugins/ml2/plugin.py:360
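>>
>> In case it is useful, this is how I am inspecting the binding of that
>> port (column names as shown by the openstack CLI):
>>
>>   openstack port show b8ac0f5f-4613-4b95-9690-7cf59f739fd0 \
>>     -c status -c binding_vif_type -c binding_host_id -c device_owner
>>
>> If binding_vif_type comes back as binding_failed, the port never got
>> bound, which I think would explain the _port_provisioned message above.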
>>
>> Can anyone shed some light on this?
>>
>> [1] https://pastebin.com/hV3xgu0e
>>