Hi Eugen,

Thanks for your email. I'm not aware of a VM that corresponds to the stuck interface. Every VM deletion on the affected hypervisor is delayed, and this has been happening for a few weeks now. Looking at the logs, it appears that an interface is created when a VM is created and deleted along with that VM. My guess is that the interface became stuck when a customer deleted a VM during a maintenance window and the interface deletion failed. Is it possible to delete an interface without editing the database?
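
A rough Python sketch of how the log could be checked for interfaces that have an "added" line, no later "deleted" line, and at least one "could not open network device" warning. The patterns assume log lines of the form "added interface NAME on port N" and "deleted interface NAME on port N"; that wording is an assumption and may need adjusting for the OVS version in use. The function name find_stuck_interfaces is only illustrative.

```python
import re
from collections import Counter

# Assumed line shapes; adjust the patterns to match the actual ovs-vswitchd.log.
ADDED = re.compile(r"added interface (\S+) on port")
DELETED = re.compile(r"deleted interface (\S+) on port")
MISSING = re.compile(r"could not open network device (\S+) \(No such device\)")


def find_stuck_interfaces(lines):
    """Return {interface: warning_count} for interfaces that were added,
    never deleted afterwards, and reported as 'No such device'."""
    added = set()
    deleted = set()
    missing = Counter()
    for line in lines:
        if m := ADDED.search(line):
            added.add(m.group(1))
            # An interface that is re-added after a delete is live again.
            deleted.discard(m.group(1))
        elif m := DELETED.search(line):
            deleted.add(m.group(1))
        elif m := MISSING.search(line):
            missing[m.group(1)] += 1
    return {
        name: count
        for name, count in missing.items()
        if name in added and name not in deleted
    }
```

The function accepts any iterable of lines, so an open file handle for ovs-vswitchd.log can be passed in directly. It only sees what is in the file it is given, so an interface whose "added" line has already been rotated out of the log will not be reported.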
On Thursday, January 9, 2025 at 07:54:46 AM EST, Eugen Block <eblock@nde.ag> wrote:


Hi,

Maybe some more history and details would help us understand what the 
issue might be here.
So you have VMs with one or more interfaces that you try to delete. Do 
you see those interfaces in 'nova interface-list {UUID}'? And while 
deleting the VMs, OVS takes a long time because it can't find those 
devices, correct?

This reminds me of our reinstallation a few months ago as well 
(importing the previous DB dump). After migrating from openSUSE to 
Ubuntu (Victoria), we upgraded to Wallaby. In the post-upgrade steps 
the 'nova-manage db online_data_migrations' failed because of a few 
instances with a weird port state. A user attached a new interface 
according to the event list, but we don't see a detach in the logs 
although the instances had only one remaining NIC (as before). We had 
to dig really deep, including manipulating the DB (marking that interface 
as deleted).
We didn't try to delete those instances in that weird state, so I'm 
not sure if we could have ended up in the same situation as you. But 
maybe you have a similar thing going on with those undeleted 
interfaces in the DB that OVS tries to delete? Hard to tell...

Regards,
Eugen

Quoting Albert Braden <ozzzo@yahoo.com>:

> I didn't see a response to this email. Trying again with a better subject:
> On Friday, January 3, 2025 at 10:20:56 AM EST, Albert Braden <ozzzo@yahoo.com> wrote:
>

> One of our busiest clusters was rebuilt on Wallaby a few months ago (kolla-ansible) and deletions are taking a long time for VMs on some hypervisors. When I look at the ovs-vswitchd.log on the slow hypervisors I see lots of errors that apparently refer to a missing VM interface:
>
> 2024-12-27T19:06:10.482Z|01793|bridge|WARN|could not open network device qvo0430a2f7-cd (No such device)
>
> For most network devices I see "added" and "deleted" lines in the log. This one has an "added" line but no "deleted". I tried restarting the neutron_openvswitch_agent and openvswitch_vswitchd containers but that didn't make a difference. How can I get OVS to stop choking on this missing interface?