On Friday, January 10, 2025 at 02:30:02 AM EST, Eugen Block <eblock@nde.ag> wrote:

Hi,

the short answer is, I don't know. It really depends on what exactly
the issue is here. It requires some investigation to be able to tell
if any OpenStack tools can help, e.g. neutron-related commands, or if
you need to tackle this from ovs directly, or if it's a database
inconsistency or maybe something else entirely.
For example, what I referred to was an interface that wasn't attached
to an instance anymore, and the port didn't exist anymore, but the
database had not been properly updated. I didn't see any other way
than to manually update the DB. But you might be facing something
different here, maybe someone else has experienced the same as you,
but since nobody else responded yet, you'll have to dig on your own.
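
One way to start digging from the OVS side is to compare the ports OVS has recorded on the integration bridge with the network devices the kernel actually knows about. The following is only a minimal sketch under assumptions that are not confirmed anywhere in this thread: that the bridge is named `br-int`, that the VM-facing ports use the `qvo` prefix seen in the log line quoted below, and that the script runs somewhere `ovs-vsctl` and the host's `/sys/class/net` are both visible (with kolla-ansible that probably means inside the openvswitch_vswitchd container). The helper names are made up for illustration.

```python
import os
import subprocess


def list_ovs_ports(bridge="br-int"):
    """Return the port names OVS has recorded for a bridge.

    Assumption: 'br-int' is the integration bridge; adjust as needed.
    Requires permission to run ovs-vsctl.
    """
    out = subprocess.run(
        ["ovs-vsctl", "list-ports", bridge],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [name.strip() for name in out.splitlines() if name.strip()]


def kernel_device_exists(name):
    """True if the kernel currently exposes a network device with this name."""
    return os.path.exists(f"/sys/class/net/{name}")


def ports_without_device(ports, prefix="qvo", device_exists=kernel_device_exists):
    """Return ports with the given prefix that have no kernel device.

    The prefix filter skips patch ports and similar entries, which
    legitimately have no kernel device of their own.
    """
    return [p for p in ports if p.startswith(prefix) and not device_exists(p)]


# Example (read-only, changes nothing):
#   print(ports_without_device(list_ovs_ports("br-int")))
```

If a port such as the one from the log turns up in that list, OVS still has a record of it although the device is gone. `ovs-vsctl --if-exists del-port BRIDGE PORT` is the usual command for removing a port record, but whether that is safe or sufficient in this situation is not something this thread confirms, so try it on a test system first.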

Quoting Albert Braden <ozzzo@yahoo.com>:

> Hi Eugen,
>
> Thanks for your email. I'm not aware of a VM that corresponds to the
> stuck interface. Every VM deletion on the affected hypervisor is
> delayed; this has been happening for a few weeks now. Looking at the
> logs, it appears that an interface is created when a VM is created,
> and deleted with that VM. I'm guessing that the interface may have
> become stuck when a customer deleted a VM during maintenance, and
> the interface failed to delete. Is it possible to delete an
> interface without editing the database?
>
> On Thursday, January 9, 2025 at 07:54:46 AM EST, Eugen Block
> <eblock@nde.ag> wrote:
>
> Hi,
>
> maybe some more history and details could help understand what might
> be the issue here.
> So you have VMs with one or more interfaces that you try to delete. Do
> you see those interfaces in 'nova interface-list {UUID}'? And while
> deleting the VMs, ovs takes a long time because it can't find those
> devices, correct?
>
> This reminds me of our reinstallation a few months ago as well
> (importing the previous DB dump). After migrating from openSUSE to
> Ubuntu (Victoria), we upgraded to Wallaby. In the post-upgrade steps
> the 'nova-manage db online_data_migrations' failed because of a few
> instances with a weird port state. A user attached a new interface
> according to the event list, but we don't see a detach in the logs
> although the instances had only one remaining NIC (as before). We had
> to dig really deep, including manipulating the DB (mark that interface
> as deleted).
> We didn't try to delete those instances in that weird state, so I'm
> not sure if we could have ended up in the same situation as you. But
> maybe you have a similar thing going on with those undeleted
> interfaces in the DB that OVS tries to delete? Hard to tell...
>
> Regards,
> Eugen
>
> Quoting Albert Braden <ozzzo@yahoo.com>:
>
>> I didn't see a response to this email. Trying again with a better subject:
>>
>> On Friday, January 3, 2025 at 10:20:56 AM EST, Albert Braden
>> <ozzzo@yahoo.com> wrote:
>>
>>
>> One of our busiest clusters was rebuilt on Wallaby a few months ago (kolla-ansible) and deletions are taking a long time for VMs on some hypervisors. When I look at the ovs-vswitchd.log on the slow hypervisors I see lots of errors that apparently refer to a missing VM interface:
>>
>> 2024-12-27T19:06:10.482Z|01793|bridge|WARN|could not open network device qvo0430a2f7-cd (No such device)
>>
>> For most network devices I see "added" and "deleted" lines in the log. This one has an "added" line but no "deleted". I tried restarting the neutron_openvswitch_agent and openvswitch_vswitchd containers but that didn't make a difference. How can I get OVS to stop choking on this missing interface?
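
The manual log check described above (an "added" line without a matching "deleted" line, plus repeated "No such device" warnings) could be automated roughly as follows. This is a sketch, not a tested tool: the warning pattern is taken from the log line quoted above, while the "added interface" / "deleted interface" patterns are an assumption about how ovs-vswitchd words those messages and should be checked against the actual log. The function name is made up for illustration.

```python
import re
from collections import defaultdict

# The warning pattern comes from the quoted log line. The other two
# patterns are ASSUMED wording; verify them against your own log.
ADDED = re.compile(r"added interface (\S+)")
DELETED = re.compile(r"deleted interface (\S+)")
MISSING = re.compile(r"could not open network device (\S+) \(No such device\)")


def find_stuck_interfaces(lines):
    """Return {device: warning_count} for devices that were added, were not
    deleted again afterwards, and triggered 'No such device' warnings."""
    present = set()              # devices added and not (yet) deleted
    warnings = defaultdict(int)  # device -> number of warnings seen
    for line in lines:
        match = ADDED.search(line)
        if match:
            present.add(match.group(1))
            continue
        match = DELETED.search(line)
        if match:
            present.discard(match.group(1))
            continue
        match = MISSING.search(line)
        if match:
            warnings[match.group(1)] += 1
    return {name: count for name, count in warnings.items() if name in present}


# Example usage (the path is an assumption, adjust it to your deployment):
#   with open("/var/log/kolla/openvswitch/ovs-vswitchd.log") as log:
#       print(find_stuck_interfaces(log))
```

Note that a device whose "added" line has already been rotated out of the log would not be reported by this sketch.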