Hi,

We hit the same issue in our deployment, Dalmatian release.  In our case, the nova-compute's libvirt access doesn't trigger thread context switch over 120 seconds and the long-running task triggers some heartbeat task failures.

I pushed one fix[1] to the gerrit.

1. https://review.opendev.org/c/openstack/nova/+/938215

best regards,
Masahito

-----Original Message-----
From: "Jakub Darmach"<jakub.darmach@gmail.com>
To: <openstack-discuss@lists.openstack.org>;
Cc:
Sent: 2024/12/23(月) 22:34 (GMT+09:00)
Subject: [Nova] Nova-compute service flapping on Antelope

Hello,

Recently I encountered an interesting issue - nova-compute service started to temporarily lose Rabbit connectivity to regain it after a few seconds on Antelope. The issue was replicated on the test environment, without specific steps to replicate though - it just starts to happen after some time.

I pasted a log example in the bug I opened [1].

First we can see rabbit logging closing AMQP connection, followed by nova-compute reporting rabbit server being unreachable, with next message being successful reconnection. Restarting nova-compute services helps temporarily, the issue starts to manifest after a few days.
Initially disconnects once every few hours, each time in shorter intervals, to the point it happens every minute or so.

Did anyone encounter something similar?

[1] https://bugs.launchpad.net/nova/+bug/2092297

Pozdrawiam / Best regards,
Jakub Darmach