Hello, For those interested who haven't followed the discussion on Gerrit and Launchpad, another patch addressing this issue was merged to master and has now been backported to all stable branches: https://review.opendev.org/q/I60d6f04d374e9ede5895a43b7a75e955b0fea3c5 Best regards, Pierre Riteau (priteau) On Tue, 24 Dec 2024 at 10:07, Masahito Muroi <masahito.muroi@lycorp.co.jp> wrote:
Hi,
We hit the same issue in our deployment, Dalmatian release. In our case, the nova-compute's libvirt access doesn't trigger thread context switch over 120 seconds and the long-running task triggers some heartbeat task failures.
I pushed one fix[1] to the gerrit.
1. https://review.opendev.org/c/openstack/nova/+/938215
best regards, Masahito
-----Original Message----- *From:* "Jakub Darmach"<jakub.darmach@gmail.com> *To:* <openstack-discuss@lists.openstack.org>; *Cc:* *Sent:* 2024/12/23(月) 22:34 (GMT+09:00) *Subject:* [Nova] Nova-compute service flapping on Antelope
Hello,
Recently I encountered an interesting issue - nova-compute service started to temporarily lose Rabbit connectivity to regain it after a few seconds on Antelope. The issue was replicated on the test environment, without specific steps to replicate though - it just starts to happen after some time.
I pasted a log example in the bug I opened [1].
First we can see rabbit logging closing AMQP connection, followed by nova-compute reporting rabbit server being unreachable, with next message being successful reconnection. Restarting nova-compute services helps temporarily, the issue starts to manifest after a few days. Initially disconnects once every few hours, each time in shorter intervals, to the point it happens every minute or so.
Did anyone encounter something similar?
[1] https://bugs.launchpad.net/nova/+bug/2092297 <https://urldefense.com/v3/__https://bugs.launchpad.net/nova/*bug/2092297__;Kw!!AEH8rfA!wLWemH1yjCpgDr2uxDQrBS933a9jsJwfjsU4S6cVOAJHXvgTzZw2QlytREjB7AmtB1nE3A6mvCaj2SEKSyqqc-FK8-E7qTTT$>
Pozdrawiam / Best regards, *Jakub Darmach*