[Nova] Nova-compute service flapping on Antelope
Hello,

Recently I encountered an interesting issue: on Antelope, the nova-compute service started to temporarily lose its RabbitMQ connectivity and regain it a few seconds later. The issue was reproduced on a test environment, though without specific steps to reproduce; it just starts to happen after some time.

I pasted a log example in the bug I opened [1]. First we can see RabbitMQ logging that it is closing the AMQP connection, followed by nova-compute reporting the RabbitMQ server as unreachable, with the next message being a successful reconnection.

Restarting the nova-compute services helps temporarily, but the issue starts to manifest again after a few days. Initially the disconnects occur once every few hours, then at progressively shorter intervals, to the point where they happen every minute or so.

Did anyone encounter something similar?

[1] https://bugs.launchpad.net/nova/+bug/2092297

Pozdrawiam / Best regards,
Jakub Darmach
Hi,

We hit the same issue in our deployment on the Dalmatian release. In our case, nova-compute's libvirt access can run for over 120 seconds without triggering a thread context switch, and that long-running task causes some heartbeat tasks to fail.

I pushed a fix [1] to Gerrit.

[1] https://review.opendev.org/c/openstack/nova/+/938215

Best regards,
Masahito

-----Original Message-----
From: "Jakub Darmach" <jakub.darmach@gmail.com>
To: <openstack-discuss@lists.openstack.org>
Sent: 2024/12/23 (Mon) 22:34 (GMT+09:00)
Subject: [Nova] Nova-compute service flapping on Antelope
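
The starvation effect described in the reply (a long call that never yields, so the heartbeat task cannot run) can be illustrated with a small sketch. This is not nova or oslo.messaging code and it does not reproduce the linked fix: it uses Python's asyncio as a stand-in for cooperative scheduling, and all names and durations (HEARTBEAT_INTERVAL, BLOCKING_CALL, blocking_work, cooperative_work) are hypothetical and scaled down to fractions of a second.

```python
import asyncio
import time

HEARTBEAT_INTERVAL = 0.05  # seconds; hypothetical stand-in for a heartbeat period
BLOCKING_CALL = 0.3        # seconds; hypothetical stand-in for a long libvirt call


async def heartbeat(beats):
    """Record a timestamp every time the heartbeat task gets scheduled."""
    while True:
        beats.append(time.monotonic())
        await asyncio.sleep(HEARTBEAT_INTERVAL)


def max_gap(beats):
    """Return the largest interval between two consecutive heartbeats."""
    return max(later - earlier for earlier, later in zip(beats, beats[1:]))


async def run(work):
    """Run `work` next to the heartbeat task and return the worst gap seen."""
    beats = []
    task = asyncio.create_task(heartbeat(beats))
    await asyncio.sleep(2 * HEARTBEAT_INTERVAL)  # let a few beats through first
    await work()
    await asyncio.sleep(2 * HEARTBEAT_INTERVAL)  # and a few more afterwards
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return max_gap(beats)


async def blocking_work():
    # Blocks the only scheduler thread: no context switch, so no heartbeats.
    time.sleep(BLOCKING_CALL)


async def cooperative_work():
    # Same call moved to a worker thread (Python 3.9+); the scheduler stays free.
    await asyncio.to_thread(time.sleep, BLOCKING_CALL)


def demo():
    """Return (worst gap with blocking work, worst gap with cooperative work)."""
    blocked = asyncio.run(run(blocking_work))
    cooperative = asyncio.run(run(cooperative_work))
    return blocked, cooperative


if __name__ == "__main__":
    blocked, cooperative = demo()
    print(f"worst heartbeat gap, blocking call:    {blocked:.3f}s")
    print(f"worst heartbeat gap, cooperative call: {cooperative:.3f}s")
```

In the blocking case the worst heartbeat gap is at least as long as the call itself. If the peer expects heartbeats more often than that, it would plausibly close the connection, which matches the disconnect-then-reconnect pattern reported in this thread.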