[neutron][nova] [kolla] vif plugged timeout
bdobreli at redhat.com
Wed Nov 24 10:05:24 UTC 2021
On 11/24/21 1:21 AM, Tony Liu wrote:
> I hit the same problem, from time to time, not consistently. I am using OVN.
> Typically, it takes no more than a few seconds for neutron to confirm the port is up.
> The default timeout in my setup is 600s. Even when the ports show up in both OVN SB
> and NB, nova-compute still doesn't get confirmation from neutron. Either neutron
> didn't pick it up or the message was lost and didn't get to nova-compute.
> Hoping someone could share more thoughts.
That may also be a superset of the revert-resize with OVS hybrid plug
issue described in . Even though the problems described in this topic
may have nothing to do with that particular case, they do look related to the
external events framework.
Issues like that make me think about some improvements to it.
[tl;dr] bring back the idea of buffering events with a TTL
Maybe a new deferred RPC calls feature? It would execute a call
after some trigger, like send unplug and forget. That would make
debugging harder, but would cover the cases where an external service "forgot"
to notify Nova when it is done (an event was lost, and similar cases).
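A minimal sketch of how such a deferred call might look, in Python. Everything
here is hypothetical (the DeferredCall class and its methods are not Nova or
oslo.messaging APIs), and it assumes the call should also fire after a fallback
timeout when the trigger never arrives, which is only one reading of the
"forgot to notify" case:

```python
import threading


class DeferredCall:
    """Hypothetical helper: run `action` exactly once, either when
    trigger() is called or when `timeout` seconds pass without a trigger."""

    def __init__(self, action, timeout):
        self._action = action
        self._lock = threading.Lock()
        self._fired_by = None            # "trigger" or "timeout" once run
        self._done = threading.Event()
        # Fallback path: covers the case where the notification is lost.
        self._timer = threading.Timer(timeout, self._run, args=("timeout",))
        self._timer.daemon = True
        self._timer.start()

    def trigger(self):
        """Called when the external service reports completion."""
        self._timer.cancel()
        self._run("trigger")

    def wait(self, timeout=None):
        """Block until the action has run; returns False on wait timeout."""
        return self._done.wait(timeout)

    def _run(self, reason):
        with self._lock:
            if self._fired_by is not None:
                return                   # already executed, ignore repeats
            self._fired_by = reason
        self._action(reason)
        self._done.set()
```

The action receives the reason it ran, so a log line can at least tell a real
notification from the fallback, which may soften the debugging concern a bit.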
Adding a queue to store events that Nova did not have a receive handler
set for might help as well, with a TTL set on it, or a more advanced
reaping logic, for example based on tombstone events invalidating the
queue contents by causal conditions. That would eliminate flaky
expectations around when Nova starts waiting for events versus when
unexpected or belated events are sent. Why flaky? Because in an async
distributed system there is no "before" or "after", so a service external
to Nova is unlikely to conform to any time-frame-based contract for
send-notify/wait-receive/real-completion-fact. And the fact that Nova
can't tell what the network backend is (because  was not fully
implemented) does not make things simpler.
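To make the queue idea concrete, here is a rough Python sketch, not Nova code.
The EventBuffer class, its methods, and the (event name, port id) key are made
up for illustration; the assumption is that events arriving before anyone
waits for them are kept until a TTL expires or a tombstone drops them:

```python
import time
from collections import defaultdict, deque


class EventBuffer:
    """Hypothetical buffer for events that arrived with no waiter registered."""

    def __init__(self, ttl, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock                  # injectable for testing
        self._events = defaultdict(deque)    # key -> deque of (expiry, payload)

    def put(self, key, payload):
        """Store an event nobody was waiting for yet."""
        self._events[key].append((self._clock() + self._ttl, payload))

    def tombstone(self, key):
        """Invalidate everything buffered for key (e.g. the port was deleted)."""
        self._events.pop(key, None)

    def take(self, key):
        """Return the oldest unexpired payload for key, or None if there is none."""
        queue = self._events.get(key)
        now = self._clock()
        while queue:
            expiry, payload = queue.popleft()
            if expiry > now:
                if not queue:
                    del self._events[key]
                return payload
            # expired entry: drop it and keep looking
        self._events.pop(key, None)
        return None


buf = EventBuffer(ttl=600)
key = ("network-vif-plugged", "port-1")      # port id is a placeholder
buf.put(key, {"status": "completed"})        # event arrives "too early"
early = buf.take(key)                        # a later waiter still finds it
```

A waiter would call take() first and only block if it returns None, so the
order of "start waiting" and "event sent" stops mattering within the TTL.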
As Sean noted in a private IRC conversation, with OVN the current
implementation is not capable of fulfilling the contract that
network-vif-plugged events are only sent after the interface is fully
configured. So it sends events at bind time, once it has updated the
logical port in the OVN DB but before the real configuration has happened. I
believe that deferred RPC calls and/or queued events might improve on such
"cheating" by making the real post-completion processing a thing for
> From: Laurent Dumont <laurentfdumont at gmail.com>
> Sent: November 22, 2021 02:05 PM
> To: Michal Arbet
> Cc: openstack-discuss
> Subject: Re: [neutron][nova] [kolla] vif plugged timeout
> How high did you have to raise it? If it does appear after X amount of time, then the VIF plug is not lost?
> On Sat, Nov 20, 2021 at 7:23 AM Michal Arbet <michal.arbet at ultimum.io> wrote:
> + if I raise vif_plugged_timeout (hope I remember it correctly) in nova to some high number, the problem disappears. But it's only a workaround.
> On Sat, Nov 20, 2021, 12:05 Michal Arbet <michal.arbet at ultimum.io> wrote:
> Has anyone seen the issue I am currently facing?
> When launching a heat stack (it's the same if I launch several instances), VIF plugging times out and I don't know why; sometimes it is OK, sometimes it fails.
> Sometimes neutron reports the VIF plugged in < 10 sec (test env), sometimes it takes 100 seconds or more. It seems there is some race condition, but I can't find out where the problem is. In the end every instance is spawned OK (the retry mechanism worked).
> Another finding is that it has something to do with security groups: if the noop driver is used, everything works fine.
> Firewall security setup is openvswitch.
> Test env is Wallaby.
> I will attach some logs when I am near a PC.
> Thank you,
> Michal Arbet (Kevko)