<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Oct 31, 2014 at 3:28 PM, Ben Nemec <span dir="ltr"><<a href="mailto:openstack@nemebean.com" target="_blank">openstack@nemebean.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span>On 10/29/2014 10:17 AM, Kyle Mestery wrote:<br>
> On Wed, Oct 29, 2014 at 7:25 AM, Hly <<a href="mailto:henry4hly@gmail.com" target="_blank">henry4hly@gmail.com</a>> wrote:<br>
>><br>
>><br>
>> Sent from my iPad<br>
>><br>
>> On 2014-10-29, at 下午8:01, Robert van Leeuwen <<a href="mailto:Robert.vanLeeuwen@spilgames.com" target="_blank">Robert.vanLeeuwen@spilgames.com</a>> wrote:<br>
>><br>
>>>>> I find that our current design removes all flows and then re-adds them entry by entry; this<br>
>>>>> causes every network node to break all tunnels between the other<br>
>>>>> network nodes and all compute nodes.<br>
>>>> Perhaps a way around this would be to add a flag on agent startup<br>
>>>> which would have it skip reprogramming flows. This could be used for<br>
>>>> the upgrade case.<br>
>>><br>
>>> I hit the same issue last week and filed a bug here:<br>
>>> <a href="https://bugs.launchpad.net/neutron/+bug/1383674" target="_blank">https://bugs.launchpad.net/neutron/+bug/1383674</a><br>
>>><br>
>>> From an operator's perspective this is VERY annoying, since you also cannot push any config changes that require/trigger a restart of the agent.<br>
>>> e.g. something simple like changing a log setting becomes a hassle.<br>
>>> I would prefer the default behaviour to be not to clear the flows, or at the least a config option to disable it.<br>
>>><br>
>><br>
>> +1, we have also suffered from this, even when only a very small patch is applied<br>
>><br>
> I'd really like to get some input from the TripleO folks, because they<br>
> were the ones who filed the original bug here and were hit by the<br>
> agent NOT reprogramming flows on agent restart. It does seem fairly<br>
> obvious that adding an option around this would be a good way forward,<br>
> however.<br>
<br>
</span>Since nobody else has commented, I'll put in my two cents (though I<br>
might be overcharging you ;-). I've also added the TripleO tag to the<br>
subject, although with Summit coming up I don't know if that will help.<br></blockquote><div><br></div><div>Summit did lead to some delays - I started this response and then got distracted, and only just found the draft again.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<br>
Anyway, if the bug you're referring to is the one I'm thinking of, then our<br>
issue was just with the flows not existing. I don't think we care<br>
whether they get reprogrammed on agent restart or not as long as they<br>
somehow come into existence at some point.<br></blockquote><div><br></div><div>Is <a href="https://bugs.launchpad.net/bugs/1290486" target="_blank">https://bugs.launchpad.net/bugs/1290486</a> the bug you're thinking of?</div><div><br></div><div>That seems to have been solved with <a href="https://review.openstack.org/#/c/96919/" target="_blank">https://review.openstack.org/#/c/96919/</a></div><div><br></div><div>My memory of that problem is that prior to 96919, when the daemon was restarted, existing flows were thrown away. We'd end up with just a NORMAL flow, which didn't route the traffic where we needed it.<br><br>The fix there seems to have been to add a canary rule to detect when this happens - i.e., to detect that all the existing flows have been thrown away. Once we know that, we know we need to recreate the flows that were lost when the daemon restarted.<br><br>If my memory is correct (and it may not be, I'm not 100% sure I fully understood the problem at the time), the root cause here is not addressed by the change added in 96919 - by the time that code is triggered and the flows are reprogrammed, they've already been lost.</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
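</blockquote><div>To make the canary idea concrete, here's a rough sketch of the detection logic as I understand it - the names and structure here are hypothetical and greatly simplified, not the actual neutron agent code:</div>

```python
# Sketch of the canary-flow idea (hypothetical names, not the actual
# neutron code): install a known marker flow, and if it ever
# disappears, assume the whole table was wiped (e.g. by an
# ovs-vswitchd restart) and reprogram everything.

CANARY_COOKIE = 0x5AFE


class FlowTable:
    """Minimal stand-in for an OVS bridge's flow table."""

    def __init__(self):
        self.flows = {}

    def add_flow(self, cookie, actions):
        self.flows[cookie] = actions

    def flow_exists(self, cookie):
        return cookie in self.flows

    def wipe(self):
        # Simulates a daemon restart throwing all flows away.
        self.flows.clear()


def install_canary(table):
    table.add_flow(CANARY_COOKIE, "drop")


def resync_if_needed(table, reprogram):
    # If the canary is gone, everything else is gone too. Note that by
    # the time this fires, the real flows have already been lost, which
    # is why detection alone can't close the window of broken traffic.
    if table.flow_exists(CANARY_COOKIE):
        return False
    reprogram(table)
    install_canary(table)
    return True
```

<div>The sketch also illustrates the limitation described above: the canary tells us the flows are gone only after they're gone, so traffic is broken until the reprogramming completes.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">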
<br>
It's possible I'm wrong about that, and probably the best person to talk<br>
to would be Robert Collins since I think he's the one who actually<br>
tracked down the problem in the first place.<br></blockquote><div><br></div><div>I think (if I'm looking at the right bug) that you're referring to his comment:</div><div><br></div></div></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div class="gmail_extra"><div class="gmail_quote"><div><span style="color:rgb(51,51,51);font-family:monospace;font-size:12px;line-height:18px">we're trying to do things before ovs-db is up and running and neutron-</span><span style="color:rgb(51,51,51);font-family:monospace;font-size:12px;line-height:18px">openvswitch-</span><span style="color:rgb(51,51,51);font-family:monospace;font-size:12px;line-height:18px">agent is not handling ovs-db being down properly - it should back off and retry, or alternatively, do a full sync once the db is available.<br></span></div></div></div></blockquote><div class="gmail_extra"><div class="gmail_quote"><div><br>As far as I can tell, everything after that point (i.e., once I got involved) focused on the latter, which is why we ended up with the canary and the reprogramming. Assuming he's right about the race condition, it sounds as though fixing that might be preferable. Later discussion on this thread has centered around a full flow-synchronization approach: it sounds to me as though handling the db being unavailable will need to be part of that approach (we don't want to synchronize towards "no rules" just because we can't get a canonical list of rules from the DB).</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
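</blockquote><div>For the full-sync direction, the key property is refusing to treat "db unreachable" as "no rules". Something along these lines - a sketch with invented names, not a proposal for the actual implementation:</div>

```python
# Sketch of a sync loop that backs off while the db is down
# (hypothetical helper names, not neutron code). A fetch failure must
# NOT be interpreted as "the desired state is zero flows" - otherwise
# we would converge towards an empty table every time the db blips.

import time


class DBUnavailable(Exception):
    """Raised by the fetch callback when the db cannot be reached."""


def full_sync(fetch_desired_flows, apply_flows, retries=5, base_delay=1.0):
    delay = base_delay
    for _ in range(retries):
        try:
            desired = fetch_desired_flows()
        except DBUnavailable:
            # Back off and retry rather than synchronizing to nothing.
            time.sleep(delay)
            delay *= 2
            continue
        apply_flows(desired)
        return True
    # Give up for now: leave the existing flows untouched and let the
    # caller schedule another attempt later.
    return False
```

<div>Whether the back-off belongs in the agent's main loop or the sync pass simply gets skipped is a design choice; the point is only that a db failure must be distinguishable from a legitimately empty flow list.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">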
<span><font color="#888888"><br>
-Ben<br>
</font></span><div><div><br>
<br>
_______________________________________________<br>
OpenStack-dev mailing list<br>
<a href="mailto:OpenStack-dev@lists.openstack.org" target="_blank">OpenStack-dev@lists.openstack.org</a><br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>
</div></div></blockquote></div><br></div></div>