<div dir="ltr"><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div><div>Hi all,<br><br></div>Let me share the results of our experiment:<br><br></div>First we wrote a patch: <a href="https://review.openstack.org/#/c/131791/">https://review.openstack.org/#/c/131791/</a>, and tried to use it in an experimental environment.<br><br></div>Bad things happened:<br><br></div>1. Note that these are the old flows (the network node's br-tun; the previous version is roughly icehouse):<br>cookie=0x0, duration=238379.566s, table=1, n_packets=373521, n_bytes=26981817, idle_age=0, hard_age=65534, priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,21)<br>cookie=0x0, duration=238379.575s, table=1, n_packets=30101, n_bytes=3603857, idle_age=198, hard_age=65534, priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)<br>cookie=0x0, duration=238379.530s, table=20, n_packets=4957, n_bytes=631543, idle_age=198, hard_age=65534, priority=0 actions=resubmit(,21)<br></div><div>According to these flows, a broadcast or multicast packet is resubmitted straight to table 21. A unicast packet is resubmitted to table 20, and if table 20 has no learned flow for it, table 20 does nothing but resubmit to table 21.<br>The full sequence is:<br>from vxlan ports?: table 0 -> table 3 -> table 10 (learn flows and insert them into table 20)<br></div><div>from br-int?: table 0 -> table 1 -> (table 20) -> table 21<br></div><div><br></div></div>In the new version (roughly juno), we discard table 1 and use table 2 instead:<br>cookie=0x0, duration=142084.354s, table=2, n_packets=175823, n_bytes=12323286, idle_age=0, hard_age=65534, priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,22)<br>cookie=0x0, duration=142084.364s, table=2, n_packets=861601, n_bytes=107499857, idle_age=0, hard_age=65534, priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)<br></div>But if we haven't removed all the old flows, table 1 will still exist; it will intercept packets and try to submit them to tables 21 and 20, whereas the correct 
tables are 22 and 20.<br></div>The full sequence is:<br></div>from vxlan ports?: table 0 -> table 4 -> table 10<br></div>from br-int?: table 0 -> table 2 -> (table 20, which may already output the packet!) -> table 22<br><br></div>Now imagine that we mix these up: because the flows in table 0 all have priority 1, we can't be sure that packets will hit the right flow, so some packets may be submitted to table 21. This is quite beyond the pale!<br><br></div>2. What's more, imagine that we use both vxlan and vlan as providers:<br><span style="font-family:georgia,serif"> +-----------------+ <br> | | <br> | namespace | +------------+ <br> | +---+--------+ | | | <br> | | qg-xxxx | | | namespace | <br> | | | | | | <br> | +------------+ | | +--------+ | <br> | | | | tap | | <br> | +------------+ | | +--------+ | <br> | | qr xxxxx | | | | <br> | +------------+ | +------+-----+ <br> | | | <br> +-----------++----+ | <br> || | <br> +-++--------------------+---+ <br> | | <br>+---------------+ | | +-------------------+<br>| | | br-int | | |<br>| ovs-br vlan +-----------+ +------------------+ br-tun(vxlan) |<br>| | | | | |<br>+-------+-------+ | | +---------+---------+<br> | +-----------------------+ | <br> | | <br> | | <br> | +---------------------+ | <br> | | | | <br> | | +-----------------------+ <br> +--------------------------+ | <br> | eth0(ethernet card) | <br> | | <br> | | <br> +---------------------+ <br></span></div>Since ovs-br's vlan is assigned as x, it is modified to y on the way to br-int. But y is assigned by ovs, not by our config, so there may be more than one mod flow for a vlan packet in ovs-br.<br></div>This will corrupt the vlan_id! 
And it may cause a network loop!<br><br></div>The accidents above are what actually happened in our experiment, not just my imagination.<br><br></div>Please take more care in the design!<br><br></div>Please feel free to contact me at this email address; comments are welcome.<br><br></div>Damon Wang<br></div><div class="gmail_extra"><br><div class="gmail_quote">2014-11-06 2:59 GMT+08:00 Armando M. <span dir="ltr"><<a href="mailto:armamig@gmail.com" target="_blank">armamig@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I would be open to making this toggle switch available, however I feel that doing it via static configuration can introduce unnecessary burden to the operator. Perhaps we could explore a way where the agent can figure out which state it's supposed to be in based on its reported status?<span class="HOEnZb"><font color="#888888"><div><br></div><div>Armando </div></font></span><div><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On 5 November 2014 12:09, Salvatore Orlando <span dir="ltr"><<a href="mailto:sorlando@nicira.com" target="_blank">sorlando@nicira.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I have no opposition to that, and I will be happy to assist reviewing the code that will enable flow synchronisation (or to say it in an easier way, punctual removal of flows unknown to the l2 agent). 
<div><br></div><div>In the meanwhile, I hope you won't mind if we go ahead and start making flow reset optional - so that we stop causing downtime upon agent restart.</div><span><font color="#888888"><div><br></div><div>Salvatore</div></font></span></div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On 5 November 2014 11:57, Erik Moe <span dir="ltr"><<a href="mailto:erik.moe@ericsson.com" target="_blank">erik.moe@ericsson.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div link="blue" vlink="purple" lang="EN-US">
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Hi,<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">I also agree. IMHO we need a flow synchronization method so that we can avoid network downtime and stray flows.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Regards,<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Erik<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> Germy Lure [mailto:<a href="mailto:germy.lure@gmail.com" target="_blank">germy.lure@gmail.com</a>]
<br>
<b>Sent:</b> 5 November 2014 10:46<span><br>
<b>To:</b> OpenStack Development Mailing List (not for usage questions)<br>
</span><b>Subject:</b> Re: [openstack-dev] [neutron][TripleO] Clear all flows when ovs agent start? why and how avoid?<u></u><u></u></span></p><div><div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal">Hi Salvatore,<u></u><u></u></p>
<div>
<p class="MsoNormal">A startup flag really is a simpler approach. But in which situations should we set this flag to remove all flows? On upgrade? On a manual restart? On an internal fault?<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Indeed, we need to refresh flows only when the flows held by the agent and those in the related ovs are inconsistent (incorrect, unwanted, stale and so on). But the problem is: how do we know this? I think a startup flag is too coarse, unless
we can tolerate the inconsistent situation.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Of course, I believe that turning off the flow reset at startup can resolve most problems. The flows are correct most of the time, after all. But considering the five-nines availability expected of NFV, I still recommend the <span style="font-size:9.5pt;font-family:"Arial","sans-serif"">flow
synchronization approach.</span><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">BR,<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Germy<u></u><u></u></p>
</div>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal">On Wed, Nov 5, 2014 at 3:36 PM, Salvatore Orlando <<a href="mailto:sorlando@nicira.com" target="_blank">sorlando@nicira.com</a>> wrote:<u></u><u></u></p>
<div>
<p class="MsoNormal">From what I gather from this thread and related bug report, the change introduced in the OVS agent is causing a data plane outage upon agent restart, which is not desirable in most cases.<u></u><u></u></p>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">The rationale for the change that introduced this bug was, I believe, cleaning up stale flows on the OVS agent, which also makes some sense.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Unless I'm missing something, I reckon the best way forward is actually quite straightforward; we might add a startup flag to reset all flows and not reset them by default.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">While I agree the "flow synchronisation" process proposed in the previous post is valuable too, I hope we might be able to fix this with a simpler approach.<u></u><u></u></p>
</div>
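<div>
<p class="MsoNormal">As a rough illustration of the startup flag idea, here is a minimal sketch in Python. The flag name --reset-flows and the functions parse_agent_args and startup_actions are hypothetical and are not existing agent options; the sketch only shows that flows are left alone unless the operator explicitly asks for a reset.</p>
</div>

```python
import argparse


def parse_agent_args(argv):
    """Parse startup arguments; the flag name is hypothetical."""
    parser = argparse.ArgumentParser(prog="ovs-agent-sketch")
    # Off by default: a plain restart must not wipe the data plane.
    parser.add_argument(
        "--reset-flows",
        action="store_true",
        default=False,
        help="remove all existing flows before reprogramming",
    )
    return parser.parse_args(argv)


def startup_actions(argv):
    """Return the ordered list of steps the agent would take at startup."""
    args = parse_agent_args(argv)
    actions = []
    if args.reset_flows:
        # Only wipe existing flows when the operator asked for it.
        actions.append("delete_all_flows")
    actions.append("program_flows")
    return actions
```

<div>
<p class="MsoNormal">With no arguments only the programming step runs; passing --reset-flows puts the delete step in front of it.</p>
</div>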
<div>
<p class="MsoNormal"><span style="color:#888888"><u></u> <u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span style="color:#888888">Salvatore<u></u><u></u></span></p>
</div>
</div>
<div>
<div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal">On 5 November 2014 04:43, Germy Lure <<a href="mailto:germy.lure@gmail.com" target="_blank">germy.lure@gmail.com</a>> wrote:<u></u><u></u></p>
<div>
<p class="MsoNormal">Hi,<u></u><u></u></p>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Considering what triggers an agent restart, I think there are only two cases:<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">1). only the agent is restarted<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">2). the host that the agent is deployed on is rebooted<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">When the agent starts, the ovs may:<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">a. have all correct flows<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">b. have nothing at all<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">c. have partly correct flows, while others may need to be reprogrammed, deleted or added<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">In any case, I think both users and developers would be happy to see the system recover ASAP after the agent restarts. Ideally the agent pushes only the incorrect flows and keeps the correct ones. This ensures that the services whose flows are correct
keep working while the agent starts.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">So, I suggest two solutions:<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">1. After restarting, the agent gets all flows from ovs and compares them with its local flows. The agent then corrects only the ones that differ.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">2. Adapt both ovs and the agent. The agent just pushes all flows every time (without removing any), and ovs prepares two tables so that it can switch between the flow sets (like an RCU lock).<u></u><u></u></p>
</div>
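<div>
<p class="MsoNormal">Solution 1 could be sketched roughly as follows, in Python. This is only an illustration under assumptions: the dictionary layout of a flow and the names flow_key and diff_flows are made up for the example and are not part of the agent or of ovs. A flow is identified by its table, priority and match, and two flows with the same identity but different actions count as "to be corrected".</p>
</div>

```python
def flow_key(flow):
    """Identity of a flow: table, priority and match (hypothetical layout)."""
    return (flow["table"], flow["priority"], flow["match"])


def diff_flows(installed, desired):
    """Compare flows read from ovs with the flows the agent wants.

    Returns (to_add, to_modify, to_delete); flows that are already
    correct appear in none of the three lists and are left untouched.
    """
    installed_by_key = {flow_key(f): f for f in installed}
    desired_by_key = {flow_key(f): f for f in desired}

    to_add = [f for k, f in desired_by_key.items()
              if k not in installed_by_key]
    to_modify = [f for k, f in desired_by_key.items()
                 if k in installed_by_key
                 and installed_by_key[k]["actions"] != f["actions"]]
    to_delete = [f for k, f in installed_by_key.items()
                 if k not in desired_by_key]
    return to_add, to_modify, to_delete


MCAST = "dl_dst=01:00:00:00:00:00/01:00:00:00:00:00"
UCAST = "dl_dst=00:00:00:00:00:00/01:00:00:00:00:00"

# Flows found on the bridge after a restart: one stale table 1 flow and
# one table 2 flow that still carries an outdated action.
installed = [
    {"table": 1, "priority": 0, "match": MCAST, "actions": "resubmit(,21)"},
    {"table": 2, "priority": 0, "match": MCAST, "actions": "resubmit(,21)"},
]
# Flows the agent wants to end up with.
desired = [
    {"table": 2, "priority": 0, "match": MCAST, "actions": "resubmit(,22)"},
    {"table": 2, "priority": 0, "match": UCAST, "actions": "resubmit(,20)"},
]

to_add, to_modify, to_delete = diff_flows(installed, desired)
```

<div>
<p class="MsoNormal">In this example the stale table 1 flow lands in to_delete, the table 2 multicast flow lands in to_modify, and the missing unicast flow lands in to_add, so nothing else on the bridge has to be touched.</p>
</div>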
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Solution 1 is recommended because of third-party vendors.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">BR,<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Germy<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
</div>
<div>
<div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal">On Fri, Oct 31, 2014 at 10:28 PM, Ben Nemec <<a href="mailto:openstack@nemebean.com" target="_blank">openstack@nemebean.com</a>> wrote:<u></u><u></u></p>
<p class="MsoNormal">On 10/29/2014 10:17 AM, Kyle Mestery wrote:<br>
> On Wed, Oct 29, 2014 at 7:25 AM, Hly <<a href="mailto:henry4hly@gmail.com" target="_blank">henry4hly@gmail.com</a>> wrote:<br>
>><br>
>><br>
>><br>
>> On 2014-10-29, at 8:01 PM, Robert van Leeuwen <<a href="mailto:Robert.vanLeeuwen@spilgames.com" target="_blank">Robert.vanLeeuwen@spilgames.com</a>> wrote:<br>
>><br>
>>>>> I find that our current design removes all flows and then adds flows entry by entry; this<br>
>>>>> causes every network node to break all tunnels to the other<br>
>>>>> network nodes and all compute nodes.<br>
>>>> Perhaps a way around this would be to add a flag on agent startup<br>
>>>> which would have it skip reprogramming flows. This could be used for<br>
>>>> the upgrade case.<br>
>>><br>
>>> I hit the same issue last week and filed a bug here:<br>
>>> <a href="https://bugs.launchpad.net/neutron/+bug/1383674" target="_blank">https://bugs.launchpad.net/neutron/+bug/1383674</a><br>
>>><br>
>>> From an operator's perspective this is VERY annoying since you also cannot push any config changes that require/trigger a restart of the agent.<br>
>>> e.g. something simple like changing a log setting becomes a hassle.<br>
>>> I would prefer the default behaviour to be to not clear the flows, or at least a config option to disable it.<br>
>>><br>
>><br>
>> +1, we also suffer from this even when only a very small patch is applied<br>
>><br>
> I'd really like to get some input from the tripleo folks, because they<br>
> were the ones who filed the original bug here and were hit by the<br>
> agent NOT reprogramming flows on agent restart. It does seem fairly<br>
> obvious that adding an option around this would be a good way forward,<br>
> however.<br>
<br>
Since nobody else has commented, I'll put in my two cents (though I<br>
might be overcharging you ;-). I've also added the TripleO tag to the<br>
subject, although with Summit coming up I don't know if that will help.<br>
<br>
Anyway, if the bug you're referring to is the one I think, then our<br>
issue was just with the flows not existing. I don't think we care<br>
whether they get reprogrammed on agent restart or not as long as they<br>
somehow come into existence at some point.<br>
<br>
It's possible I'm wrong about that, and probably the best person to talk<br>
to would be Robert Collins since I think he's the one who actually<br>
tracked down the problem in the first place.<br>
<span style="color:#888888"><br>
-Ben</span><u></u><u></u></p>
<div>
<div>
<p class="MsoNormal"><br>
<br>
_______________________________________________<br>
OpenStack-dev mailing list<br>
<a href="mailto:OpenStack-dev@lists.openstack.org" target="_blank">OpenStack-dev@lists.openstack.org</a><br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><u></u><u></u></p>
</div>
</div>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
</div>
</div>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
</div>
</div>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
</div></div></div>
</div>
<br></blockquote></div><br></div>
</div></div><br>
<br></blockquote></div><br></div></div></div></div>
<br></blockquote></div><br></div>