[openstack-dev] [neutron][TripleO] Clear all flows when ovs agent start? why and how avoid?

Armando M. armamig at gmail.com
Wed Nov 5 18:59:08 UTC 2014


I would be open to making this toggle switch available; however, I feel that
doing it via static configuration can place an unnecessary burden on the
operator. Perhaps we could explore a way for the agent to figure out which
state it's supposed to be in based on its reported status?

Armando

On 5 November 2014 12:09, Salvatore Orlando <sorlando at nicira.com> wrote:

> I have no opposition to that, and I will be happy to assist in reviewing the
> code that will enable flow synchronisation (or, to put it more simply,
> targeted removal of flows unknown to the l2 agent).
>
> In the meanwhile, I hope you won't mind if we go ahead and start making
> flow reset optional - so that we stop causing downtime upon agent restart.
>
> Salvatore
>
> On 5 November 2014 11:57, Erik Moe <erik.moe at ericsson.com> wrote:
>
>>
>>
>> Hi,
>>
>>
>>
>> I also agree, IMHO we need a flow synchronization method so we can avoid
>> network downtime and stray flows.
>>
>>
>>
>> Regards,
>>
>> Erik
>>
>>
>>
>>
>>
>> *From:* Germy Lure [mailto:germy.lure at gmail.com]
>> *Sent:* 5 November 2014 10:46
>> *To:* OpenStack Development Mailing List (not for usage questions)
>> *Subject:* Re: [openstack-dev] [neutron][TripleO] Clear all flows when
>> ovs agent start? why and how avoid?
>>
>>
>>
>> Hi Salvatore,
>>
>> A startup flag is indeed a simpler approach. But in what situations should
>> we set this flag to remove all flows? Upgrade? Manual restart?
>> Internal fault?
>>
>>
>>
>> Indeed, we only need to refresh flows when the flows in OVS are
>> inconsistent (incorrect, unwanted, stale and so on) with what the agent
>> expects. But the problem is: how do we know this? I think a startup flag is
>> too coarse, unless we can tolerate the inconsistent situation.
>>
>>
>>
>> Of course, I believe that turning off the flow reset at startup can resolve
>> most problems. The flows are correct most of the time, after all. But
>> considering the five-nines requirement of NFV, I still recommend the flow
>> synchronization approach.
>>
>>
>>
>> BR,
>>
>> Germy
>>
>>
>>
>> On Wed, Nov 5, 2014 at 3:36 PM, Salvatore Orlando <sorlando at nicira.com>
>> wrote:
>>
>> From what I gather from this thread and the related bug report, the change
>> introduced in the OVS agent is causing a data plane outage upon agent
>> restart, which is not desirable in most cases.
>>
>>
>>
>> The rationale for the change that introduced this bug was, I believe,
>> cleaning up stale flows on the OVS agent, which also makes some sense.
>>
>>
>>
>> Unless I'm missing something, I reckon the best way forward is actually
>> quite straightforward; we might add a startup flag to reset all flows and
>> not reset them by default.
>>
>> While I agree the "flow synchronisation" process proposed in the previous
>> post is valuable too, I hope we might be able to fix this with a simpler
>> approach.
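A minimal sketch of what such an opt-in startup flag could look like is given below. The flag name `--reset-flows` and both function names are hypothetical, chosen only for illustration; they are not taken from Neutron. The point is simply that the destructive step runs only when explicitly requested.

```python
import argparse


def parse_agent_args(argv):
    """Parse a hypothetical agent command line.

    "--reset-flows" is an illustrative name, not an existing Neutron option.
    """
    parser = argparse.ArgumentParser(prog="ovs-agent-sketch")
    parser.add_argument(
        "--reset-flows",
        action="store_true",
        default=False,  # flows are preserved unless the operator opts in
        help="delete all existing flows before reprogramming them",
    )
    return parser.parse_args(argv)


def startup_actions(reset_flows):
    """Return the ordered list of steps the agent would take at startup."""
    steps = []
    if reset_flows:
        # Only this step interrupts the data plane.
        steps.append("delete_all_flows")
    steps.append("program_flows")
    return steps
```

With no arguments the sketch only programs flows; passing `--reset-flows` prepends the delete step, which is the behaviour that currently causes the outage.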
>>
>>
>>
>> Salvatore
>>
>>
>>
>> On 5 November 2014 04:43, Germy Lure <germy.lure at gmail.com> wrote:
>>
>> Hi,
>>
>>
>>
>> Considering what triggers an agent restart, I think there are only two
>> cases:
>>
>> 1) only the agent is restarted
>>
>> 2) the host that the agent is deployed on is rebooted
>>
>>
>>
>> When the agent starts, OVS may:
>>
>> a. have all correct flows
>>
>> b. have nothing at all
>>
>> c. have partly correct flows; the others may need to be reprogrammed,
>> deleted or added
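The three cases above can be told apart by comparing what the agent wants with what the switch reports. The sketch below assumes each flow can be represented as a comparable string; the function name and the return labels are hypothetical and only illustrate the classification.

```python
def classify_flow_state(desired, installed):
    """Classify the switch state relative to the agent's desired flows.

    Both arguments are iterables of flow strings (a simplification:
    real flows also carry a table, priority, cookie and so on).
    """
    desired = set(desired)
    installed = set(installed)
    if installed == desired:
        return "all_correct"      # case a
    if not installed:
        return "empty"            # case b
    return "partly_correct"       # case c
```

Only the first outcome allows the agent to leave the switch untouched; the other two require some reprogramming.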
>>
>>
>>
>> In any case, I think both users and developers would be happy to see the
>> system recover ASAP after an agent restart. Ideally the agent pushes only
>> the incorrect flows and keeps the correct ones. This keeps the services
>> whose flows are correct working while the agent starts.
>>
>>
>>
>> So, I suggest two solutions:
>>
>> 1. After restarting, the agent gets all flows from OVS, compares them with
>> its local flows, and corrects only the ones that differ.
>>
>> 2. Adapt both OVS and the agent. The agent just pushes all flows every time
>> (without removing any) and OVS keeps two flow tables and switches between
>> them (like an RCU lock).
>>
>>
>>
>> Solution 1 is recommended because of third-party vendors.
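Solution 1 can be sketched as a plain dictionary diff. The sketch below assumes flows are modelled as a mapping from a match string to an action string, which is a simplification (real flows also have a table, priority and cookie), and `plan_flow_sync` is a hypothetical name, not an existing agent function. It only computes the plan; applying it to the switch is left out.

```python
def plan_flow_sync(desired, installed):
    """Compute the minimal changes that turn `installed` into `desired`.

    Both arguments map a match string to an action string.
    Returns (to_add, to_modify, to_delete).
    """
    # Flows the agent wants but the switch does not have.
    to_add = {m: a for m, a in desired.items() if m not in installed}
    # Flows present on the switch with the wrong actions.
    to_modify = {
        m: a for m, a in desired.items()
        if m in installed and installed[m] != a
    }
    # Flows on the switch that the agent does not know about.
    to_delete = sorted(m for m in installed if m not in desired)
    return to_add, to_modify, to_delete


desired = {
    "in_port=1": "output:2",
    "in_port=2": "output:1",
    "in_port=3": "drop",
}
installed = {
    "in_port=1": "output:2",   # correct, left untouched
    "in_port=2": "drop",       # wrong actions, reprogrammed
    "in_port=9": "output:1",   # stray flow, deleted
}
to_add, to_modify, to_delete = plan_flow_sync(desired, installed)
```

Flows that already match, such as `in_port=1` here, never appear in the plan, so traffic using them is not interrupted during the agent restart.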
>>
>>
>>
>> BR,
>>
>> Germy
>>
>>
>>
>>
>>
>> On Fri, Oct 31, 2014 at 10:28 PM, Ben Nemec <openstack at nemebean.com>
>> wrote:
>>
>> On 10/29/2014 10:17 AM, Kyle Mestery wrote:
>> > On Wed, Oct 29, 2014 at 7:25 AM, Hly <henry4hly at gmail.com> wrote:
>> >>
>> >>
>> >> Sent from my iPad
>> >>
>> >> On 2014-10-29, at 8:01 PM, Robert van Leeuwen <Robert.vanLeeuwen at spilgames.com> wrote:
>> >>
>> >>>>> I find our current design is to remove all flows and then add them
>> >>>>> back entry by entry. This causes every network node to break off
>> >>>>> all tunnels to the other network nodes and all compute nodes.
>> >>>> Perhaps a way around this would be to add a flag on agent startup
>> >>>> which would have it skip reprogramming flows. This could be used for
>> >>>> the upgrade case.
>> >>>
>> >>> I hit the same issue last week and filed a bug here:
>> >>> https://bugs.launchpad.net/neutron/+bug/1383674
>> >>>
>> >>> From an operator's perspective this is VERY annoying since you also
>> >>> cannot push any config changes that require/trigger a restart of the
>> >>> agent.
>> >>> e.g. something simple like changing a log setting becomes a hassle.
>> >>> I would prefer the default behaviour to be to not clear the flows, or
>> >>> at the least a config option to disable it.
>> >>>
>> >>
>> >> +1, we also suffered from this even when only a very small patch was applied
>> >>
>> > I'd really like to get some input from the tripleo folks, because they
>> > were the ones who filed the original bug here and were hit by the
>> > agent NOT reprogramming flows on agent restart. It does seem fairly
>> > obvious that adding an option around this would be a good way forward,
>> > however.
>>
>> Since nobody else has commented, I'll put in my two cents (though I
>> might be overcharging you ;-).  I've also added the TripleO tag to the
>> subject, although with Summit coming up I don't know if that will help.
>>
>> Anyway, if the bug you're referring to is the one I think, then our
>> issue was just with the flows not existing.  I don't think we care
>> whether they get reprogrammed on agent restart or not as long as they
>> somehow come into existence at some point.
>>
>> It's possible I'm wrong about that, and probably the best person to talk
>> to would be Robert Collins since I think he's the one who actually
>> tracked down the problem in the first place.
>>
>> -Ben
>>
>>
>>
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev