[Openstack-operators] Liberty and OVS Agent restarts

Matt Kassawara mkassawara at gmail.com
Fri Feb 12 14:27:58 UTC 2016


Out of curiosity, what do you have for the "external_network_bridge" option
in the L3 agent config?

On Wed, Feb 10, 2016 at 2:42 PM, Bajin, Joseph <jbajin at verisign.com> wrote:

> Clayton,
>
> This is really good information.
>
> I’m wondering how we can help support you and get the necessary dev
> support to get this resolved sooner rather than later. I totally agree
> with you that this should be backported to at least Liberty.
>
> Please let me know how I and others can help!
>
> —Joe
>
> On 2/10/16, 8:55 AM, "Clayton O'Neill" <clayton at oneill.net> wrote:
>
> >Summary: Liberty OVS agent restarts are better, but still need work.
> >See: https://bugs.launchpad.net/neutron/+bug/1514056
> >
> >As many of you know, Liberty has a fix so that the OVS agent no
> >longer dumps all flows when starting, which used to cause a loss of
> >traffic.  Unfortunately, Liberty neutron still has issues with OVS
> >agent restarts.  The fix that went into Liberty prevents it from
> >dropping flows on the br-tun and br-int bridges and that helps
> >greatly, but the br-ex bridge still has its flows cleared on startup.
> >
> >You may be thinking: Wait, br-ex only has like 3 flows on it, how can
> >that be a problem?  The issue appears to be that the br-ex flows are
> >cleared early and not set up again until late in the process.  This
> >means that routers on the node where the OVS agent is restarting lose
> >network connectivity for the majority of the restart time.
> >
> >I did some testing with this yesterday, comparing a few scenarios with
> >100 FIPs, 100 instances and various scenarios for routers.  You can
> >find the complete data here:
> >
> >https://docs.google.com/spreadsheets/d/1ZGra_MszBlL0fNsFqd4nOvh1PsgWu58-GxEeh1m1BPw/edit?usp=sharing
> >
> >The summary looks like this:
> >100 routers, 100 networks, 100 floating ips, 100 instances, single node test:
> >Kilo average outage time: 47 seconds
> >Liberty average outage time: 37 seconds
> >
> >1 router, 1 network, 100 floating ips, 100 instances, single node test:
> >Kilo average outage time: 46 seconds
> >Liberty average outage time: 13 seconds
> >
> >1 router, 1 network, 100 floating ips, 100 instances, router on a
> >separate node, all instances on a single node, OVS restart on compute
> >node:
> >Kilo average outage time: 25 seconds
> >Liberty average outage time: 0 to 1 seconds
> >
> >I did my testing using 1 second pings using fping to all of the
> >floating IPs.  With the last test, it frequently lost no packets, and
> >as a result I was not really able to test the scenario other than to
> >qualify it as good.
> >
> >This is a huge operational issue for us and I suspect for many of the
> >rest of you using OVS.  I’d encourage everyone that is using OVS to
> >register interest in having this fixed in the LP bug
> >(https://bugs.launchpad.net/neutron/+bug/1514056).  Right now this bug
> >is marked as low priority.
> >
> >_______________________________________________
> >OpenStack-operators mailing list
> >OpenStack-operators at lists.openstack.org
> >http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
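For anyone who wants to reproduce numbers like the ones quoted above, here is a minimal sketch of how an average outage time could be derived from 1 second ping results. It is not Clayton's actual tooling. The function names, the sample addresses, and the choice to count the longest run of consecutive lost pings as the outage are all assumptions made for illustration, and the sample data is made up.

```python
from statistics import mean


def longest_outage(replies, interval=1.0):
    """Return the longest run of consecutive lost pings, in seconds.

    `replies` is a sequence of booleans, one per probe, where True means
    a reply was received. `interval` is the probe spacing in seconds.
    Treating the longest run as "the outage" is an assumption.
    """
    longest = current = 0
    for ok in replies:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest * interval


def average_outage(results, interval=1.0):
    """Average the per-address outage over all probed floating IPs.

    `results` maps an address to its sequence of probe results.
    """
    return mean(longest_outage(r, interval) for r in results.values())


# Made-up sample: two floating IPs probed once per second.
sample = {
    "203.0.113.10": [True, True, False, False, False, True],
    "203.0.113.11": [True, False, False, True, True, True],
}
print(average_outage(sample))
```

Feeding it one list per floating IP, built from whatever per-probe output your ping tool gives you, should yield a figure comparable to the averages in the summary, assuming the same 1 second interval.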