[openstack-dev] [Neutron] OVS flow modification performance

IWAMOTO Toshihiro iwamoto at valinux.co.jp
Mon Jan 25 09:08:51 UTC 2016


At Thu, 21 Jan 2016 02:59:16 +0000,
Wuhongning wrote:
> 
> I don't think 400 flows can show the difference.  Have you set up any
> tunnel peers?
> 
> In fact, we could set the network type to "vxlan" and then have a fake
> MD simulate sending l2pop fdb add messages, to push tens of thousands
> of flows into the ovs agent under test.

I chose this method because I didn't want to write such extra code for
measurements. ;)
Of course, I'd love to see data from other test environments and
workloads other than agent restarts.
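
That said, if someone wants to synthesize that kind of load without
writing a fake MD, bulk-loading generated flows through ovs-ofctl is a
quick approximation.  A rough, untested sketch (the bridge name and the
flow pattern are only illustrative, not what l2pop actually installs):

    import subprocess

    def push_fake_flows(bridge='br-int', count=20000):
        # Build one output flow per fake MAC, roughly the shape of what
        # l2pop fdb-add handling produces, then bulk-load them through
        # a single ovs-ofctl invocation ('-' means read from stdin).
        lines = []
        for i in range(count):
            mac = '02:00:%02x:%02x:%02x:%02x' % (
                (i >> 24) & 0xff, (i >> 16) & 0xff,
                (i >> 8) & 0xff, i & 0xff)
            lines.append(
                'table=0,priority=1,dl_dst=%s,actions=normal' % mac)
        proc = subprocess.Popen(['ovs-ofctl', 'add-flows', bridge, '-'],
                                stdin=subprocess.PIPE)
        proc.communicate('\n'.join(lines) + '\n')
        return proc.returncode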

Also, we now have https://review.openstack.org/#/c/271939/ and can
profile neutron-server (and probably other processes, too).
I haven't found anything non-trivial so far, though.

> ________________________________________
> From: IWAMOTO Toshihiro [iwamoto at valinux.co.jp]
> Sent: Monday, January 18, 2016 4:37 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [Neutron] OVS flow modification performance
> 
> At Mon, 18 Jan 2016 00:42:32 -0500,
> Kevin Benton wrote:
> >
> > Thanks for doing this. A couple of questions:
> >
> > What were your rootwrap settings when running these tests? Did you just
> > have it calling sudo directly?
> 
> I used devstack's default, which runs root_helper_daemon.
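> 
> For reference, the relevant agent config ends up looking something like
> this (paths vary per deployment; this shows the shape, not an exact
> copy of my setup):
> 
>     [agent]
>     root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf
>     root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf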
> 
> > Also, you mention that this is only ~10% of the time spent during flow
> > reconfiguration. What other areas are eating up so much time?
> 
> 
> In another run,
> 
> $ for f in `cat tgidlist.n2`; do \
>     echo -n $f; \
>     opreport -n tgid:$f --merge tid | head -1 | tr -d '\n'; \
>     (cd bg; opreport -n tgid:$f --merge tid | head -1); \
>     echo; \
>   done | sort -nr -k +2
> 10071   239058 100.000 python2.7    14922 100.000 python2.7
> 9995    92328 100.000 python2.7    11450 100.000 python2.7
> 7579    88202 100.000 python2.7    (18596)
> 11094    51560 100.000 python2.7    47964 100.000 python2.7
> 7035    49687 100.000 python2.7    40678 100.000 python2.7
> 11093    49380 100.000 python2.7    36004 100.000 python2.7
> (legend: <pid> <oprof count with an agent restart> <junk> <junk>
>          <background (oprof count without an agent restart)>)
> 
> These processes are neutron-server, nova-api,
> neutron-openvswitch-agent, nova-conductor, dstat, and nova-conductor,
> in descending order.
> 
> So neutron-server uses about 3x as much CPU time as the ovs agent,
> nova-api's CPU usage is similar to the ovs agent's, and the others are
> probably not significant.
> 
> > Cheers,
> > Kevin Benton
> >
> > On Sun, Jan 17, 2016 at 10:12 PM, IWAMOTO Toshihiro <iwamoto at valinux.co.jp>
> > wrote:
> >
> > > I'm sending out this mail to share the finding and discuss how to
> > > improve with those interested in neutron ovs performance.
> > >
> > > TL;DR: The native of_interface code, which was merged recently and
> > > isn't the default, seems to consume less CPU time but gives mixed
> > > results.  I'm looking into improving this.
> > >
> > > * Introduction
> > >
> > > With an ML2+ovs Neutron configuration, openflow rule modification
> > > happens often and is a somewhat heavy operation, as it involves an
> > > exec() of the ovs-ofctl command.
> > >
> > > The native of_interface driver doesn't use the ovs-ofctl command and
> > > should have less performance impact on the system.  This document
> > > tries to confirm this hypothesis.
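> > >
> > > As a rough illustration of that per-call cost, one can time repeated
> > > fork+exec round trips (a standalone micro-benchmark, not agent code;
> > > it assumes br-int exists and ovs-ofctl is in PATH):
> > >
> > >     import subprocess
> > >     import time
> > >
> > >     # Each iteration pays fork+exec plus OpenFlow session setup;
> > >     # the native driver keeps one OpenFlow connection open instead.
> > >     N = 100
> > >     start = time.time()
> > >     for _ in range(N):
> > >         subprocess.check_output(['ovs-ofctl', 'dump-flows', 'br-int'])
> > >     print('%.1f ms per ovs-ofctl invocation'
> > >           % ((time.time() - start) * 1000 / N))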
> > >
> > >
> > > * Method
> > >
> > > In order to focus on openflow rule operation time and avoid noise from
> > > other operations (VM boot-up, etc.), neutron-openvswitch-agent was
> > > restarted and the time it took to reconfigure the flows was measured.
> > >
> > > 1. Use devstack to start a test environment.  As debug logs generate
> > >    a considerable amount of load, ENABLE_DEBUG_LOG_LEVEL was set to
> > >    false.
> > > 2. Apply https://review.openstack.org/#/c/267905/ to enable
> > >    measurement of flow reconfiguration times.
> > > 3. Boot 80 m1.nano instances.  In my setup, this generates 404 br-int
> > >    flows.  If you have >16G RAM, more could be booted.
> > > 4. Stop neutron-openvswitch-agent and restart it with the --run-once
> > >    arg.  Use time, oprofile, and python's cProfile (via the --profile
> > >    arg) to collect data; a cProfile sketch follows this list.
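> > >
> > > For the cProfile part, the measurement is roughly this shape (a
> > > sketch; reconfigure_flows() is a hypothetical stand-in for the code
> > > path being measured, not an actual agent function):
> > >
> > >     import cProfile
> > >     import pstats
> > >
> > >     prof = cProfile.Profile()
> > >     prof.enable()
> > >     reconfigure_flows()   # hypothetical entry point under test
> > >     prof.disable()
> > >     pstats.Stats(prof).sort_stats('cumulative').print_stats(20)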
> > >
> > > * Results
> > >
> > > Execution time (averages of 3 runs):
> > >
> > >             native     28.3s user 2.9s sys 0.4s
> > >             ovs-ofctl  25.7s user 2.2s sys 0.3s
> > >
> > > ovs-ofctl mode runs faster and seems to use less CPU, but the above
> > > doesn't include the execution time of the ovs-ofctl command itself.
> > >
> > > Oprofile data collected by running "operf -s -t" captures that
> > > information.
> > >
> > > With of_interface=native config, "opreport tgid:<pid of ovs agent>" shows:
> > >
> > >    samples|      %|
> > > ------------------
> > >     87408 100.000 python2.7
> > >         CPU_CLK_UNHALT...|
> > >           samples|      %|
> > >         ------------------
> > >             69160 79.1232 python2.7
> > >              8416  9.6284 vmlinux-3.13.0-24-generic
> > >
> > > and "opreport --merge tgid" doesn't show ovs-ofctl.
> > >
> > > With of_interface=ovs-ofctl, "opreport tgid:<pid of ovs agent>" shows:
> > >
> > >    samples|      %|
> > > ------------------
> > >     62771 100.000 python2.7
> > >         CPU_CLK_UNHALT...|
> > >           samples|      %|
> > >         ------------------
> > >             49418 78.7274 python2.7
> > >              6483 10.3280 vmlinux-3.13.0-24-generic
> > >
> > > and  "opreport --merge tgid" shows CPU consumption by ovs-ofctl
> > >
> > >     35774  3.5979 ovs-ofctl
> > >         CPU_CLK_UNHALT...|
> > >           samples|      %|
> > >         ------------------
> > >             28219 78.8813 vmlinux-3.13.0-24-generic
> > >              3487  9.7473 ld-2.19.so
> > >              2301  6.4320 ovs-ofctl
> > >
> > > Comparing 87408 samples (native, python only) with 62771 + 35774 =
> > > 98545 (ovs-ofctl mode plus the exec'd ovs-ofctl processes), the
> > > native of_interface takes about 11000 fewer samples; assuming
> > > operf's default CPU_CLK_UNHALTED sampling period of 100000 cycles,
> > > that works out to roughly 0.4s less CPU time overall.
> > >
> > > * Conclusion and future steps
> > >
> > > The native of_interface uses slightly less CPU time but takes longer
> > > to complete a flow reconfiguration after an agent restart.
> > >
> > > As the OVS agent accounts for only about a tenth of the total CPU
> > > usage during a flow reconfiguration (data not shown), there may be
> > > other areas for improvement.
> > >
> > > The cProfile Python module gives more fine-grained data, but no
> > > apparent performance bottleneck was found.  The data does show more
> > > eventlet context switches with the native of_interface, which is due
> > > to how the native of_interface is written.  I'm looking into
> > > improving its CPU usage and latency.
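> > >
> > > For reference, one way to see those context switches outside the
> > > agent is greenlet's settrace hook (a standalone sketch with toy
> > > greenthreads, not agent code):
> > >
> > >     import eventlet
> > >     import greenlet
> > >
> > >     switches = [0]
> > >
> > >     def trace(event, args):
> > >         # args is (origin, target); count every greenlet switch
> > >         if event == 'switch':
> > >             switches[0] += 1
> > >
> > >     greenlet.settrace(trace)
> > >
> > >     def worker(n):
> > >         for _ in range(n):
> > >             eventlet.sleep(0)   # yield to the hub each iteration
> > >
> > >     pool = eventlet.GreenPool()
> > >     for _ in range(10):
> > >         pool.spawn(worker, 100)
> > >     pool.waitall()
> > >     print('greenlet switches: %d' % switches[0])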
> > >
> > >
> > >
> >
> >
> >
> > --
> > Kevin Benton


