[openstack-dev] [neutron] trunk api performance and scale measurements

Tidwell, Ryan ryan.tidwell at hpe.com
Tue Dec 6 18:06:54 UTC 2016


I had been meaning to go a little deeper with performance benchmarking, but I’ve been crunched for time. Thanks for doing this; it is some great analysis.

As Armando mentioned, L2pop seemed to be the biggest impediment to control plane performance. If I were to use trunks heavily in production, I would consider either A) avoiding overlays that leverage L2pop (i.e. using VLAN or flat segmentation instead) or B) disabling L2pop and dealing with the MAC learning overhead in the overlay. Another consideration is the rpc_timeout setting: I tested with rpc_timeout=300 and didn’t really encounter any RPC timeouts. However, operators may have other reasons for not bumping rpc_timeout up that high.
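For anyone wanting to try option B together with the longer timeout, here is a minimal sketch that renders the two overrides as an INI fragment. The option names are assumptions, not something stated in this thread: it assumes "rpc_timeout" corresponds to the oslo.messaging option rpc_response_timeout in [DEFAULT], and that L2pop is toggled on the OVS agent via l2_population in [agent]. The helper name render_overrides is purely illustrative.

```python
import configparser
import io


def render_overrides(disable_l2pop=True, rpc_timeout_seconds=300):
    """Render the tuning discussed above as an INI fragment.

    Assumed option names (not taken from the thread):
      [DEFAULT] rpc_response_timeout  -- seconds to wait for an RPC reply
      [agent]   l2_population         -- OVS agent L2pop toggle
    In a real deployment these usually live in different files
    (server config vs. agent config); they are shown together only
    for brevity.
    """
    cfg = configparser.ConfigParser()
    cfg["DEFAULT"] = {"rpc_response_timeout": str(rpc_timeout_seconds)}
    cfg["agent"] = {"l2_population": str(not disable_l2pop)}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_overrides())
```

Disabling L2pop on the agents would normally also mean dropping the l2population mechanism driver on the server side; that part is left out of the sketch.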

I failed to make much mention of it in previous write-ups, but I also encountered scale issues with listing ports after a certain threshold. I haven’t gone back to identify where the tipping point is, but I did notice that Horizon began to really bog down as I added ports to the system. On the surface, it didn’t seem to matter whether these ports were used as subports or not; the sheer volume of ports in the system seemed to slow down both Horizon and, more importantly, GET on v2.0/ports.
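One way a client might soften the port-list cost is to page through v2.0/ports rather than fetching everything in a single GET. The sketch below assumes the server has pagination enabled and honours the limit, marker and fields query parameters; I have not measured whether this helps with the slowdown described above. The fetch_page callable is a hypothetical stand-in for the actual HTTP request, and make_fake_fetch exists only so the sketch can run without a deployment.

```python
from typing import Callable, Dict, Iterator, List, Optional

Port = Dict[str, str]
FetchPage = Callable[[Dict[str, str]], List[Port]]


def iter_ports(fetch_page: FetchPage, page_size: int = 500) -> Iterator[Port]:
    """Yield ports page by page using limit/marker pagination.

    fetch_page(params) is assumed to perform GET v2.0/ports with the
    given query parameters and return the list under the "ports" key.
    """
    marker: Optional[str] = None
    while True:
        # Ask only for the id field to keep each response small.
        params = {"limit": str(page_size), "fields": "id"}
        if marker is not None:
            params["marker"] = marker
        page = fetch_page(params)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return  # short page: nothing left to fetch
        marker = page[-1]["id"]  # resume after the last port seen


def make_fake_fetch(total: int) -> FetchPage:
    """In-memory stand-in for the API, for illustration only."""
    ports = [{"id": f"port-{i:05d}"} for i in range(total)]

    def fetch(params: Dict[str, str]) -> List[Port]:
        limit = int(params["limit"])
        start = 0
        if "marker" in params:
            ids = [p["id"] for p in ports]
            start = ids.index(params["marker"]) + 1
        return ports[start:start + limit]

    return fetch


if __name__ == "__main__":
    print(sum(1 for _ in iter_ports(make_fake_fetch(1234))))
```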


From: Armando M. [mailto:armamig at gmail.com]
Sent: Monday, December 05, 2016 8:37 AM
To: OpenStack Development Mailing List (not for usage questions) <openstack-dev at lists.openstack.org>
Subject: Re: [openstack-dev] [neutron] trunk api performance and scale measurements

On 5 December 2016 at 08:07, Jay Pipes <jaypipes at gmail.com> wrote:
On 12/05/2016 10:59 AM, Bence Romsics wrote:

I measured how the new trunk API scales with lots of subports. You can
find the results here:


Hope you find it useful. There are several open ends, let me know if
you're interested in following up some of them.

Great info in there, Ben, thanks very much for sharing!


Thanks for the wealth of information provided, I was looking forward to it! The results of the experimentation campaign make me somewhat confident that the trunk feature design is solid, or at least that is what it looks like! I'll look into why there is a penalty on port-list, because that's surprising to me too.

I also know that the QE team internally at HPE has done some perf testing (though I don't have results publicly available yet), but what I can share at this point is:

  *   They also disabled l2pop to push the boundaries of trunk deployments;
  *   They disabled OVS firewall (though for reasons orthogonal to scalability limits introduced by the functionality);
  *   They flipped back to the ovsctl interface, as the native interface turned out to be one of the components that introduced some penalty. Since you use the native interface, it'd be nice to see what happens if you flip this switch too;
  *   RPC timeout of 300.
On a testbed of 3 controllers and 7 computes, this is, at a high level, what they found:

  *   100 trunks, with 1900 subports took about 30 mins with no errors;
  *   500 subports take about 1 min to bind to a trunk;
  *   Booting a VM on trunk with 100 subports takes as little as 15 seconds to successful ping. Trunk goes from BUILD -> ACTIVE within 60 seconds of booting the VM;
  *   Scaling incrementally to 700 VMs on trunks with 100 initial subports keeps boot time constant (i.e. booting time stays at ~15 seconds).
I believe Ryan Tidwell may have more on this.
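
To put the figures above on a common scale, here is a rough back-of-the-envelope conversion into subports per second. The inputs are the rounded numbers quoted in the list ("about 30 mins", "about 1 min"), and it is not stated exactly which operations the 30 minutes cover, so treat the rates as order-of-magnitude only. The helper name rate_per_second is just for illustration.

```python
def rate_per_second(count, minutes):
    """Average items processed per second over the given duration."""
    return count / (minutes * 60.0)


# 100 trunks with 1900 subports in total.
subports_per_trunk = 1900 / 100

# 1900 subports in about 30 minutes.
bulk_rate = rate_per_second(1900, 30)

# 500 subports bound to a single trunk in about 1 minute.
bind_rate = rate_per_second(500, 1)

if __name__ == "__main__":
    print(f"subports per trunk: {subports_per_trunk:.0f}")
    print(f"bulk run: {bulk_rate:.2f} subports/s")
    print(f"single-trunk bind: {bind_rate:.2f} subports/s")
```

That works out to 19 subports per trunk, roughly 1 subport per second for the bulk run, and roughly 8 subports per second when binding to a single trunk.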


