[openstack-dev] [Neutron] Being more aggressive with our defaults

Monty Taylor mordred at inaugust.com
Mon Feb 8 16:31:14 UTC 2016

On 02/08/2016 09:47 AM, Sean M. Collins wrote:
> Hi,
> With the path_mtu issue, our default was to set path_mtu to zero and
> do no calculation against the physical segment MTU and the overhead of
> the tunneling protocol selected for a tenant network, which means the
> network would break.
> I am working on patches to change our behavior to set the MTU to 1500 by
> default[1], so that at least our out of the box experience is more
> sensible.
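
To make the calculation described above concrete, here is a minimal sketch of deriving a tenant network MTU from the physical segment MTU and the tunneling overhead. The function name and the overhead table are hypothetical and are not Neutron's actual code; the byte counts are the commonly cited figures for an IPv4 underlay and should be treated as assumptions.

```python
# Assumed per-packet encapsulation overhead in bytes over an IPv4 underlay.
# Illustrative values only, not taken from Neutron's source.
ENCAP_OVERHEAD = {"flat": 0, "vlan": 0, "vxlan": 50, "gre": 42}


def tenant_mtu(path_mtu, network_type):
    """Return the MTU a tenant network can safely advertise.

    A path_mtu of zero or less models the old default, where no
    calculation is done at all, so None is returned.
    """
    if path_mtu <= 0:
        return None
    return path_mtu - ENCAP_OVERHEAD[network_type]


print(tenant_mtu(1500, "vxlan"))
print(tenant_mtu(0, "vxlan"))
```

With a 1500-byte physical segment, a VXLAN tenant network ends up with less room than the instances would otherwise assume, which is why skipping the calculation breaks traffic.
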
> This brings me to the csum feature of recent Linux kernel versions and
> related network components.
> Patches:
> https://review.openstack.org/#/c/220744/
> https://review.openstack.org/#/c/261409/
> Bugs/RFEs:
> https://bugs.launchpad.net/neutron/+bug/1515069
> https://bugs.launchpad.net/neutron/+bug/1492111
> Basically, we see that enabling the csum feature created the conditions
> where a 10gig link could be fully utilized[2] in one instance[3]. My
> thinking is: yes, I too would like to fully utilize the links that I
> paid good money for. Someone with more knowledge can correct me,
> but is there any reason not to enable this feature? If your hardware
> supports it, we should utilize it. If your hardware doesn't support it,
> then we shouldn't.
> tl;dr - why do we keep merging features that create more knobs that
> deployers and deployment tools need to keep turning? The fact that we
> merge features that are disabled by default means that they are not as
> thoroughly tested as features that are enabled by default.
> Neutron should have a lot of things enabled by default that improve
> performance (l2pop? path_mtu? dvr?), and should try to enable these
> features by itself. If for some reason the hardware doesn't support a
> feature, log that enabling it wasn't successful and then disable it.


There should not be an option labeled "go-fast" ... the only reason to 
have an option at all is if there is a valid reason for turning it off 
(like cards that have buggy checksums that you need to disable/ignore 
hardware side), and the only reason to leave such an option defaulting 
in the slow position is if the failure mode is one that can't be 
adequately tested at runtime and where failure could lead to 
corruption/data loss.

> OK - that's it for me. Thanks for reading. I'll put on my asbestos
> undies now.
> [1]: https://review.openstack.org/#/c/276411/
> [2]: http://openvswitch.org/pipermail/dev/2015-August/059335.html
> [3]: Yes, it's only one data point....

