[Openstack-operators] Intel 10 GbE / bonding issue with hash policy layer3+4

Sascha Vogt sascha.vogt at gmail.com
Tue Mar 29 10:51:46 UTC 2016


Hi Tom,

First of all, thanks for your response, and sorry for my late reply :)
Easter (+ a bit of relaxation time) got in between.

On 23.03.2016 at 16:45, MailingLists - EWS wrote:
> What version of the ixgbe driver are you using? Is it the same on both
> kernels? Have you tried the latest "out of tree driver" from E1000 to see if
> the issue goes away?
We haven't tried any other driver, so it's the one that ships with the
respective Ubuntu kernel (which I guess is the unpatched version for
that kernel, though I haven't checked that manually). Is there an easy
way to spot the version?

Kernels are:
- 3.16.0-67-generic #87~14.04.1-Ubuntu (uname output)
- 4.2.0-34.39~14.04.1 ('aptitude show' output, as we currently have no
server running with that kernel)

> I follow the E1000 mailing list and I seem to recall some rather recent
> posts regarding bonding and the ixgbe along with some patches being applied
> to the driver, however I don't know what version of kernel these issues were
> on, or even if the patches were accepted.
> 
> https://sourceforge.net/p/e1000/mailman/e1000-devel/thread/87618083B2453E4A8714035B62D67992504FB5FF%40FMSMSX105.amr.corp.intel.com/#msg34727125
> 
> Something about a timing issue with detecting the slave's link speed and
> passing that information to the bonding driver in a timely fashion.
> 

Hm, it might be related, but at least we're not seeing the LACP going up
and down constantly. Also the link speed is correctly reported. And the
Link Failure Count is steady at 1 (and is the same for both kernels).
Also on the switch we don't see anything different (like links going up
and down, etc).

My guess up to this point is really something odd with the hash calculation.
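
To make that guess a bit more concrete, here is a minimal sketch of how a
layer3+4 transmit hash picks a slave, as far as I understand the description
in the kernel bonding documentation (ports combined, XORed with both IP
addresses, folded with two right shifts, then taken modulo the slave count).
This is an assumption about the algorithm, not a copy of the kernel code: the
real implementation works on header fields in network byte order and may
differ between kernel versions. The function names are made up for
illustration, and only IPv4 is handled.

```python
import ipaddress


def layer34_hash(src_ip, dst_ip, src_port, dst_port):
    """Sketch of a layer3+4 hash; byte order is NOT modelled like the kernel."""
    # Pack both 16-bit ports into one 32-bit value (assumed layout).
    ports = ((src_port & 0xFFFF) << 16) | (dst_port & 0xFFFF)
    h = ports
    h ^= int(ipaddress.IPv4Address(src_ip))
    h ^= int(ipaddress.IPv4Address(dst_ip))
    # Fold the upper bits down so they influence the low bits.
    h ^= h >> 16
    h ^= h >> 8
    return h & 0xFFFFFFFF


def pick_slave(src_ip, dst_ip, src_port, dst_port, slave_count):
    """Index of the slave a flow would be sent on under this sketch."""
    return layer34_hash(src_ip, dst_ip, src_port, dst_port) % slave_count


def spread(flows, slave_count):
    """Count how many of the given flows land on each slave."""
    counts = [0] * slave_count
    for src_ip, dst_ip, src_port, dst_port in flows:
        counts[pick_slave(src_ip, dst_ip, src_port, dst_port, slave_count)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical flows: one client opening many connections to one server.
    flows = [("10.0.0.1", "10.0.0.2", 40000 + i, 80) for i in range(16)]
    print(spread(flows, 2))
```

Feeding the real address/port tuples of the affected traffic into something
like spread() would show whether the flows are expected to be distributed
over both links at all, which is what one would then compare against the
per-interface counters on each kernel.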

Greetings
-Sascha-



