[ops] Any experience with DELL M640 and BCM57840 network card?
Hi,
Has anyone tried using a DELL M640 with a BCM57840 network card as a compute node?
I am asking because we are adding a new compute node with this hardware configuration to our OpenStack Ocata installation (on CentOS 7.6).
Each time we create a VM on this compute node, the compute node crashes (bnx2x_panic_dump). It is not clear which specific event triggers the problem.
It doesn't seem to be a hardware problem (e.g. iperf works without problems).
We are using the same network card on other servers as well (e.g. on some M630 systems).
If we disable tx-checksum-ipv4 on the data network interface (i.e. "ethtool -K em3 tx-checksum-ipv4 off"), we no longer see the problem.
Thanks, Massimo
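To confirm what the workaround actually changed, the offload state can be read back with `ethtool -k` (lowercase `-k` lists features, uppercase `-K` sets them). The following is only a sketch and not part of the original report: it parses that output in Python. The helper name `parse_offload_features` and the sample text are hypothetical; the sample follows the usual `ethtool -k` layout and was not captured from the affected node.

```python
def parse_offload_features(ethtool_output):
    """Parse `ethtool -k <iface>` text into {feature: (enabled, fixed)}."""
    features = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            continue  # skips the "Features for em3:" header and blank lines
        words = value.split()
        if words[0] not in ("on", "off"):
            continue  # ignore anything that is not an on/off feature line
        features[name.strip()] = (words[0] == "on", "[fixed]" in words)
    return features


# Illustrative sample only (assumed layout, not real output from the M640).
SAMPLE = """Features for em3:
rx-checksumming: on
tx-checksumming: on
\ttx-checksum-ipv4: off
\ttx-checksum-ip-generic: off [fixed]
\ttx-checksum-ipv6: on
"""

if __name__ == "__main__":
    for feature, (enabled, fixed) in parse_offload_features(SAMPLE).items():
        print(feature, "on" if enabled else "off", "(fixed)" if fixed else "")
```

On a live node the text could come from `subprocess.run(["ethtool", "-k", "em3"], capture_output=True, text=True, check=True).stdout`, assuming `em3` is the data interface as in the message above.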
I totally ran into this issue; see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1643558
It seems another contributor ran into it as well. Are you deploying using Kolla?
On Tue, Mar 26, 2019 at 1:29 PM Massimo Sgaravatto <massimo.sgaravatto@gmail.com> wrote:
Each time we create a VM on this compute node, the compute node crashes (bnx2x_panic_dump).
-- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser@vexxhost.com W. http://vexxhost.com
No, we are not using Kolla. We use the CentOS RPMs to deploy.
We contacted DELL support about this problem. Since iperf works without problems, they suggest there is something 'wrong' (or at least incompatible with that hardware) in the software used (i.e. OpenStack and/or related dependencies).
As I said, with tx-checksum-ipv4 disabled I no longer see the problem, but the DELL support guy claims that this is not safe because checksumming is not done.
I thought that disabling tx-checksum-ipv4 simply means that the checksum is done in software instead of hardware (so just a matter of performance). Am I wrong?
Thanks, Massimo
On Tue, Mar 26, 2019 at 7:45 PM Mohammed Naser <mnaser@vexxhost.com> wrote:
Are you deploying using Kolla?
On 03/27/2019 01:35 AM, Massimo Sgaravatto wrote:
I thought that disabling tx-checksum-ipv4 simply means that the checksum is done in software instead of hardware (so just a matter of performance). Am I wrong?
No, you are correct. Whoever you spoke to at Dell didn't know what they were talking about. -jay
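As an illustration of what "done in software" amounts to, here is a minimal Python sketch of the 16-bit ones' complement Internet checksum described in RFC 1071, the arithmetic shared by the IPv4 header, TCP and UDP checksums. It is not the kernel's implementation, and the function name `internet_checksum` is hypothetical. For TCP and UDP the real sum also covers a pseudo-header, which this sketch leaves out. The sample bytes are a commonly cited IPv4 header with its checksum field zeroed.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over `data` (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


# 20-byte IPv4 header with the checksum field (bytes 10-11) set to zero.
HEADER = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")

if __name__ == "__main__":
    csum = internet_checksum(HEADER)
    print(hex(csum))  # 0xb861
    # A receiver re-sums the header with the checksum filled in and expects 0.
    filled = HEADER[:10] + csum.to_bytes(2, "big") + HEADER[12:]
    print(internet_checksum(filled))  # 0
```

Whether the NIC or the CPU performs this sum, the value on the wire is the same; with the offload disabled the work is simply done by the host, which is the performance trade-off described above.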
On Wed, Mar 27, 2019 at 3:34 PM Jay Pipes <jaypipes@gmail.com> wrote:
No, you are correct. Whoever you spoke to at Dell didn't know what they were talking about.
+1. We found out that the bad machines had the same firmware for the NIC; we got new ones shipped, and much joy afterwards.
-- Mohammed Naser
participants (3)
- Jay Pipes
- Massimo Sgaravatto
- Mohammed Naser