[Openstack] Compute downloading corrupted image from Glance

Kaustubh Kelkar kaustubh.kelkar at casa-systems.com
Tue Mar 29 20:01:01 UTC 2016


Thanks for the reply, please find my responses inline.


-Kaustubh

-----Original Message-----
From: Rick Jones [mailto:rick.jones2 at hpe.com] 
Sent: Tuesday, March 29, 2016 1:43 PM
To: openstack at lists.openstack.org
Subject: Re: [Openstack] Compute downloading corrupted image from Glance

On 03/29/2016 10:17 AM, Kaustubh Kelkar wrote:
> Every time I tried to download the image on the compute, I get a new 
> hash value (albeit, a wrong one).

On the compute node, what is the type of NIC and its driver and such?
[Kaustubh] It is an Intel X710 NIC with i40e driver. The NIC is part of the integrated card on a Dell R730.

lspci -v | grep -A 1 Ethernet
[Kaustubh] (Output redacted to show only the relevant interface)
01:00.1 Ethernet controller: Intel Corporation Ethernet 10G 2P X710 Adapter (rev 01)
        Subsystem: Dell Device 0000

ethtool -i <interfacename>
[Kaustubh] root@dchi:/home/kkelkar# ethtool -i em2
driver: i40e
version: 1.4.25
firmware-version: 4.41 0x80001863 16.5.20
bus-info: 0000:01:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

And are any of the stateless offloads enabled?

ethtool -k <interfacename>
[Kaustubh] root@dchi:/home/kkelkar# ethtool -k em2
Features for em2:
rx-checksumming: off
tx-checksumming: off
        tx-checksum-ipv4: off
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: off
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off
scatter-gather: off
        tx-scatter-gather: off
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
        tx-tcp-segmentation: off
        tx-tcp-ecn-segmentation: off
        tx-tcp6-segmentation: off
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: on
receive-hashing: off
highdma: on
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: on
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

Those would include checksum offload, and things built on top of it like TSO, GSO, LRO and/or GRO.

If you find that checksum offload is enabled, and you disable it, does the corrupt image download problem go away?  If so, you have a problem with your NIC and/or its driver getting the offloads wrong and/or corrupting the traffic in a place outside the protection of the offloaded checksumming.  One of the central assumptions with the likes of checksum offload in a NIC is that anything "above" the checksum offload in the NIC has some sort of data protection - at least parity, if not ECC.  This includes components in the NIC itself, the I/O bus, etc.
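
As background to the point about what the checksum does and does not protect: the 16-bit ones'-complement Internet checksum used by TCP (described in RFC 1071) is fairly weak, so some corruption patterns pass it even when it is computed in software. The sketch below is only an illustration and is not taken from this thread; the function name `internet_checksum` is made up for the example. It shows that the checksum is unchanged when two 16-bit words trade places.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


original = b"\x12\x34\x56\x78"
reordered = b"\x56\x78\x12\x34"  # same 16-bit words, swapped

# Addition is order-independent, so both payloads get the same checksum.
print(internet_checksum(original) == internet_checksum(reordered))
```

This is one reason an end-to-end hash of the whole image remains the meaningful check, whatever the NIC offload settings are.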

If disabling checksum offload on the compute node doesn't resolve the matter, you might consider the same on the controller.
[Kaustubh] I ended up disabling checksumming, TSO, GSO and GRO on both the controller and the compute node, so the ethtool output looks as above. Now the problem can only be reproduced intermittently. At times, the compute node still gets a corrupted image.
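
With the failure now intermittent, one possible next step is to keep a known-good copy of the image next to a corrupted download and see where they differ. The sketch below is a minimal example under stated assumptions: the helper names (`md5_of_file`, `differing_ranges`) and the file paths in the usage note are hypothetical, and it assumes the checksum Glance reports for the image is an MD5 hex digest, which is worth confirming for the release in use.

```python
import hashlib

CHUNK = 1024 * 1024  # read 1 MiB at a time so large images fit in memory


def md5_of_file(path):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_ranges(path_a, path_b, block=4096):
    """Return merged (start, end) byte ranges in which the two files differ.

    Comparison is done block by block, so each range is rounded out to
    block boundaries (or to the end of the longer file).
    """
    ranges = []
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(block)
            b = fb.read(block)
            if not a and not b:
                break
            size = max(len(a), len(b))
            if a != b:
                if ranges and ranges[-1][1] == offset:
                    # extend the previous range instead of starting a new one
                    ranges[-1] = (ranges[-1][0], offset + size)
                else:
                    ranges.append((offset, offset + size))
            offset += size
    return ranges
```

For example, `md5_of_file("/tmp/good.img")` can be compared with the checksum Glance reports, and `differing_ranges("/tmp/good.img", "/tmp/bad.img")` shows whether the damage is a few isolated blocks or spread through the file. The size and alignment of the damaged ranges may hint at where in the path the corruption happens.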

rick jones

(disabling checksum offload will likely also disable the offloads which depend upon it.)
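
To check which offloads are still active after such a change, the `ethtool -k` text can be parsed instead of read by eye. This is a small sketch that assumes the `name: on|off [fixed]` line format seen in the output posted earlier in this message; the function names are hypothetical. In that posted output, `tx-gre-segmentation` and `tx-udp_tnl-segmentation` are among the features still reported as on. Whether they matter for this problem is not established here.

```python
def parse_ethtool_features(text):
    """Map feature name -> (enabled, fixed) from `ethtool -k` style output."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        words = value.split()
        if not sep or not words or words[0] not in ("on", "off"):
            continue  # skip the "Features for ..." header, prompts, blanks
        features[name] = (words[0] == "on", "[fixed]" in words)
    return features


def enabled_features(text):
    """Return the sorted names of all features currently reported as on."""
    parsed = parse_ethtool_features(text)
    return sorted(name for name, (on, _fixed) in parsed.items() if on)


# A few lines excerpted from the output posted in this message.
SAMPLE = """Features for em2:
rx-checksumming: off
tcp-segmentation-offload: off
large-receive-offload: off [fixed]
tx-gre-segmentation: on
tx-udp_tnl-segmentation: on
"""

print(enabled_features(SAMPLE))
# prints ['tx-gre-segmentation', 'tx-udp_tnl-segmentation']
```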

_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack at lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack



