[Openstack-operators] Networking breaks in CentOS guests but works with Ubuntu guests

Lorin Hochstein lorin at nimbisservices.com
Mon Apr 8 19:25:39 UTC 2013


While running tcpdump, I discovered that some packets (including ARP
requests) are tagged as VLAN 0 inside of my CentOS guest:


14:29:25.906732 54:78:1a:86:50:c9 (oui Unknown) > Broadcast, ethertype
802.1Q (0x8100), length 64: vlan 0, p 0, ethertype ARP, Ethernet (len
6), IPv4 (len 4), Request who-has 10.40.0.5 (Broadcast) tell
10.40.0.1, length 46
	0x0000:  ffff ffff ffff 5478 1a86 50c9 8100 0000  ......Tx..P.....
	0x0010:  0806 0001 0800 0604 0001 5478 1a86 50c9  ..........Tx..P.
	0x0020:  0a28 0001 ffff ffff ffff 0a28 0005 0000  .(.........(....
	0x0030:  0000 0000 0000 0000 0000 0000 dac7 07ed  ................


Does anybody know why that would happen?

When I run tcpdump on vnet1 on the host, the packets do not have a VLAN
tag on them at all:


5:19:37.605051 54:78:1a:86:50:c9 (oui Unknown) > Broadcast, ethertype ARP
(0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has
10.40.0.5 (Broadcast) tell 10.40.0.1, length 46
0x0000:  ffff ffff ffff 5478 1a86 50c9 0806 0001  ......Tx..P.....
0x0010:  0800 0604 0001 5478 1a86 50c9 0a28 0001  ......Tx..P..(..
0x0020:  ffff ffff ffff 0a28 0005 0000 0000 0000  .......(........
0x0030:  0000 0000 0000 0000 dac7 07ed            ............
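
To make the difference between the two dumps explicit, here is a minimal
Python sketch that decodes the first 32 bytes of each hex dump above. The
helper name parse_ethernet is hypothetical, and the sketch assumes at most
one 802.1Q tag (TPID 0x8100), which is all that shows up in these captures.

```python
import struct


def parse_ethernet(frame: bytes) -> dict:
    """Decode the Ethernet header, including one optional 802.1Q tag."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    info = {
        "dst": ":".join(f"{b:02x}" for b in dst),
        "src": ":".join(f"{b:02x}" for b in src),
        "vlan": None,  # stays None when the frame is untagged
    }
    if ethertype == 0x8100:
        # The tag is a 16-bit TCI followed by the real EtherType.
        tci, ethertype = struct.unpack("!HH", frame[14:18])
        info["vlan"] = {
            "pcp": tci >> 13,        # priority code point (3 bits)
            "dei": (tci >> 12) & 1,  # drop eligible indicator (1 bit)
            "vid": tci & 0x0FFF,     # VLAN identifier (12 bits)
        }
    info["ethertype"] = ethertype
    return info


# First two rows of the dump taken on eth0 inside the CentOS guest.
guest_frame = bytes.fromhex(
    "ffffffffffff54781a8650c981000000"
    "0806000108000604000154781a8650c9"
)
# First two rows of the dump taken on vnet1 on the host.
host_frame = bytes.fromhex(
    "ffffffffffff54781a8650c908060001"
    "08000604000154781a8650c90a280001"
)

for name, frame in (("guest eth0", guest_frame), ("host vnet1", host_frame)):
    print(name, parse_ethernet(frame))
```

The guest frame decodes to a tag whose TCI is all zeros, followed by the ARP
EtherType (0x0806); the host frame goes straight to 0x0806. In 802.1Q, a VID
of 0 marks a priority-tagged frame: the tag carries only priority bits and
no VLAN membership.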


I also discovered a lot of packets that seem to be there because our Cisco
NICs have an FCoE mode, which I have attempted to disable. Oddly, those
packets were also tagged VLAN 0 when I viewed them inside of the CentOS
guest, but not when I viewed them on the Ubuntu host. I can't figure out
whether the VLAN 0 tags are on the frames coming off of the switch and
Ubuntu isn't showing them to me, or whether they somehow get added to the
frames when they go into the CentOS guest.

Lorin

On Mon, Apr 8, 2013 at 2:18 PM, Lorin Hochstein <lorin at nimbisservices.com> wrote:

> Joe:
>
> I updated the gist, in all its ethernet-frame-detailed glory.
>
>
> On Mon, Apr 8, 2013 at 2:02 PM, Joe Topjian <joe.topjian at cybera.ca> wrote:
>
>> Thanks, Lorin. I hate to do this, but can you re-run with "-XX -vv -e" ?
>>
>>
>> On Mon, Apr 8, 2013 at 11:36 AM, Lorin Hochstein
>> <lorin at nimbisservices.com> wrote:
>>
>>> Joe:
>>>
>>> I grabbed 5 packets ("tcpdump -i eth0 -XX -c5") with 8021q disabled and
>>> enabled, and posted them to this gist:
>>>
>>> https://gist.github.com/lorin/5338623
>>>
>>>
>>> Lorin
>>>
>>>
>>> On Mon, Apr 8, 2013 at 12:27 PM, Joe Topjian <joe.topjian at cybera.ca> wrote:
>>>
>>>> Hi Lorin,
>>>>
>>>> Can you do "tcpdump -i eth0 -XX" on the guest with both the module on
>>>> and off and post the output to the list or to me off-list?
>>>>
>>>> Thanks,
>>>> Joe
>>>>
>>>>
>>>>
>>>> On Fri, Apr 5, 2013 at 11:23 AM, Lorin Hochstein
>>>> <lorin at nimbisservices.com> wrote:
>>>>
>>>>> So, I made some progress here. I'm running FlatDHCP, so I don't have
>>>>> any VLAN stuff configured on my nodes. I don't have the 8021q kernel module
>>>>> loaded in any of my hosts or guests.
>>>>>
>>>>> If I load the 8021q kernel module in the CentOS guest, and I manually
>>>>> put an IP address on the guest's eth0, then I can reach the guest from
>>>>> the cloud controller. Unfortunately, it still doesn't pick up an
>>>>> address via DHCP: the DHCP replies still don't make it from host:vnet0 to
>>>>> guest:eth0, even though other packets are able to.
>>>>>
>>>>> Since the problem only occurs when the packet has to travel across the
>>>>> network (if I put an IP on the bridge of the compute host, I can reach the
>>>>> guest), it seems like the Cisco Nexus 3000 switch is putting VLAN tags in
>>>>> the ethernet frame, and it's confusing the guest. But I can't figure out
>>>>> why that would be. I've been trying to inspect the packets with tcpdump to
>>>>> see if the vlan tags are there.
>>>>>
>>>>> I may end up just switching to VlanManager to make everything VLAN-y.
>>>>>
>>>>> Here's what my interfaces look like on the switch:
>>>>>
>>>>> n3k-2# show interface switchport
>>>>> Name: Ethernet1/1
>>>>>   Switchport: Enabled
>>>>>   Switchport Monitor: Not enabled
>>>>>   Operational Mode: access
>>>>>   Access Mode VLAN: 1 (default)
>>>>>   Trunking Native Mode VLAN: 1 (default)
>>>>>   Trunking VLANs Enabled: 1
>>>>>   Administrative private-vlan primary host-association: none
>>>>>   Administrative private-vlan secondary host-association: none
>>>>>   Administrative private-vlan primary mapping: none
>>>>>   Administrative private-vlan secondary mapping: none
>>>>>   Administrative private-vlan trunk native VLAN: none
>>>>>   Administrative private-vlan trunk encapsulation: dot1q
>>>>>   Administrative private-vlan trunk normal VLANs: none
>>>>>   Administrative private-vlan trunk private VLANs: none
>>>>>   Operational private-vlan: none
>>>>>   Unknown unicast blocked: disabled
>>>>>   Unknown multicast blocked: disabled
>>>>>
>>>>> I don't have dot1q native tag enabled:
>>>>>
>>>>> n3k-2(config)# show vlan dot1q tag native
>>>>> vlan dot1q native tag is disabled
>>>>>
>>>>>
>>>>> Lorin
>>>>>
>>>>>
>>>>> On Thu, Apr 4, 2013 at 4:27 PM, Narayan Desai <narayan.desai at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> You might be hitting iptables/ebtables rules.
>>>>>>
>>>>>> I don't understand why this would be image specific though.
>>>>>>
>>>>>> Can you try generating traffic from the vm and see which counters
>>>>>> increment? (with a static ip maybe?)
>>>>>>  -nld
>>>>>>
>>>>>> On Thu, Apr 4, 2013 at 2:55 PM, Lorin Hochstein
>>>>>> <lorin at nimbisservices.com> wrote:
>>>>>> > Yeah, I've only loaded vhost_net on the compute host.
>>>>>> >
>>>>>> > I'm running CentOS 6.3 on my latest test, but I've tried with CentOS 6.4
>>>>>> > as well.
>>>>>> >
>>>>>> > I made some progress today (at least a potential workaround), but permit
>>>>>> > me to ramble for a bit. I'm trying to run non-multihost. The eth1
>>>>>> > interfaces on my compute nodes are bridged to br100, and there's no IP
>>>>>> > address on br100 or eth1.
>>>>>> >
>>>>>> > Packets aren't getting into the VM from outside. If I manually put an IP
>>>>>> > address on there and do an "arping" from the network node, the arp
>>>>>> > request packets appear on vnet1 of the compute host but not on eth0 of
>>>>>> > the guest. (Packets do leave, however, so I can do an arping from inside
>>>>>> > the guest and the nova-network host will see the request. Similar to
>>>>>> > DHCP. It's like a reverse black hole, things can only go out).
>>>>>> >
>>>>>> > However, if I put an IP address on br100 of the compute host, then the
>>>>>> > guest can reach the host on that address.
>>>>>> >
>>>>>> > So, it looks like I'm going to have to switch to running multi-host to
>>>>>> > resolve this issue, since the VM can communicate directly with a bridge
>>>>>> > on the compute host if it has an IP.
>>>>>> >
>>>>>> > Still, it's puzzling to me, and I don't have a sense about how to debug
>>>>>> > this further. How do I dig in if the problem is that packets can go from
>>>>>> > guest:eth0 to host:vnet1, but they don't go from host:vnet1 to
>>>>>> > guest:eth0 (when they originate from a different server and travel over
>>>>>> > layer 2), and only with a specific image that works for other people?
>>>>>> >
>>>>>> > Lorin
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On Thu, Apr 4, 2013 at 11:33 AM, Narayan Desai
>>>>>> > <narayan.desai at gmail.com> wrote:
>>>>>> >>
>>>>>> >> iirc, vhost_net is only needed on the host.
>>>>>> >>
>>>>>> >> We have seen stability issues with 12.04 (only on particular host
>>>>>> >> types) when using virtio without vhost_net. Enabling vhost_net on the
>>>>>> >> host resolved the issues for us.
>>>>>> >>
>>>>>> >> Which version of CentOS are you running?
>>>>>> >>  -nld
>>>>>> >>
>>>>>> >> On Wed, Apr 3, 2013 at 3:59 PM, Lorin Hochstein
>>>>>> >> <lorin at nimbisservices.com> wrote:
>>>>>> >> > That was my instinct, but I've tried it both ways (toggling
>>>>>> >> > libvirt_use_virtio_for_bridge, restarting nova-compute, launching a
>>>>>> >> > new instance), and vnc'd into the instance to confirm that in one
>>>>>> >> > case the virtio_net drivers were loaded, and in the other case they
>>>>>> >> > weren't, and the result was the same. So it doesn't seem to be
>>>>>> >> > related. It's really baffling.
>>>>>> >> >
>>>>>> >> > Lorin
>>>>>> >> >
>>>>>> >> >
>>>>>> >> > On Wed, Apr 3, 2013 at 4:47 PM, Joe Topjian
>>>>>> >> > <joe.topjian at cybera.ca> wrote:
>>>>>> >> >>
>>>>>> >> >> That's really bizarre -- especially since it's only CentOS images.
>>>>>> >> >> Do you think it might be something with virtio compatibility?
>>>>>> >> >>
>>>>>> >> >> I'm hesitant to lean on it being a compute/controller issue since
>>>>>> >> >> other images work.
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> On Wed, Apr 3, 2013 at 2:41 PM, Lorin Hochstein
>>>>>> >> >> <lorin at nimbisservices.com>
>>>>>> >> >> wrote:
>>>>>> >> >>>
>>>>>> >> >>> I've tested with multiple ones, including the CentOS6 image from
>>>>>> >> >>> that page, as well as several we have rolled on our own.
>>>>>> >> >>>
>>>>>> >> >>> Right now I'm testing by manually putting on the IP by doing:
>>>>>> >> >>>
>>>>>> >> >>> ip addr add 10.40.0.4/16 broadcast 10.40.255.255 dev eth0
>>>>>> >> >>>
>>>>>> >> >>> I can't ping out at all. If I try to arping out, and then tcpdump,
>>>>>> >> >>> just like in the DHCP case, I can see the ARP request and replies
>>>>>> >> >>> on vnet0 of the host:
>>>>>> >> >>>
>>>>>> >> >>> root@c220-2:~# tcpdump -i vnet0 arp
>>>>>> >> >>> tcpdump: WARNING: vnet0: no IPv4 address assigned
>>>>>> >> >>> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>>>>>> >> >>> 16:34:42.109067 ARP, Request who-has 10.40.0.1 (Broadcast) tell 10.40.0.4, length 28
>>>>>> >> >>> 16:34:42.109085 ARP, Request who-has 10.40.0.1 (Broadcast) tell 10.40.0.4, length 28
>>>>>> >> >>> 16:34:42.109216 ARP, Reply 10.40.0.1 is-at 54:78:1a:86:50:c9 (oui Unknown), length 46
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>> But if I tcpdump on eth0 in the guest, I only see the ARP requests,
>>>>>> >> >>> not the replies.
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>> Lorin
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>> On Wed, Apr 3, 2013 at 4:26 PM, Joe Topjian
>>>>>> >> >>> <joe.topjian at cybera.ca> wrote:
>>>>>> >> >>>>
>>>>>> >> >>>> What CentOS images are you using? These have worked for me:
>>>>>> >> >>>>
>>>>>> >> >>>> https://github.com/rackerjoe/oz-image-build
>>>>>> >> >>>>
>>>>>> >> >>>>
>>>>>> >> >>>> On Wed, Apr 3, 2013 at 2:13 PM, Lorin Hochstein
>>>>>> >> >>>> <lorin at nimbisservices.com> wrote:
>>>>>> >> >>>>>
>>>>>> >> >>>>> Hi Joe:
>>>>>> >> >>>>>
>>>>>> >> >>>>> It happens immediately thereafter. CentOS images have never
>>>>>> >> >>>>> worked on our setup.
>>>>>> >> >>>>>
>>>>>> >> >>>>> Lorin
>>>>>> >> >>>>>
>>>>>> >> >>>>>
>>>>>> >> >>>>> On Wed, Apr 3, 2013 at 3:30 PM, Joe Topjian
>>>>>> >> >>>>> <joe.topjian at cybera.ca> wrote:
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> Hi Lorin,
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> Does this happen shortly after the guests were created? Or
>>>>>> >> >>>>>> usually a few hours/days later? If the latter, are these guests
>>>>>> >> >>>>>> seeing large amounts of bandwidth?
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> Thanks,
>>>>>> >> >>>>>> Joe
>>>>>> >> >>>>>>
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> On Wed, Apr 3, 2013 at 1:16 PM, Lorin Hochstein
>>>>>> >> >>>>>> <lorin at nimbisservices.com> wrote:
>>>>>> >> >>>>>>>
>>>>>> >> >>>>>>> Hi all:
>>>>>> >> >>>>>>>
>>>>>> >> >>>>>>> I'm having a strange issue where networking on my CentOS guests
>>>>>> >> >>>>>>> isn't working properly, but things are working fine with my
>>>>>> >> >>>>>>> Ubuntu guests.
>>>>>> >> >>>>>>>
>>>>>> >> >>>>>>> I'm running Folsom on Ubuntu 12.04, nova-network, not multi-host.
>>>>>> >> >>>>>>>
>>>>>> >> >>>>>>> The first symptom is that CentOS instances don't get IP addresses
>>>>>> >> >>>>>>> via DHCP. If I trace the DHCP requests and replies using tcpdump,
>>>>>> >> >>>>>>> I can see the reply from dnsmasq reach the vnetX interface of the
>>>>>> >> >>>>>>> compute host, but it doesn't get to the eth0 interface of the
>>>>>> >> >>>>>>> guest. (I'm at a loss here about how to debug something like
>>>>>> >> >>>>>>> that).
>>>>>> >> >>>>>>>
>>>>>> >> >>>>>>> If I try to statically configure an IP address on the guest
>>>>>> >> >>>>>>> instead, networking still doesn't work. I can't ping anything on
>>>>>> >> >>>>>>> the subnet, and I don't even see the icmp traffic on vnetX of the
>>>>>> >> >>>>>>> host.
>>>>>> >> >>>>>>>
>>>>>> >> >>>>>>> I've tried twiddling the following options, but with no change in
>>>>>> >> >>>>>>> behavior:
>>>>>> >> >>>>>>>
>>>>>> >> >>>>>>> * Adding the following rule to nova-network node:
>>>>>> >> >>>>>>>   iptables -A POSTROUTING -t mangle -p udp --dport bootpc -j CHECKSUM --checksum-fill
>>>>>> >> >>>>>>> * Adding the same rule to nova-compute node
>>>>>> >> >>>>>>> * Setting libvirt_use_virtio_for_bridge to "yes" and "no"
>>>>>> >> >>>>>>>   (restarting nova-compute, re-launching instances)
>>>>>> >> >>>>>>> * With and without vhost_net loaded in nova-compute (restarting
>>>>>> >> >>>>>>>   nova-compute, re-launching instances)
>>>>>> >> >>>>>>> * Disabling IPv6 inside of the CentOS guest
>>>>>> >> >>>>>>>
>>>>>> >> >>>>>>> Has anybody encountered this before?
>>>>>> >> >>>>>>>
>>>>>> >> >>>>>>> Lorin
>>>>>> >> >>>>>>>
>>>>>> >> >>>>>>> --
>>>>>> >> >>>>>>> Lorin Hochstein
>>>>>> >> >>>>>>> Lead Architect - Cloud Services
>>>>>> >> >>>>>>> Nimbis Services, Inc.
>>>>>> >> >>>>>>> www.nimbisservices.com
>>>>>> >> >>>>>>>
>>>>>> >> >>>>>>> _______________________________________________
>>>>>> >> >>>>>>> OpenStack-operators mailing list
>>>>>> >> >>>>>>> OpenStack-operators at lists.openstack.org
>>>>>> >> >>>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>>>>> >> >>>>>>>
>>>>>> >> >>>>>>
>>>>>> >> >>>>>>
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> --
>>>>>> >> >>>>>> Joe Topjian
>>>>>> >> >>>>>> Systems Administrator
>>>>>> >> >>>>>> Cybera Inc.
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> www.cybera.ca
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> Cybera is a not-for-profit organization that works to spur and
>>>>>> >> >>>>>> support innovation, for the economic benefit of Alberta, through
>>>>>> >> >>>>>> the use of cyberinfrastructure.



-- 
Lorin Hochstein
Lead Architect - Cloud Services
Nimbis Services, Inc.
www.nimbisservices.com