[Openstack-operators] Networking breaks in CentOS guests but works with Ubuntu guests

Lorin Hochstein lorin at nimbisservices.com
Mon Apr 8 17:36:03 UTC 2013


Joe:

I grabbed 5 packets ("tcpdump -i eth0 -XX -c5") with 8021q disabled and
enabled, and posted them to this gist:

https://gist.github.com/lorin/5338623
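
A note on reading those dumps: an 802.1Q tag, if present, sits right after
the source MAC address, so bytes 12-13 of the frame read 0x8100 and the next
two bytes carry the VLAN ID in their low 12 bits. The following is a minimal
Python sketch for checking that offline, assuming the input is the hex dump of
a single packet as printed by "tcpdump -XX". The helper names
(hexdump_to_bytes, parse_ethernet) are made up for illustration and are not
part of tcpdump or any library.

```python
import re
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q VLAN tag

# Matches one hex line of `tcpdump -XX` output and captures only the hex
# columns; the ASCII column (separated by two or more spaces) is ignored.
_HEX_LINE = re.compile(
    r"^\s*0x[0-9a-fA-F]{4}:\s+"
    r"((?:[0-9a-fA-F]{2,4} )*[0-9a-fA-F]{2,4})(?:\s{2,}|$)"
)


def hexdump_to_bytes(dump: str) -> bytes:
    """Collect the hex columns of one packet from `tcpdump -XX` output."""
    chunks = []
    for line in dump.splitlines():
        match = _HEX_LINE.match(line)
        if match:  # summary lines (timestamps etc.) do not match and are skipped
            chunks.append(match.group(1).replace(" ", ""))
    return bytes.fromhex("".join(chunks))


def _mac(raw: bytes) -> str:
    """Format six raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_ethernet(frame: bytes) -> dict:
    """Return MAC addresses, EtherType and (if tagged) the 802.1Q VLAN ID."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src = frame[0:6], frame[6:12]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    vlan_id = None
    if ethertype == TPID_8021Q:
        if len(frame) < 18:
            raise ValueError("truncated 802.1Q tag")
        # Tag control information, then the real EtherType of the payload.
        tci, ethertype = struct.unpack("!HH", frame[14:18])
        vlan_id = tci & 0x0FFF  # low 12 bits hold the VLAN ID
    return {
        "dst": _mac(dst),
        "src": _mac(src),
        "vlan_id": vlan_id,
        "ethertype": ethertype,
    }
```

Running parse_ethernet(hexdump_to_bytes(dump)) on each packet from the two
captures and comparing the vlan_id field (None means untagged) would show
whether tcpdump recorded a tag on the frames seen at the guest.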


Lorin


On Mon, Apr 8, 2013 at 12:27 PM, Joe Topjian <joe.topjian at cybera.ca> wrote:

> Hi Lorin,
>
> Can you do "tcpdump -i eth0 -XX" on the guest with both the module on and
> off and post the output to the list or to me off-list?
>
> Thanks,
> Joe
>
>
>
> On Fri, Apr 5, 2013 at 11:23 AM, Lorin Hochstein <lorin at nimbisservices.com> wrote:
>
>> So, I made some progress here. I'm running FlatDHCP, so I don't have any
>> VLAN stuff configured on my nodes. I don't have the 8021q kernel module
>> loaded in any of my hosts or guests.
>>
>> If I load the 8021q kernel module in the CentOS guest, and I manually put
>> an IP address on the guest's eth0, then I can reach the guest from the
>> cloud controller. Unfortunately, it still doesn't pick up an address via
>> DHCP: the DHCP replies still don't make it from host:vnet0 to guest:eth0,
>> even though other packets are able to.
>>
>> Since the problem only occurs when the packet has to travel across the
>> network (if I put an IP on the bridge of the compute host, I can reach the
>> guest), it seems like the Cisco Nexus 3000 switch is putting VLAN tags in
>> the ethernet frame, and it's confusing the guest. But I can't figure out
>> why that would be. I've been trying to inspect the packets with tcpdump to
>> see if the vlan tags are there.
>>
>> I may end up just switching to VlanManager to make everything VLAN-y.
>>
>> Here's what my interfaces look like on the switch
>>
>> n3k-2# show interface switchport
>> Name: Ethernet1/1
>>   Switchport: Enabled
>>   Switchport Monitor: Not enabled
>>   Operational Mode: access
>>   Access Mode VLAN: 1 (default)
>>   Trunking Native Mode VLAN: 1 (default)
>>   Trunking VLANs Enabled: 1
>>   Administrative private-vlan primary host-association: none
>>   Administrative private-vlan secondary host-association: none
>>   Administrative private-vlan primary mapping: none
>>   Administrative private-vlan secondary mapping: none
>>   Administrative private-vlan trunk native VLAN: none
>>   Administrative private-vlan trunk encapsulation: dot1q
>>   Administrative private-vlan trunk normal VLANs: none
>>   Administrative private-vlan trunk private VLANs: none
>>   Operational private-vlan: none
>>   Unknown unicast blocked: disabled
>>   Unknown multicast blocked: disabled
>>
>> I don't have dot1q native tag enabled:
>>
>> n3k-2(config)# show vlan dot1q tag native
>> vlan dot1q native tag is disabled
>>
>>
>> Lorin
>>
>>
>> On Thu, Apr 4, 2013 at 4:27 PM, Narayan Desai <narayan.desai at gmail.com> wrote:
>>
>>> You might be hitting iptables/ebtables rules.
>>>
>>> I don't understand why this would be image specific though.
>>>
>>> Can you try generating traffic from the vm and see which counters
>>> increment? (with a static ip maybe?)
>>>  -nld
>>>
>>> On Thu, Apr 4, 2013 at 2:55 PM, Lorin Hochstein
>>> <lorin at nimbisservices.com> wrote:
>>> > Yeah, I've only loaded vhost_net on the compute host.
>>> >
>>> > I'm running CentOS 6.3 on my latest test, but I've tried with CentOS 6.4
>>> > as well.
>>> >
>>> > I made some progress today (at least a potential workaround), but permit
>>> > me to ramble for a bit. I'm trying to run non-multihost. The eth1
>>> > interfaces on my compute nodes are bridged to br100, and there's no IP
>>> > address on br100 or eth1.
>>> >
>>> > Packets aren't getting into the VM from outside. If I manually put an IP
>>> > address on the guest and do an "arping" from the network node, the ARP
>>> > request packets appear on vnet1 of the compute host but not on eth0 of
>>> > the guest. (Packets do leave, however, so I can do an arping from inside
>>> > the guest and the nova-network host will see the request. Similar to
>>> > DHCP. It's like a reverse black hole: things can only go out.)
>>> >
>>> > However, if I put an IP address on br100 of the compute host, then the
>>> > guest can reach the host on that address.
>>> >
>>> > So, it looks like I'm going to have to switch to running multi-host to
>>> > resolve this issue, since the VM can communicate directly with a bridge
>>> > on the compute host if it has an IP.
>>> >
>>> > Still, it's puzzling to me, and I don't have a sense about how to debug
>>> > this further. How do I dig in if the problem is that packets can go from
>>> > guest:eth0 to host:vnet1, but they don't go from host:vnet1 to guest:eth0
>>> > (when they originate from a different server and travel over layer 2),
>>> > and only with a specific image that works for other people?
>>> >
>>> > Lorin
>>> >
>>> >
>>> >
>>> > On Thu, Apr 4, 2013 at 11:33 AM, Narayan Desai <narayan.desai at gmail.com>
>>> > wrote:
>>> >>
>>> >> iirc, vhost_net is only needed on the host.
>>> >>
>>> >> We have seen stability issues with 12.04 (only on particular host
>>> >> types) when using virtio without vhost_net. Enabling vhost_net on the
>>> >> host resolved the issues for us.
>>> >>
>>> >> Which version of CentOS are you running?
>>> >>  -nld
>>> >>
>>> >> On Wed, Apr 3, 2013 at 3:59 PM, Lorin Hochstein
>>> >> <lorin at nimbisservices.com> wrote:
>>> >> > That was my instinct, but I've tried it both ways (toggling
>>> >> > libvirt_use_virtio_for_bridge, restarting nova-compute, launching a new
>>> >> > instance), and vnc'd into the instance to confirm that in one case the
>>> >> > virtio_net drivers were loaded, and in the other case they weren't, and
>>> >> > the result was the same. So it doesn't seem to be related. It's really
>>> >> > baffling.
>>> >> >
>>> >> > Lorin
>>> >> >
>>> >> >
>>> >> > On Wed, Apr 3, 2013 at 4:47 PM, Joe Topjian <joe.topjian at cybera.ca>
>>> >> > wrote:
>>> >> >>
>>> >> >> That's really bizarre -- especially since it's only CentOS images.
>>> >> >> Do you think it might be something with virtio compatibility?
>>> >> >>
>>> >> >> I'm hesitant to lean on it being a compute/controller issue since
>>> >> >> other images work.
>>> >> >>
>>> >> >>
>>> >> >> On Wed, Apr 3, 2013 at 2:41 PM, Lorin Hochstein
>>> >> >> <lorin at nimbisservices.com>
>>> >> >> wrote:
>>> >> >>>
>>> >> >>> I've tested with multiple images, including the CentOS6 image from
>>> >> >>> that page, as well as several we have rolled on our own.
>>> >> >>>
>>> >> >>> Right now I'm testing by manually putting on the IP by doing:
>>> >> >>>
>>> >> >>> ip addr add 10.40.0.4/16 broadcast 10.40.255.255 dev eth0
>>> >> >>>
>>> >> >>> I can't ping out at all. If I try to arping out, and then tcpdump,
>>> >> >>> just like in the DHCP case, I can see the ARP requests and replies
>>> >> >>> on vnet0 of the host:
>>> >> >>>
>>> >> >>> root at c220-2:~# tcpdump -i vnet0 arp
>>> >> >>> tcpdump: WARNING: vnet0: no IPv4 address assigned
>>> >> >>> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>>> >> >>> 16:34:42.109067 ARP, Request who-has 10.40.0.1 (Broadcast) tell 10.40.0.4, length 28
>>> >> >>> 16:34:42.109085 ARP, Request who-has 10.40.0.1 (Broadcast) tell 10.40.0.4, length 28
>>> >> >>> 16:34:42.109216 ARP, Reply 10.40.0.1 is-at 54:78:1a:86:50:c9 (oui Unknown), length 46
>>> >> >>>
>>> >> >>>
>>> >> >>> But if I tcpdump on eth0 in the guest, I only see the ARP requests,
>>> >> >>> not the replies.
>>> >> >>>
>>> >> >>>
>>> >> >>> Lorin
>>> >> >>>
>>> >> >>>
>>> >> >>> On Wed, Apr 3, 2013 at 4:26 PM, Joe Topjian <joe.topjian at cybera.ca>
>>> >> >>> wrote:
>>> >> >>>>
>>> >> >>>> What CentOS images are you using? These have worked for me:
>>> >> >>>>
>>> >> >>>> https://github.com/rackerjoe/oz-image-build
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> On Wed, Apr 3, 2013 at 2:13 PM, Lorin Hochstein
>>> >> >>>> <lorin at nimbisservices.com> wrote:
>>> >> >>>>>
>>> >> >>>>> Hi Joe:
>>> >> >>>>>
>>> >> >>>>> It happens immediately thereafter. CentOS images have never worked
>>> >> >>>>> on our setup.
>>> >> >>>>>
>>> >> >>>>> Lorin
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>> On Wed, Apr 3, 2013 at 3:30 PM, Joe Topjian <joe.topjian at cybera.ca>
>>> >> >>>>> wrote:
>>> >> >>>>>>
>>> >> >>>>>> Hi Lorin,
>>> >> >>>>>>
>>> >> >>>>>> Does this happen shortly after the guests were created? Or usually
>>> >> >>>>>> a few hours/days later? If the latter, are these guests seeing
>>> >> >>>>>> large amounts of bandwidth?
>>> >> >>>>>>
>>> >> >>>>>> Thanks,
>>> >> >>>>>> Joe
>>> >> >>>>>>
>>> >> >>>>>>
>>> >> >>>>>> On Wed, Apr 3, 2013 at 1:16 PM, Lorin Hochstein
>>> >> >>>>>> <lorin at nimbisservices.com> wrote:
>>> >> >>>>>>>
>>> >> >>>>>>> Hi all:
>>> >> >>>>>>>
>>> >> >>>>>>> I'm having a strange issue where networking on my CentOS guests
>>> >> >>>>>>> isn't working properly, but things are working fine with my
>>> >> >>>>>>> Ubuntu guests.
>>> >> >>>>>>>
>>> >> >>>>>>> I'm running Folsom on Ubuntu 12.04, nova-network, not multi-host.
>>> >> >>>>>>>
>>> >> >>>>>>> The first symptom is that CentOS instances don't get IP addresses
>>> >> >>>>>>> via DHCP. If I trace the DHCP requests and replies using tcpdump,
>>> >> >>>>>>> I can see the reply from dnsmasq reach the vnetX interface of the
>>> >> >>>>>>> compute host, but it doesn't get to the eth0 interface of the
>>> >> >>>>>>> guest. (I'm at a loss here about how to debug something like
>>> >> >>>>>>> that.)
>>> >> >>>>>>>
>>> >> >>>>>>> If I try to statically configure an IP address on the guest
>>> >> >>>>>>> instead, networking still doesn't work. I can't ping anything on
>>> >> >>>>>>> the subnet, and I don't even see the ICMP traffic on vnetX of the
>>> >> >>>>>>> host.
>>> >> >>>>>>>
>>> >> >>>>>>> I've tried twiddling the following options, but saw no change in
>>> >> >>>>>>> behavior:
>>> >> >>>>>>>
>>> >> >>>>>>> * Adding the following rule to the nova-network node:
>>> >> >>>>>>>   iptables -A POSTROUTING -t mangle -p udp --dport bootpc -j CHECKSUM --checksum-fill
>>> >> >>>>>>> * Adding the same rule to the nova-compute node
>>> >> >>>>>>> * Setting libvirt_use_virtio_for_bridge to "yes" and "no"
>>> >> >>>>>>>   (restarting nova-compute, re-launching instances)
>>> >> >>>>>>> * With and without vhost_net loaded on the nova-compute node
>>> >> >>>>>>>   (restarting nova-compute, re-launching instances)
>>> >> >>>>>>> * Disabling IPv6 inside of the CentOS guest
>>> >> >>>>>>> Has anybody encountered this before?
>>> >> >>>>>>>
>>> >> >>>>>>> Lorin
>>> >> >>>>>>>
>>> >> >>>>>>> --
>>> >> >>>>>>> Lorin Hochstein
>>> >> >>>>>>> Lead Architect - Cloud Services
>>> >> >>>>>>> Nimbis Services, Inc.
>>> >> >>>>>>> www.nimbisservices.com
>>> >> >>>>>>>
>>> >> >>>>>>> _______________________________________________
>>> >> >>>>>>> OpenStack-operators mailing list
>>> >> >>>>>>> OpenStack-operators at lists.openstack.org
>>> >> >>>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>> >> >>>>>>>
>>> >> >>>>>>
>>> >> >>>>>>
>>> >> >>>>>>
>>> >> >>>>>> --
>>> >> >>>>>> Joe Topjian
>>> >> >>>>>> Systems Administrator
>>> >> >>>>>> Cybera Inc.
>>> >> >>>>>>
>>> >> >>>>>> www.cybera.ca
>>> >> >>>>>>
>>> >> >>>>>> Cybera is a not-for-profit organization that works to spur and
>>> >> >>>>>> support innovation, for the economic benefit of Alberta, through
>>> >> >>>>>> the use of cyberinfrastructure.



-- 
Lorin Hochstein
Lead Architect - Cloud Services
Nimbis Services, Inc.
www.nimbisservices.com