[Openstack-operators] Networking breaks in CentOS guests but works with Ubuntu guests

Joe Topjian joe.topjian at cybera.ca
Mon Apr 8 18:02:15 UTC 2013


Thanks, Lorin. I hate to do this, but can you re-run the tcpdump with "-XX -vv -e"?
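
For anyone following along: as far as I know, "-e" makes tcpdump print the
link-level header, so a tagged frame should show up with ethertype 802.1Q
(0x8100) and a VLAN ID. The same check can be done by hand on the "-XX" hex
dump, since the tag (if present) sits in bytes 12-15 of the frame. Below is a
minimal Python sketch of that check. The function name and the two sample
frames are made up for illustration (only the MAC address is borrowed from the
ARP reply quoted further down); they are not captured data from the gist.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q VLAN tag


def vlan_id(frame: bytes):
    """Return the VLAN ID if the Ethernet frame carries an 802.1Q tag, else None."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    # Bytes 0-5: destination MAC, 6-11: source MAC, 12-13: EtherType (or TPID).
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != TPID_8021Q:
        return None
    if len(frame) < 18:
        raise ValueError("truncated 802.1Q tag")
    # Bytes 14-15: tag control info; the low 12 bits are the VLAN ID.
    (tci,) = struct.unpack_from("!H", frame, 14)
    return tci & 0x0FFF


# Hypothetical frames, for illustration only: a broadcast ARP without a tag,
# and the same frame with an 802.1Q tag for VLAN 1 inserted.
untagged = bytes.fromhex("ffffffffffff" "54781a8650c9" "0806") + bytes(28)
tagged = bytes.fromhex("ffffffffffff" "54781a8650c9" "8100" "0001" "0806") + bytes(28)

print(vlan_id(untagged))
print(vlan_id(tagged))
```

If the frames seen inside the guest come back with a VLAN ID while the ones
seen on the host's vnet interface do not (or the other way around), that would
narrow down where the tag is being added or stripped.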


On Mon, Apr 8, 2013 at 11:36 AM, Lorin Hochstein
<lorin at nimbisservices.com> wrote:

> Joe:
>
> I grabbed 5 packets ("tcpdump -i eth0 -XX -c5") with 8021q disabled and
> enabled, and posted them to this gist:
>
> https://gist.github.com/lorin/5338623
>
>
> Lorin
>
>
> On Mon, Apr 8, 2013 at 12:27 PM, Joe Topjian <joe.topjian at cybera.ca> wrote:
>
>> Hi Lorin,
>>
>> Can you do "tcpdump -i eth0 -XX" on the guest with both the module on and
>> off and post the output to the list or to me off-list?
>>
>> Thanks,
>> Joe
>>
>>
>>
>> On Fri, Apr 5, 2013 at 11:23 AM, Lorin Hochstein <
>> lorin at nimbisservices.com> wrote:
>>
>>> So, I made some progress here. I'm running FlatDHCP, so I don't have any
>>> VLAN stuff configured on my nodes. I don't have the 8021q kernel module
>>> loaded in any of my hosts or guests.
>>>
>>> If I load the 8021q kernel module in the CentOS guest, and I manually
>>> put an IP address on the guest's eth0, then I can reach the guest from
>>> the cloud controller. Unfortunately, it still doesn't pick up an address
>>> via DHCP: the DHCP replies still don't make it from host:vnet0 to
>>> guest:eth0, even though other packets are able to.
>>>
>>> Since the problem only occurs when the packet has to travel across the
>>> network (if I put an IP on the bridge of the compute host, I can reach the
>>> guest), it seems like the Cisco Nexus 3000 switch is putting VLAN tags in
>>> the ethernet frame, and it's confusing the guest. But I can't figure out
>>> why that would be. I've been trying to inspect the packets with tcpdump to
>>> see if the vlan tags are there.
>>>
>>> I may end up just switching to VlanManager to make everything VLAN-y.
>>>
>>> Here's what my interfaces look like on the switch:
>>>
>>> n3k-2# show interface switchport
>>> Name: Ethernet1/1
>>>   Switchport: Enabled
>>>   Switchport Monitor: Not enabled
>>>   Operational Mode: access
>>>   Access Mode VLAN: 1 (default)
>>>   Trunking Native Mode VLAN: 1 (default)
>>>   Trunking VLANs Enabled: 1
>>>   Administrative private-vlan primary host-association: none
>>>   Administrative private-vlan secondary host-association: none
>>>   Administrative private-vlan primary mapping: none
>>>   Administrative private-vlan secondary mapping: none
>>>   Administrative private-vlan trunk native VLAN: none
>>>   Administrative private-vlan trunk encapsulation: dot1q
>>>   Administrative private-vlan trunk normal VLANs: none
>>>   Administrative private-vlan trunk private VLANs: none
>>>   Operational private-vlan: none
>>>   Unknown unicast blocked: disabled
>>>   Unknown multicast blocked: disabled
>>>
>>> I don't have dot1q native tag enabled:
>>>
>>> n3k-2(config)# show vlan dot1q tag native
>>> vlan dot1q native tag is disabled
>>>
>>>
>>> Lorin
>>>
>>>
>>> On Thu, Apr 4, 2013 at 4:27 PM, Narayan Desai <narayan.desai at gmail.com> wrote:
>>>
>>>> You might be hitting iptables/ebtables rules.
>>>>
>>>> I don't understand why this would be image specific though.
>>>>
>>>> Can you try generating traffic from the vm and see which counters
>>>> increment? (with a static ip maybe?)
>>>>  -nld
>>>>
>>>> On Thu, Apr 4, 2013 at 2:55 PM, Lorin Hochstein
>>>> <lorin at nimbisservices.com> wrote:
>>>> > Yeah, I've only loaded vhost_net on the compute host.
>>>> >
>>>> > I'm running CentOS 6.3 on my latest test, but I've tried with CentOS
>>>> > 6.4 as well.
>>>> >
>>>> > I made some progress today (at least a potential workaround), but
>>>> > permit me to ramble for a bit. I'm trying to run non-multihost. The
>>>> > eth1 interfaces on my compute nodes are bridged to br100, and there's
>>>> > no IP address on br100 or eth1.
>>>> >
>>>> > Packets aren't getting into the VM from outside. If I manually put an
>>>> > IP address on the guest's eth0 and do an "arping" from the network
>>>> > node, the arp request packets appear on vnet1 of the compute host but
>>>> > not on eth0 of the guest. (Packets do leave, however, so I can do an
>>>> > arping from inside the guest and the nova-network host will see the
>>>> > request. Similar to DHCP. It's like a reverse black hole: things can
>>>> > only go out.)
>>>> >
>>>> > However, if I put an IP address on br100 of the compute host, then the
>>>> > guest can reach the host on that address.
>>>> >
>>>> > So, it looks like I'm going to have to switch to running multi-host to
>>>> > resolve this issue, since the VM can communicate directly with a
>>>> > bridge on the compute host if it has an IP.
>>>> >
>>>> > Still, it's puzzling to me, and I don't have a sense about how to
>>>> > debug this further. How do I dig in if the problem is that packets can
>>>> > go from guest:eth0 to host:vnet1, but they don't go from host:vnet1 to
>>>> > guest:eth0 (when they originate from a different server and travel
>>>> > over layer 2), and only with a specific image that works for other
>>>> > people?
>>>> >
>>>> > Lorin
>>>> >
>>>> >
>>>> >
>>>> > On Thu, Apr 4, 2013 at 11:33 AM, Narayan Desai
>>>> > <narayan.desai at gmail.com> wrote:
>>>> >>
>>>> >> iirc, vhost_net is only needed on the host.
>>>> >>
>>>> >> We have seen stability issues with 12.04 (only on particular host
>>>> >> types) when using virtio without vhost_net. Enabling vhost_net on the
>>>> >> host resolved the issues for us.
>>>> >>
>>>> >> Which version of CentOS are you running?
>>>> >>  -nld
>>>> >>
>>>> >> On Wed, Apr 3, 2013 at 3:59 PM, Lorin Hochstein
>>>> >> <lorin at nimbisservices.com> wrote:
>>>> >> > That was my instinct, but I've tried it both ways (toggling
>>>> >> > libvirt_use_virtio_for_bridge, restarting nova-compute, launching a
>>>> >> > new instance), and vnc'd into the instance to confirm that in one
>>>> >> > case the virtio_net drivers were loaded, and in the other case they
>>>> >> > weren't, and the result was the same. So it doesn't seem to be
>>>> >> > related. It's really baffling.
>>>> >> >
>>>> >> > Lorin
>>>> >> >
>>>> >> >
>>>> >> > On Wed, Apr 3, 2013 at 4:47 PM, Joe Topjian
>>>> >> > <joe.topjian at cybera.ca> wrote:
>>>> >> >>
>>>> >> >> That's really bizarre -- especially since it's only CentOS images.
>>>> >> >> Do you think it might be something with virtio compatibility?
>>>> >> >>
>>>> >> >> I'm hesitant to lean on it being a compute/controller issue since
>>>> >> >> other images work.
>>>> >> >>
>>>> >> >>
>>>> >> >> On Wed, Apr 3, 2013 at 2:41 PM, Lorin Hochstein
>>>> >> >> <lorin at nimbisservices.com>
>>>> >> >> wrote:
>>>> >> >>>
>>>> >> >>> I've tested with multiple ones, including the CentOS6 image from
>>>> >> >>> that page, as well as several we have rolled on our own.
>>>> >> >>>
>>>> >> >>> Right now I'm testing by manually putting on the IP by doing:
>>>> >> >>>
>>>> >> >>> ip addr add 10.40.0.4/16 broadcast 10.40.255.255 dev eth0
>>>> >> >>>
>>>> >> >>> I can't ping out at all. If I try to arping out, and then tcpdump,
>>>> >> >>> just like in the DHCP case, I can see the ARP request and replies
>>>> >> >>> on vnet0 of the host:
>>>> >> >>>
>>>> >> >>> root@c220-2:~# tcpdump -i vnet0 arp
>>>> >> >>> tcpdump: WARNING: vnet0: no IPv4 address assigned
>>>> >> >>> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>>>> >> >>> 16:34:42.109067 ARP, Request who-has 10.40.0.1 (Broadcast) tell 10.40.0.4, length 28
>>>> >> >>> 16:34:42.109085 ARP, Request who-has 10.40.0.1 (Broadcast) tell 10.40.0.4, length 28
>>>> >> >>> 16:34:42.109216 ARP, Reply 10.40.0.1 is-at 54:78:1a:86:50:c9 (oui Unknown), length 46
>>>> >> >>>
>>>> >> >>>
>>>> >> >>> But if I tcpdump on eth0 in the guest, I only see the arp requests,
>>>> >> >>> not the replies.
>>>> >> >>>
>>>> >> >>>
>>>> >> >>> Lorin
>>>> >> >>>
>>>> >> >>>
>>>> >> >>> On Wed, Apr 3, 2013 at 4:26 PM, Joe Topjian
>>>> >> >>> <joe.topjian at cybera.ca> wrote:
>>>> >> >>>>
>>>> >> >>>> What CentOS images are you using? These have worked for me:
>>>> >> >>>>
>>>> >> >>>> https://github.com/rackerjoe/oz-image-build
>>>> >> >>>>
>>>> >> >>>>
>>>> >> >>>> On Wed, Apr 3, 2013 at 2:13 PM, Lorin Hochstein
>>>> >> >>>> <lorin at nimbisservices.com> wrote:
>>>> >> >>>>>
>>>> >> >>>>> Hi Joe:
>>>> >> >>>>>
>>>> >> >>>>> It happens immediately thereafter. CentOS images have never
>>>> >> >>>>> worked on our setup.
>>>> >> >>>>>
>>>> >> >>>>> Lorin
>>>> >> >>>>>
>>>> >> >>>>>
>>>> >> >>>>> On Wed, Apr 3, 2013 at 3:30 PM, Joe Topjian
>>>> >> >>>>> <joe.topjian at cybera.ca> wrote:
>>>> >> >>>>>>
>>>> >> >>>>>> Hi Lorin,
>>>> >> >>>>>>
>>>> >> >>>>>> Does this happen shortly after the guests were created? Or
>>>> >> >>>>>> usually a few hours/days later? If the latter, are these
>>>> >> >>>>>> guests seeing large amounts of bandwidth?
>>>> >> >>>>>>
>>>> >> >>>>>> Thanks,
>>>> >> >>>>>> Joe
>>>> >> >>>>>>
>>>> >> >>>>>>
>>>> >> >>>>>> On Wed, Apr 3, 2013 at 1:16 PM, Lorin Hochstein
>>>> >> >>>>>> <lorin at nimbisservices.com> wrote:
>>>> >> >>>>>>>
>>>> >> >>>>>>> Hi all:
>>>> >> >>>>>>>
>>>> >> >>>>>>> I'm having a strange issue where networking on my CentOS guests
>>>> >> >>>>>>> isn't working properly, but things are working fine with my
>>>> >> >>>>>>> Ubuntu guests.
>>>> >> >>>>>>>
>>>> >> >>>>>>> I'm running Folsom on Ubuntu 12.04, nova-network, not multi-host.
>>>> >> >>>>>>>
>>>> >> >>>>>>> The first symptom is that CentOS instances don't get IP addresses
>>>> >> >>>>>>> via DHCP. If I trace the DHCP requests and replies using tcpdump,
>>>> >> >>>>>>> I can see the reply from dnsmasq reach the vnetX interface of the
>>>> >> >>>>>>> compute host, but it doesn't get to the eth0 interface of the
>>>> >> >>>>>>> guest. (I'm at a loss here about how to debug something like
>>>> >> >>>>>>> that.)
>>>> >> >>>>>>>
>>>> >> >>>>>>> If I try to statically configure an IP address on the guest
>>>> >> >>>>>>> instead, networking still doesn't work. I can't ping anything on
>>>> >> >>>>>>> the subnet, and I don't even see the icmp traffic on vnetX of the
>>>> >> >>>>>>> host.
>>>> >> >>>>>>>
>>>> >> >>>>>>> I've tried twiddling the following options, but saw no change in
>>>> >> >>>>>>> behavior:
>>>> >> >>>>>>>
>>>> >> >>>>>>> * Adding the following rule to nova-network node:
>>>> >> >>>>>>>   iptables -A POSTROUTING -t mangle -p udp --dport bootpc -j CHECKSUM --checksum-fill
>>>> >> >>>>>>> * Adding the same rule to nova-compute node
>>>> >> >>>>>>> * Setting libvirt_use_virtio_for_bridge to "yes" and "no"
>>>> >> >>>>>>>   (restarting nova-compute, re-launching instances)
>>>> >> >>>>>>> * With and without vhost_net loaded in nova-compute (restarting
>>>> >> >>>>>>>   nova-compute, re-launching instances)
>>>> >> >>>>>>> * Disabling IPv6 inside of the CentOS guest
>>>> >> >>>>>>>
>>>> >> >>>>>>> Has anybody encountered this before?
>>>> >> >>>>>>>
>>>> >> >>>>>>> Lorin
>>>> >> >>>>>>>
>>>> >> >>>>>>> --
>>>> >> >>>>>>> Lorin Hochstein
>>>> >> >>>>>>> Lead Architect - Cloud Services
>>>> >> >>>>>>> Nimbis Services, Inc.
>>>> >> >>>>>>> www.nimbisservices.com
>>>> >> >>>>>>>
>>>> >> >>>>>>> _______________________________________________
>>>> >> >>>>>>> OpenStack-operators mailing list
>>>> >> >>>>>>> OpenStack-operators at lists.openstack.org
>>>> >> >>>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>>> >> >>>>>>>
>>>> >> >>>>>>
>>>> >> >>>>>>
>>>> >> >>>>>>
>>>> >> >>>>>> --
>>>> >> >>>>>> Joe Topjian
>>>> >> >>>>>> Systems Administrator
>>>> >> >>>>>> Cybera Inc.
>>>> >> >>>>>>
>>>> >> >>>>>> www.cybera.ca
>>>> >> >>>>>>
>>>> >> >>>>>> Cybera is a not-for-profit organization that works to spur and
>>>> >> >>>>>> support innovation, for the economic benefit of Alberta,
>>>> >> >>>>>> through the use of cyberinfrastructure.



-- 
Joe Topjian
Systems Administrator
Cybera Inc.

www.cybera.ca

Cybera is a not-for-profit organization that works to spur and support
innovation, for the economic benefit of Alberta, through the use
of cyberinfrastructure.