[Openstack-operators] Networking breaks in CentOS guests but works with Ubuntu guests
Lorin Hochstein
lorin at nimbisservices.com
Mon Apr  8 18:18:38 UTC 2013

Joe:
I updated the gist, in all its ethernet-frame-detailed glory.
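
For anyone reading the dumps: one way to tell whether a captured frame carries an 802.1Q tag is to look at bytes 12-13 of the Ethernet header, which hold 0x8100 when a tag is present (the next two bytes then carry the VLAN ID in their low 12 bits). Below is a minimal Python sketch of that check. The function name and both sample frames are made up for illustration (only the source MAC is the one from the ARP reply quoted further down), and none of it is taken from the gist.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q VLAN tag


def parse_ethernet_header(frame: bytes) -> dict:
    """Return MACs, VLAN ID (None if untagged) and EtherType of a raw frame."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src = frame[0:6], frame[6:12]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    vlan_id = None
    if ethertype == TPID_8021Q:
        if len(frame) < 18:
            raise ValueError("truncated 802.1Q tag")
        # The tag is 4 bytes: TPID (already read) + TCI, then the real EtherType.
        tci, ethertype = struct.unpack("!HH", frame[14:18])
        vlan_id = tci & 0x0FFF  # low 12 bits of the TCI are the VLAN ID
    return {
        "dst": ":".join(f"{b:02x}" for b in dst),
        "src": ":".join(f"{b:02x}" for b in src),
        "vlan_id": vlan_id,
        "ethertype": ethertype,
    }


# Hypothetical headers of an ARP frame (EtherType 0x0806), without and with
# a VLAN 1 tag. The destination MAC is invented for the example.
untagged = bytes.fromhex("fa163e000001" "54781a8650c9" "0806")
tagged = bytes.fromhex("fa163e000001" "54781a8650c9" "8100" "0001" "0806")

if __name__ == "__main__":
    print(parse_ethernet_header(untagged))
    print(parse_ethernet_header(tagged))
```

Pasting the first 18 bytes of a frame from a "tcpdump -XX" hex dump into bytes.fromhex() and passing the result to the function gives the same answer for a real capture.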
On Mon, Apr 8, 2013 at 2:02 PM, Joe Topjian <joe.topjian at cybera.ca> wrote:
> Thanks, Lorin. I hate to do this, but can you re-run with "-XX -vv -e"?
>
>
> On Mon, Apr 8, 2013 at 11:36 AM, Lorin Hochstein <lorin at nimbisservices.com> wrote:
>
>> Joe:
>>
>> I grabbed 5 packets ("tcpdump -i eth0 -XX -c5") with 8021q disabled and
>> enabled, and posted them to this gist:
>>
>> https://gist.github.com/lorin/5338623
>>
>>
>> Lorin
>>
>>
>> On Mon, Apr 8, 2013 at 12:27 PM, Joe Topjian <joe.topjian at cybera.ca> wrote:
>>
>>> Hi Lorin,
>>>
>>> Can you do "tcpdump -i eth0 -XX" on the guest with both the module on
>>> and off and post the output to the list or to me off-list?
>>>
>>> Thanks,
>>> Joe
>>>
>>>
>>>
>>> On Fri, Apr 5, 2013 at 11:23 AM, Lorin Hochstein <lorin at nimbisservices.com> wrote:
>>>
>>>> So, I made some progress here. I'm running FlatDHCP, so I don't have
>>>> any VLAN stuff configured on my nodes. I don't have the 8021q kernel module
>>>> loaded in any of my hosts or guests.
>>>>
>>>> If I load the 8021q kernel module in the CentOS guest, and I manually
>>>> put an IP address on the guest's eth0, then I can reach the guest from
>>>> the cloud controller. Unfortunately, it still doesn't pick up an
>>>> address via DHCP: the DHCP replies still don't make it from host:vnet0 to
>>>> guest:eth0, even though other packets are able to.
>>>>
>>>> Since the problem only occurs when the packet has to travel across the
>>>> network (if I put an IP on the bridge of the compute host, I can reach the
>>>> guest), it seems like the Cisco Nexus 3000 switch is putting VLAN tags in
>>>> the ethernet frame, and it's confusing the guest. But I can't figure out
>>>> why that would be. I've been trying to inspect the packets with tcpdump to
>>>> see if the vlan tags are there.
>>>>
>>>> I may end up just switching to VlanManager to make everything VLAN-y.
>>>>
>>>> Here's what my interfaces look like on the switch:
>>>>
>>>> n3k-2# show interface switchport
>>>> Name: Ethernet1/1
>>>>   Switchport: Enabled
>>>>   Switchport Monitor: Not enabled
>>>>   Operational Mode: access
>>>>   Access Mode VLAN: 1 (default)
>>>>   Trunking Native Mode VLAN: 1 (default)
>>>>   Trunking VLANs Enabled: 1
>>>>   Administrative private-vlan primary host-association: none
>>>>   Administrative private-vlan secondary host-association: none
>>>>   Administrative private-vlan primary mapping: none
>>>>   Administrative private-vlan secondary mapping: none
>>>>   Administrative private-vlan trunk native VLAN: none
>>>>   Administrative private-vlan trunk encapsulation: dot1q
>>>>   Administrative private-vlan trunk normal VLANs: none
>>>>   Administrative private-vlan trunk private VLANs: none
>>>>   Operational private-vlan: none
>>>>   Unknown unicast blocked: disabled
>>>>   Unknown multicast blocked: disabled
>>>>
>>>> I don't have dot1q native tag enabled:
>>>>
>>>> n3k-2(config)# show vlan dot1q tag native
>>>> vlan dot1q native tag is disabled
>>>>
>>>>
>>>> Lorin
>>>>
>>>>
>>>> On Thu, Apr 4, 2013 at 4:27 PM, Narayan Desai <narayan.desai at gmail.com> wrote:
>>>>
>>>>> You might be hitting iptables/ebtables rules.
>>>>>
>>>>> I don't understand why this would be image specific though.
>>>>>
>>>>> Can you try generating traffic from the vm and see which counters
>>>>> increment? (with a static ip maybe?)
>>>>>  -nld
>>>>>
>>>>> On Thu, Apr 4, 2013 at 2:55 PM, Lorin Hochstein
>>>>> <lorin at nimbisservices.com> wrote:
>>>>> > Yeah, I've only loaded vhost_net on the compute host.
>>>>> >
>>>>> > I'm running CentOS 6.3 on my latest test, but I've tried with CentOS 6.4
>>>>> > as well.
>>>>> >
>>>>> > I made some progress today (at least a potential workaround), but permit
>>>>> > me to ramble for a bit. I'm trying to run non-multihost. The eth1
>>>>> > interfaces on my compute nodes are bridged to br100, and there's no IP
>>>>> > address on br100 or eth1.
>>>>> >
>>>>> > Packets aren't getting into the VM from outside. If I manually put an IP
>>>>> > address on there and do an "arping" from the network node, the arp
>>>>> > request packets appear on vnet1 of the compute host but not on eth0 of
>>>>> > the guest. (Packets do leave, however, so I can do an arping from inside
>>>>> > the guest and the nova-network host will see the request. Similar to
>>>>> > DHCP. It's like a reverse black hole, things can only go out).
>>>>> >
>>>>> > However, if I put an IP address on br100 of the compute host, then the
>>>>> > guest can reach the host on that address.
>>>>> >
>>>>> > So, it looks like I'm going to have to switch to running multi-host to
>>>>> > resolve this issue, since the VM can communicate directly with a bridge
>>>>> > on the compute host if it has an IP.
>>>>> >
>>>>> > Still, it's puzzling to me, and I don't have a sense about how to debug
>>>>> > this further. How do I dig in if the problem is that packets can go from
>>>>> > guest:eth0 to host:vnet1, but they don't go from host:vnet1 to guest:eth0
>>>>> > (when they originate from a different server and travel over layer 2),
>>>>> > and only with a specific image that works for other people?
>>>>> >
>>>>> > Lorin
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Thu, Apr 4, 2013 at 11:33 AM, Narayan Desai <narayan.desai at gmail.com> wrote:
>>>>> >>
>>>>> >> iirc, vhost_net is only needed on the host.
>>>>> >>
>>>>> >> We have seen stability issues with 12.04 (only on particular host
>>>>> >> types) when using virtio without vhost_net. Enabling vhost_net on the
>>>>> >> host resolved the issues for us.
>>>>> >>
>>>>> >> Which version of CentOS are you running?
>>>>> >>  -nld
>>>>> >>
>>>>> >> On Wed, Apr 3, 2013 at 3:59 PM, Lorin Hochstein
>>>>> >> <lorin at nimbisservices.com> wrote:
>>>>> >> > That was my instinct, but I've tried it both ways (toggling
>>>>> >> > libvirt_use_virtio_for_bridge, restarting nova-compute, launching a
>>>>> >> > new instance), and VNC'd into the instance to confirm that in one case
>>>>> >> > the virtio_net drivers were loaded and in the other they weren't, and
>>>>> >> > the result was the same. So it doesn't seem to be related. It's really
>>>>> >> > baffling.
>>>>> >> >
>>>>> >> > Lorin
>>>>> >> >
>>>>> >> >
>>>>> >> > On Wed, Apr 3, 2013 at 4:47 PM, Joe Topjian <joe.topjian at cybera.ca> wrote:
>>>>> >> >>
>>>>> >> >> That's really bizarre -- especially since it's only CentOS images. Do
>>>>> >> >> you think it might be something with virtio compatibility?
>>>>> >> >>
>>>>> >> >> I'm hesitant to lean on it being a compute/controller issue since
>>>>> >> >> other images work.
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> On Wed, Apr 3, 2013 at 2:41 PM, Lorin Hochstein
>>>>> >> >> <lorin at nimbisservices.com>
>>>>> >> >> wrote:
>>>>> >> >>>
>>>>> >> >>> I've tested with multiple ones, including the CentOS6 image from that
>>>>> >> >>> page, as well as several we have rolled on our own.
>>>>> >> >>>
>>>>> >> >>> Right now I'm testing by manually putting on the IP by doing:
>>>>> >> >>>
>>>>> >> >>> ip addr add 10.40.0.4/16 broadcast 10.40.255.255 dev eth0
>>>>> >> >>>
>>>>> >> >>> I can't ping out at all. If I try to arping out, and then tcpdump,
>>>>> >> >>> just like in the DHCP case, I can see the ARP request and replies on
>>>>> >> >>> vnet0 of the host:
>>>>> >> >>>
>>>>> >> >>> root at c220-2:~# tcpdump -i vnet0 arp
>>>>> >> >>> tcpdump: WARNING: vnet0: no IPv4 address assigned
>>>>> >> >>> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>>>>> >> >>> 16:34:42.109067 ARP, Request who-has 10.40.0.1 (Broadcast) tell 10.40.0.4, length 28
>>>>> >> >>> 16:34:42.109085 ARP, Request who-has 10.40.0.1 (Broadcast) tell 10.40.0.4, length 28
>>>>> >> >>> 16:34:42.109216 ARP, Reply 10.40.0.1 is-at 54:78:1a:86:50:c9 (oui Unknown), length 46
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>> But if I tcpdump on eth0 in the guest, I only see the arp requests,
>>>>> >> >>> not the replies.
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>> Lorin
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>> On Wed, Apr 3, 2013 at 4:26 PM, Joe Topjian <joe.topjian at cybera.ca> wrote:
>>>>> >> >>>>
>>>>> >> >>>> What CentOS images are you using? These have worked for me:
>>>>> >> >>>>
>>>>> >> >>>> https://github.com/rackerjoe/oz-image-build
>>>>> >> >>>>
>>>>> >> >>>>
>>>>> >> >>>> On Wed, Apr 3, 2013 at 2:13 PM, Lorin Hochstein
>>>>> >> >>>> <lorin at nimbisservices.com> wrote:
>>>>> >> >>>>>
>>>>> >> >>>>> Hi Joe:
>>>>> >> >>>>>
>>>>> >> >>>>> It happens immediately thereafter. CentOS images have never worked on
>>>>> >> >>>>> our setup.
>>>>> >> >>>>>
>>>>> >> >>>>> Lorin
>>>>> >> >>>>>
>>>>> >> >>>>>
>>>>> >> >>>>> On Wed, Apr 3, 2013 at 3:30 PM, Joe Topjian <joe.topjian at cybera.ca> wrote:
>>>>> >> >>>>>>
>>>>> >> >>>>>> Hi Lorin,
>>>>> >> >>>>>>
>>>>> >> >>>>>> Does this happen shortly after the guests were created? Or usually a
>>>>> >> >>>>>> few hours/days later? If the latter, are these guests seeing large
>>>>> >> >>>>>> amounts of bandwidth?
>>>>> >> >>>>>>
>>>>> >> >>>>>> Thanks,
>>>>> >> >>>>>> Joe
>>>>> >> >>>>>>
>>>>> >> >>>>>>
>>>>> >> >>>>>> On Wed, Apr 3, 2013 at 1:16 PM, Lorin Hochstein
>>>>> >> >>>>>> <lorin at nimbisservices.com> wrote:
>>>>> >> >>>>>>>
>>>>> >> >>>>>>> Hi all:
>>>>> >> >>>>>>>
>>>>> >> >>>>>>> I'm having a strange issue where networking on my CentOS guests isn't
>>>>> >> >>>>>>> working properly, but things are working fine with my Ubuntu guests.
>>>>> >> >>>>>>>
>>>>> >> >>>>>>> I'm running Folsom on Ubuntu 12.04, nova-network, not multi-host.
>>>>> >> >>>>>>>
>>>>> >> >>>>>>> The first symptom is that CentOS instances don't get IP addresses via
>>>>> >> >>>>>>> DHCP. If I trace the DHCP requests and replies using tcpdump, I can
>>>>> >> >>>>>>> see the reply from dnsmasq reach the vnetX interface of the compute
>>>>> >> >>>>>>> host, but it doesn't get to the eth0 interface of the guest. (I'm at
>>>>> >> >>>>>>> a loss here about how to debug something like that).
>>>>> >> >>>>>>>
>>>>> >> >>>>>>> If I try to statically configure an IP address on the guest instead,
>>>>> >> >>>>>>> networking still doesn't work. I can't ping anything on the subnet,
>>>>> >> >>>>>>> and I don't even see the icmp traffic on vnetX of the host.
>>>>> >> >>>>>>>
>>>>> >> >>>>>>> I've tried twiddling the following options, but no change in
>>>>> >> >>>>>>> behavior:
>>>>> >> >>>>>>>
>>>>> >> >>>>>>> * Adding the following rule to the nova-network node:
>>>>> >> >>>>>>>   iptables -A POSTROUTING -t mangle -p udp --dport bootpc -j CHECKSUM --checksum-fill
>>>>> >> >>>>>>> * Adding the same rule to the nova-compute node
>>>>> >> >>>>>>> * Setting libvirt_use_virtio_for_bridge to "yes" and "no" (restarting
>>>>> >> >>>>>>>   nova-compute, re-launching instances)
>>>>> >> >>>>>>> * With and without vhost_net loaded in nova-compute (restarting
>>>>> >> >>>>>>>   nova-compute, re-launching instances)
>>>>> >> >>>>>>> * Disabling IPv6 inside of the CentOS guest
>>>>> >> >>>>>>>
>>>>> >> >>>>>>> Lorin
>>>>> >> >>>>>>>
>>>>> >> >>>>>>> --
>>>>> >> >>>>>>> Lorin Hochstein
>>>>> >> >>>>>>> Lead Architect - Cloud Services
>>>>> >> >>>>>>> Nimbis Services, Inc.
>>>>> >> >>>>>>> www.nimbisservices.com
>>>>> >> >>>>>>>
>>>>> >> >>>>>>> _______________________________________________
>>>>> >> >>>>>>> OpenStack-operators mailing list
>>>>> >> >>>>>>> OpenStack-operators at lists.openstack.org
>>>>> >> >>>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>>>> >> >>>>>>>
>>>>> >> >>>>>>
>>>>> >> >>>>>>
>>>>> >> >>>>>>
>>>>> >> >>>>>> --
>>>>> >> >>>>>> Joe Topjian
>>>>> >> >>>>>> Systems Administrator
>>>>> >> >>>>>> Cybera Inc.
>>>>> >> >>>>>>
>>>>> >> >>>>>> www.cybera.ca
>>>>> >> >>>>>>
>>>>> >> >>>>>> Cybera is a not-for-profit organization that works to spur and
>>>>> >> >>>>>> support innovation, for the economic benefit of Alberta, through the
>>>>> >> >>>>>> use of cyberinfrastructure.
-- 
Lorin Hochstein
Lead Architect - Cloud Services
Nimbis Services, Inc.
www.nimbisservices.com