[Openstack-operators] Networking breaks in CentOS guests but works with Ubuntu guests

Joe Topjian joe.topjian at cybera.ca
Mon Apr 8 16:27:03 UTC 2013


Hi Lorin,

Can you run "tcpdump -i eth0 -XX" on the guest, once with the 8021q module
loaded and once with it unloaded, and post the output to the list or to me
off-list?
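
In case it helps when reading the -XX output: an 802.1Q-tagged frame has
0x8100 in bytes 12-13 of the Ethernet header, where an untagged frame has
the EtherType itself (0x0806 for ARP, 0x0800 for IPv4). Below is a minimal
Python sketch of that check. The helper names vlan_tag and frame_from_hex
are hypothetical, and the sample frames are constructed for illustration,
not captured from this setup.

```python
import struct

ETHERTYPE_8021Q = 0x8100  # TPID value that marks an 802.1Q VLAN tag


def vlan_tag(frame: bytes):
    """Return (priority, vlan_id) if the frame carries an 802.1Q tag, else None."""
    if len(frame) < 14:
        raise ValueError("too short to be an Ethernet frame")
    # Bytes 0-5: destination MAC, 6-11: source MAC, 12-13: EtherType or TPID.
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_8021Q:
        return None
    if len(frame) < 18:
        raise ValueError("truncated 802.1Q header")
    # The 16-bit TCI follows the TPID: 3 bits priority, 1 bit DEI, 12 bits VLAN ID.
    (tci,) = struct.unpack("!H", frame[14:16])
    return tci >> 13, tci & 0x0FFF


def frame_from_hex(dump: str) -> bytes:
    """Build frame bytes from hex columns (offsets and ASCII column already removed)."""
    return bytes.fromhex("".join(dump.split()))


# Hypothetical frames: the first is a plain ARP header, the second has a tag inserted.
untagged = frame_from_hex("ffff ffff ffff 5478 1a86 50c9 0806 0001")
tagged = frame_from_hex("ffff ffff ffff 5478 1a86 50c9 8100 0001 0806 0001")
print(vlan_tag(untagged))  # None
print(vlan_tag(tagged))    # (0, 1)
```

A tag with VLAN ID 0 is a priority tag, so a result like (n, 0) would also
count as "tagged" for the purposes of this comparison.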

Thanks,
Joe



On Fri, Apr 5, 2013 at 11:23 AM, Lorin Hochstein
<lorin at nimbisservices.com> wrote:

> So, I made some progress here. I'm running FlatDHCP, so I don't have any
> VLAN stuff configured on my nodes. I don't have the 8021q kernel module
> loaded in any of my hosts or guests.
>
> If I load the 8021q kernel module in the CentOS guest, and I manually put
> an IP address on the guest's eth0, then I can reach the guest from the
> cloud controller. Unfortunately, it still doesn't pick up an address
> via DHCP: the DHCP replies still don't make it from host:vnet0 to
> guest:eth0, even though other packets are able to.
>
> Since the problem only occurs when the packet has to travel across the
> network (if I put an IP on the bridge of the compute host, I can reach the
> guest), it seems like the Cisco Nexus 3000 switch is putting VLAN tags in
> the ethernet frame, and it's confusing the guest. But I can't figure out
> why that would be. I've been trying to inspect the packets with tcpdump to
> see if the vlan tags are there.
>
> I may end up just switching to VlanManager to make everything VLAN-y.
>
> Here's what my interfaces look like on the switch:
>
> n3k-2# show interface switchport
> Name: Ethernet1/1
>   Switchport: Enabled
>   Switchport Monitor: Not enabled
>   Operational Mode: access
>   Access Mode VLAN: 1 (default)
>   Trunking Native Mode VLAN: 1 (default)
>   Trunking VLANs Enabled: 1
>   Administrative private-vlan primary host-association: none
>   Administrative private-vlan secondary host-association: none
>   Administrative private-vlan primary mapping: none
>   Administrative private-vlan secondary mapping: none
>   Administrative private-vlan trunk native VLAN: none
>   Administrative private-vlan trunk encapsulation: dot1q
>   Administrative private-vlan trunk normal VLANs: none
>   Administrative private-vlan trunk private VLANs: none
>   Operational private-vlan: none
>   Unknown unicast blocked: disabled
>   Unknown multicast blocked: disabled
>
> I don't have dot1q native tag enabled:
>
> n3k-2(config)# show vlan dot1q tag native
> vlan dot1q native tag is disabled
>
>
> Lorin
>
>
> On Thu, Apr 4, 2013 at 4:27 PM, Narayan Desai <narayan.desai at gmail.com> wrote:
>
>> You might be hitting iptables/ebtables rules.
>>
>> I don't understand why this would be image specific though.
>>
>> Can you try generating traffic from the vm and see which counters
>> increment? (with a static ip maybe?)
>>  -nld
>>
>> On Thu, Apr 4, 2013 at 2:55 PM, Lorin Hochstein
>> <lorin at nimbisservices.com> wrote:
>> > Yeah, I've only loaded vhost_net on the compute host.
>> >
>> > I'm running CentOS 6.3 on my latest test, but I've tried with CentOS 6.4
>> > as well.
>> >
>> > I made some progress today (at least a potential workaround), but permit
>> > me to ramble for a bit. I'm trying to run non-multihost. The eth1
>> > interfaces on my compute nodes are bridged to br100, and there's no IP
>> > address on br100 or eth1.
>> >
>> > Packets aren't getting into the VM from outside. If I manually put an IP
>> > address on the guest's eth0 and do an "arping" from the network node, the
>> > ARP request packets appear on vnet1 of the compute host but not on eth0
>> > of the guest. (Packets do leave, however, so I can do an arping from
>> > inside the guest and the nova-network host will see the request. Similar
>> > to DHCP. It's like a reverse black hole, things can only go out).
>> >
>> > However, if I put an IP address on br100 of the compute host, then the
>> > guest can reach the host on that address.
>> >
>> > So, it looks like I'm going to have to switch to running multi-host to
>> > resolve this issue, since the VM can communicate directly with a bridge
>> > on the compute host if it has an IP.
>> >
>> > Still, it's puzzling to me, and I don't have a sense of how to debug
>> > this further. How do I dig in if the problem is that packets can go from
>> > guest:eth0 to host:vnet1, but they don't go from host:vnet1 to guest:eth0
>> > (when they originate from a different server and travel over layer 2),
>> > and only with a specific image that works for other people?
>> >
>> > Lorin
>> >
>> >
>> >
>> > On Thu, Apr 4, 2013 at 11:33 AM, Narayan Desai <narayan.desai at gmail.com>
>> > wrote:
>> >>
>> >> iirc, vhost_net is only needed on the host.
>> >>
>> >> We have seen stability issues with 12.04 (only on particular host
>> >> types) when using virtio without vhost_net. Enabling vhost_net on the
>> >> host resolved the issues for us.
>> >>
>> >> Which version of CentOS are you running?
>> >>  -nld
>> >>
>> >> On Wed, Apr 3, 2013 at 3:59 PM, Lorin Hochstein
>> >> <lorin at nimbisservices.com> wrote:
>> >> > That was my instinct, but I've tried it both ways (toggling
>> >> > libvirt_use_virtio_for_bridge, restarting nova-compute, launching new
>> >> > instance), and vnc'd into the instance to confirm that in one case the
>> >> > virtio_net drivers were loaded, and in another case, they weren't, and
>> >> > the result was the same. So it doesn't seem to be related. It's really
>> >> > baffling.
>> >> >
>> >> > Lorin
>> >> >
>> >> >
>> >> > On Wed, Apr 3, 2013 at 4:47 PM, Joe Topjian <joe.topjian at cybera.ca>
>> >> > wrote:
>> >> >>
>> >> >> That's really bizarre -- especially since it's only CentOS images. Do
>> >> >> you think it might be something with virtio compatibility?
>> >> >>
>> >> >> I'm hesitant to lean on it being a compute/controller issue since other
>> >> >> images work.
>> >> >>
>> >> >>
>> >> >> On Wed, Apr 3, 2013 at 2:41 PM, Lorin Hochstein
>> >> >> <lorin at nimbisservices.com>
>> >> >> wrote:
>> >> >>>
>> >> >>> I've tested with multiple images, including the CentOS6 image from that
>> >> >>> page, as well as several we have rolled on our own.
>> >> >>>
>> >> >>> Right now I'm testing by manually assigning the IP:
>> >> >>>
>> >> >>> ip addr add 10.40.0.4/16 broadcast 10.40.255.255 dev eth0
>> >> >>>
>> >> >>> I can't ping out at all. If I try to arping out, and then tcpdump, just
>> >> >>> like in the DHCP case, I can see the ARP request and replies on vnet0
>> >> >>> of the host:
>> >> >>>
>> >> >>> root@c220-2:~# tcpdump -i vnet0 arp
>> >> >>> tcpdump: WARNING: vnet0: no IPv4 address assigned
>> >> >>> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>> >> >>> 16:34:42.109067 ARP, Request who-has 10.40.0.1 (Broadcast) tell 10.40.0.4, length 28
>> >> >>> 16:34:42.109085 ARP, Request who-has 10.40.0.1 (Broadcast) tell 10.40.0.4, length 28
>> >> >>> 16:34:42.109216 ARP, Reply 10.40.0.1 is-at 54:78:1a:86:50:c9 (oui Unknown), length 46
>> >> >>>
>> >> >>>
>> >> >>> But if I tcpdump on eth0 in the guest, I only see the ARP requests, not
>> >> >>> the replies.
>> >> >>>
>> >> >>>
>> >> >>> Lorin
>> >> >>>
>> >> >>>
>> >> >>> On Wed, Apr 3, 2013 at 4:26 PM, Joe Topjian <joe.topjian at cybera.ca>
>> >> >>> wrote:
>> >> >>>>
>> >> >>>> What CentOS images are you using? These have worked for me:
>> >> >>>>
>> >> >>>> https://github.com/rackerjoe/oz-image-build
>> >> >>>>
>> >> >>>>
>> >> >>>> On Wed, Apr 3, 2013 at 2:13 PM, Lorin Hochstein
>> >> >>>> <lorin at nimbisservices.com> wrote:
>> >> >>>>>
>> >> >>>>> Hi Joe:
>> >> >>>>>
>> >> >>>>> It happens immediately thereafter. CentOS images have never worked on
>> >> >>>>> our setup.
>> >> >>>>>
>> >> >>>>> Lorin
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> On Wed, Apr 3, 2013 at 3:30 PM, Joe Topjian <joe.topjian at cybera.ca>
>> >> >>>>> wrote:
>> >> >>>>>>
>> >> >>>>>> Hi Lorin,
>> >> >>>>>>
>> >> >>>>>> Does this happen shortly after the guests were created? Or usually a
>> >> >>>>>> few hours/days later? If the latter, are these guests seeing large
>> >> >>>>>> amounts of bandwidth?
>> >> >>>>>>
>> >> >>>>>> Thanks,
>> >> >>>>>> Joe
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>> On Wed, Apr 3, 2013 at 1:16 PM, Lorin Hochstein
>> >> >>>>>> <lorin at nimbisservices.com> wrote:
>> >> >>>>>>>
>> >> >>>>>>> Hi all:
>> >> >>>>>>>
>> >> >>>>>>> I'm having a strange issue where networking on my CentOS guests isn't
>> >> >>>>>>> working properly, but things are working fine with my Ubuntu guests.
>> >> >>>>>>>
>> >> >>>>>>> I'm running Folsom on Ubuntu 12.04, nova-network, not multi-host.
>> >> >>>>>>>
>> >> >>>>>>> The first symptom is that CentOS instances don't get IP addresses via
>> >> >>>>>>> DHCP. If I trace the DHCP requests and replies using tcpdump, I can see
>> >> >>>>>>> the reply from dnsmasq reach the vnetX interface of the compute host,
>> >> >>>>>>> but it doesn't get to the eth0 interface of the guest. (I'm at a loss
>> >> >>>>>>> here about how to debug something like that).
>> >> >>>>>>>
>> >> >>>>>>> If I try to statically configure an IP address on the guest instead,
>> >> >>>>>>> networking still doesn't work. I can't ping anything on the subnet, and
>> >> >>>>>>> I don't even see the ICMP traffic on vnetX of the host.
>> >> >>>>>>>
>> >> >>>>>>> I've tried twiddling the following options, but no change in behavior:
>> >> >>>>>>>
>> >> >>>>>>> * Adding the following rule to nova-network node:
>> >> >>>>>>>   iptables -A POSTROUTING -t mangle -p udp --dport bootpc -j CHECKSUM --checksum-fill
>> >> >>>>>>> * Adding the same rule to nova-compute node
>> >> >>>>>>> * Setting libvirt_use_virtio_for_bridge to "yes" and "no" (restarting
>> >> >>>>>>>   nova-compute, re-launching instances)
>> >> >>>>>>> * With and without vhost_net loaded in nova-compute (restarting
>> >> >>>>>>>   nova-compute, re-launching instances)
>> >> >>>>>>> * Disabling IPv6 inside of the CentOS guest
>> >> >>>>>>>
>> >> >>>>>>> Has anybody encountered this before?
>> >> >>>>>>>
>> >> >>>>>>> Lorin
>> >> >>>>>>>
>> >> >>>>>>> --
>> >> >>>>>>> Lorin Hochstein
>> >> >>>>>>> Lead Architect - Cloud Services
>> >> >>>>>>> Nimbis Services, Inc.
>> >> >>>>>>> www.nimbisservices.com
>> >> >>>>>>>
>> >> >>>>>>> _______________________________________________
>> >> >>>>>>> OpenStack-operators mailing list
>> >> >>>>>>> OpenStack-operators at lists.openstack.org
>> >> >>>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>> >> >>>>>>>
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>> --
>> >> >>>>>> Joe Topjian
>> >> >>>>>> Systems Administrator
>> >> >>>>>> Cybera Inc.
>> >> >>>>>>
>> >> >>>>>> www.cybera.ca
>> >> >>>>>>
>> >> >>>>>> Cybera is a not-for-profit organization that works to spur and support
>> >> >>>>>> innovation, for the economic benefit of Alberta, through the use of
>> >> >>>>>> cyberinfrastructure.
>> >> >>>>>
>> >> >>>>>
>> >> >>>>>
>> >> >>>>>
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >
>> >
>> >
>> >
>>
>
>
>
>



-- 
Joe Topjian
Systems Administrator
Cybera Inc.

www.cybera.ca

Cybera is a not-for-profit organization that works to spur and support
innovation, for the economic benefit of Alberta, through the use
of cyberinfrastructure.