[Openstack-operators] Networking breaks in CentOS guests but works with Ubuntu guests

Lorin Hochstein lorin at nimbisservices.com
Fri Apr 5 17:23:03 UTC 2013


So, I made some progress here. I'm running FlatDHCP, so I don't have any
VLAN stuff configured on my nodes. I don't have the 8021q kernel module
loaded in any of my hosts or guests.

If I load the 8021q kernel module in the CentOS guest and manually put
an IP address on the guest's eth0, then I can reach the guest from the
cloud controller. Unfortunately, it still doesn't pick up an address
via DHCP: the DHCP replies still don't make it from host:vnet0 to
guest:eth0, even though other packets do.
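
For reference, the manual steps above can be sketched roughly as follows,
run as root inside the CentOS guest. This is only a sketch: the address is
the test address used earlier in this thread, and the function names and the
DRY_RUN variable are hypothetical conveniences, not part of any tool.

```shell
#!/bin/sh
# Hypothetical sketch of the manual workaround; run as root in the guest.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: guest_static_ip <iface> <addr/prefix> <broadcast>
guest_static_ip() {
    # Load the 802.1Q VLAN module in the guest.
    run modprobe 8021q
    # Put a static address on the interface.
    run ip addr add "$2" broadcast "$3" dev "$1"
    # Make sure the link is administratively up.
    run ip link set "$1" up
}
```

For example: guest_static_ip eth0 10.40.0.4/16 10.40.255.255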

Since the problem only occurs when the packet has to travel across the
network (if I put an IP on the bridge of the compute host, I can reach the
guest), it seems like the Cisco Nexus 3000 switch is putting VLAN tags in
the Ethernet frame, and it's confusing the guest. But I can't figure out
why that would be. I've been trying to inspect the packets with tcpdump to
see if the VLAN tags are there.
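
A minimal sketch of that tcpdump check, assuming the interface names used
elsewhere in this thread (eth1 for the bridged NIC on the compute host, vnet0
for the guest's tap device; both may differ per host and instance). The -e
flag prints the link-level header, so a tagged frame shows up with
"ethertype 802.1Q (0x8100)" followed by the VLAN ID. The helper names are
hypothetical. On some kernel and driver combinations, hardware VLAN offload
may strip the tag before the capture point, so an empty capture is not
conclusive on its own.

```shell
#!/bin/sh
# Hypothetical helpers; interface names are assumptions from this thread.

# Print only 802.1Q-tagged frames seen on the given interface.
# -e shows the link-level header, -nn disables name/port resolution.
# Needs root. Usage: capture_tagged eth1
capture_tagged() {
    tcpdump -e -nn -i "$1" vlan
}

# Return 0 if a saved line of "tcpdump -e" output describes a tagged frame.
is_tagged_line() {
    case "$1" in
        *"ethertype 802.1Q (0x8100)"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Running capture_tagged on eth1 and then on vnet0 of the compute host while
the guest requests a lease would show whether tagged frames arrive from the
switch and whether they are still tagged when handed to the guest.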

I may end up just switching to VlanManager to make everything VLAN-y.

Here's what my interfaces look like on the switch:

n3k-2# show interface switchport
Name: Ethernet1/1
  Switchport: Enabled
  Switchport Monitor: Not enabled
  Operational Mode: access
  Access Mode VLAN: 1 (default)
  Trunking Native Mode VLAN: 1 (default)
  Trunking VLANs Enabled: 1
  Administrative private-vlan primary host-association: none
  Administrative private-vlan secondary host-association: none
  Administrative private-vlan primary mapping: none
  Administrative private-vlan secondary mapping: none
  Administrative private-vlan trunk native VLAN: none
  Administrative private-vlan trunk encapsulation: dot1q
  Administrative private-vlan trunk normal VLANs: none
  Administrative private-vlan trunk private VLANs: none
  Operational private-vlan: none
  Unknown unicast blocked: disabled
  Unknown multicast blocked: disabled

I don't have dot1q native tag enabled:

n3k-2(config)# show vlan dot1q tag native
vlan dot1q native tag is disabled


Lorin


On Thu, Apr 4, 2013 at 4:27 PM, Narayan Desai <narayan.desai at gmail.com> wrote:

> You might be hitting iptables/ebtables rules.
>
> I don't understand why this would be image specific though.
>
> Can you try generating traffic from the vm and see which counters
> increment? (with a static ip maybe?)
>  -nld
>
> On Thu, Apr 4, 2013 at 2:55 PM, Lorin Hochstein
> <lorin at nimbisservices.com> wrote:
> > Yeah, I've only loaded vhost_net on the compute host.
> >
> > I'm running CentOS 6.3 on my latest test, but I've tried with CentOS 6.4
> > as well.
> >
> > I made some progress today (at least a potential workaround), but permit
> > me to ramble for a bit. I'm trying to run non-multihost. The eth1 on my
> > compute nodes is bridged to br100, and there's no IP address on br100 or
> > eth1.
> >
> > Packets aren't getting into the VM from outside. If I manually put an IP
> > address on there and do an "arping" from the network node, the arp
> > request packets appear on vnet1 of the compute host but not on eth0 of
> > the guest. (Packets do leave, however, so I can do an arping from inside
> > the guest and the nova-network host will see the request. Similar to
> > DHCP. It's like a reverse black hole, things can only go out).
> >
> > However, if I put an IP address on br100 of the compute host, then the
> > guest can reach the host on that address.
> >
> > So, it looks like I'm going to have to switch to running multi-host to
> > resolve this issue, since the VM can communicate directly with a bridge
> > on the compute host if it has an IP.
> >
> > Still, it's puzzling to me, and I don't have a sense about how to debug
> > this further. How do I dig in if the problem is that packets can go from
> > guest:eth0 to host:vnet1, but they don't go from host:vnet1 to guest:eth0
> > (when they originate from a different server and travel over layer 2),
> > and only with a specific image that works for other people?
> >
> > Lorin
> >
> >
> >
> > On Thu, Apr 4, 2013 at 11:33 AM, Narayan Desai <narayan.desai at gmail.com>
> > wrote:
> >>
> >> iirc, vhost_net is only needed on the host.
> >>
> >> We have seen stability issues with 12.04 (only on particular host
> >> types) when using virtio without vhost_net. Enabling vhost_net on the
> >> host resolved the issues for us.
> >>
> >> Which version of Centos are you running?
> >>  -nld
> >>
> >> On Wed, Apr 3, 2013 at 3:59 PM, Lorin Hochstein
> >> <lorin at nimbisservices.com> wrote:
> >> > That was my instinct, but I've tried it both ways (toggling
> >> > libvirt_use_virtio_for_bridge, restarting nova-compute, launching new
> >> > instance), and vnc'd into the instance to confirm that in one case the
> >> > virtio_net drivers were loaded and in the other they weren't, and the
> >> > result was the same. So it doesn't seem to be related. It's really
> >> > baffling.
> >> >
> >> > Lorin
> >> >
> >> >
> >> > On Wed, Apr 3, 2013 at 4:47 PM, Joe Topjian <joe.topjian at cybera.ca>
> >> > wrote:
> >> >>
> >> >> That's really bizarre -- especially since it's only CentOS images. Do
> >> >> you think it might be something with virtio compatibility?
> >> >>
> >> >> I'm hesitant to lean on it being a compute/controller issue since
> >> >> other images work.
> >> >>
> >> >>
> >> >> On Wed, Apr 3, 2013 at 2:41 PM, Lorin Hochstein
> >> >> <lorin at nimbisservices.com>
> >> >> wrote:
> >> >>>
> >> >>> I've tested with multiple ones, including the CentOS6 image from that
> >> >>> page, as well as several we have rolled on our own.
> >> >>>
> >> >>> Right now I'm testing by manually putting on the IP by doing:
> >> >>>
> >> >>> ip addr add 10.40.0.4/16 broadcast 10.40.255.255 dev eth0
> >> >>>
> >> >>> I can't ping out at all. If I try to arping out, and then tcpdump,
> >> >>> just like in the DHCP case, I can see the ARP request and replies on
> >> >>> vnet0 of the host:
> >> >>>
> >> >>> root at c220-2:~# tcpdump -i vnet0 arp
> >> >>> tcpdump: WARNING: vnet0: no IPv4 address assigned
> >> >>> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> >> >>> 16:34:42.109067 ARP, Request who-has 10.40.0.1 (Broadcast) tell 10.40.0.4, length 28
> >> >>> 16:34:42.109085 ARP, Request who-has 10.40.0.1 (Broadcast) tell 10.40.0.4, length 28
> >> >>> 16:34:42.109216 ARP, Reply 10.40.0.1 is-at 54:78:1a:86:50:c9 (oui Unknown), length 46
> >> >>>
> >> >>>
> >> >>> But if I tcpdump on eth0 in the guest, I only see the ARP requests,
> >> >>> not the replies.
> >> >>>
> >> >>>
> >> >>> Lorin
> >> >>>
> >> >>>
> >> >>> On Wed, Apr 3, 2013 at 4:26 PM, Joe Topjian <joe.topjian at cybera.ca>
> >> >>> wrote:
> >> >>>>
> >> >>>> What CentOS images are you using? These have worked for me:
> >> >>>>
> >> >>>> https://github.com/rackerjoe/oz-image-build
> >> >>>>
> >> >>>>
> >> >>>> On Wed, Apr 3, 2013 at 2:13 PM, Lorin Hochstein
> >> >>>> <lorin at nimbisservices.com> wrote:
> >> >>>>>
> >> >>>>> Hi Joe:
> >> >>>>>
> >> >>>>> It happens immediately thereafter. CentOS images have never worked
> >> >>>>> on our setup.
> >> >>>>>
> >> >>>>> Lorin
> >> >>>>>
> >> >>>>>
> >> >>>>> On Wed, Apr 3, 2013 at 3:30 PM, Joe Topjian <joe.topjian at cybera.ca>
> >> >>>>> wrote:
> >> >>>>>>
> >> >>>>>> Hi Lorin,
> >> >>>>>>
> >> >>>>>> Does this happen shortly after the guests were created? Or usually
> >> >>>>>> a few hours/days later? If the latter, are these guests seeing
> >> >>>>>> large amounts of bandwidth?
> >> >>>>>>
> >> >>>>>> Thanks,
> >> >>>>>> Joe
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> On Wed, Apr 3, 2013 at 1:16 PM, Lorin Hochstein
> >> >>>>>> <lorin at nimbisservices.com> wrote:
> >> >>>>>>>
> >> >>>>>>> Hi all:
> >> >>>>>>>
> >> >>>>>>> I'm having a strange issue where networking on my CentOS guests
> >> >>>>>>> isn't working properly, but things are working fine with my
> >> >>>>>>> Ubuntu guests.
> >> >>>>>>>
> >> >>>>>>> I'm running Folsom on Ubuntu 12.04, nova-network, not multi-host.
> >> >>>>>>>
> >> >>>>>>> The first symptom is that CentOS instances don't get IP addresses
> >> >>>>>>> via DHCP. If I trace the DHCP requests and replies using tcpdump,
> >> >>>>>>> I can see the reply from dnsmasq reach the vnetX interface of the
> >> >>>>>>> compute host, but it doesn't get to the eth0 interface of the
> >> >>>>>>> guest. (I'm at a loss here about how to debug something like
> >> >>>>>>> that).
> >> >>>>>>>
> >> >>>>>>> If I try to statically configure an IP address on the guest
> >> >>>>>>> instead,
> >> >>>>>>> networking still doesn't work. I can't ping anything on the
> >> >>>>>>> subnet, and I
> >> >>>>>>> don't even see the icmp traffic on vnetX of the host.
> >> >>>>>>>
> >> >>>>>>> I've tried twiddling the following options, with no change in
> >> >>>>>>> behavior:
> >> >>>>>>>
> >> >>>>>>> * Adding the following rule to nova-network node: iptables -A
> >> >>>>>>> POSTROUTING -t mangle -p udp --dport bootpc -j CHECKSUM
> >> >>>>>>> --checksum-fill
> >> >>>>>>> * Adding the same rule to nova-compute node
> >> >>>>>>> * Setting libvirt_use_virtio_for_bridge to "yes" and "no"
> >> >>>>>>> (restarting nova-compute, re-launching instances)
> >> >>>>>>> * With and without vhost_net loaded in nova-compute (restarting
> >> >>>>>>> nova-compute, re-launching instances)
> >> >>>>>>> * Disabling IPv6 inside of the CentOS guest
> >> >>>>>>>
> >> >>>>>>> Has anybody encountered this before?
> >> >>>>>>>
> >> >>>>>>> Lorin
> >> >>>>>>>
> >> >>>>>>> --
> >> >>>>>>> Lorin Hochstein
> >> >>>>>>> Lead Architect - Cloud Services
> >> >>>>>>> Nimbis Services, Inc.
> >> >>>>>>> www.nimbisservices.com
> >> >>>>>>>
> >> >>>>>>> _______________________________________________
> >> >>>>>>> OpenStack-operators mailing list
> >> >>>>>>> OpenStack-operators at lists.openstack.org
> >> >>>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> --
> >> >>>>>> Joe Topjian
> >> >>>>>> Systems Administrator
> >> >>>>>> Cybera Inc.
> >> >>>>>>
> >> >>>>>> www.cybera.ca
> >> >>>>>>
> >> >>>>>> Cybera is a not-for-profit organization that works to spur and
> >> >>>>>> support innovation, for the economic benefit of Alberta, through
> >> >>>>>> the use of cyberinfrastructure.
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >
> >> >
> >> >
> >> >
> >> >
> >
> >
> >
> >
>



-- 
Lorin Hochstein
Lead Architect - Cloud Services
Nimbis Services, Inc.
www.nimbisservices.com