[Openstack] A Grizzly GRE failure

Greg Chavez greg.chavez at gmail.com
Sat May 11 18:28:54 UTC 2013


So to be clear:

* I have three NICs on my network node.  The VM traffic goes out the
1st NIC on 192.168.239.99/24 to the other compute nodes, while
management traffic goes out the 2nd NIC on 192.168.241.99. The 3rd NIC
is external and has no IP.

* I have four GRE endpoints on the VM network, one at the network node
(192.168.239.99) and three on compute nodes
(192.168.239.{110,114,115}), all with IDs 2-5.

* I have a fifth GRE endpoint, with id 1, pointing to 192.168.241.99,
the network node's management interface.  This was the first tunnel
created when I deployed the network node, because that is how I had
set remote_ip in the OVS plugin ini.  I corrected the setting later,
but the 192.168.241.99 endpoint persists and, as your response
implies, *this extraneous endpoint is the cause of my troubles*.

My next question then is what is happening? My guess:

* I ping a guest from the external network using its floater (10.21.166.4).

* It gets NAT'd at the tenant router on the network node to
192.168.252.3, at which point an ARP request is sent over the unified
GRE broadcast domain.

* On a compute node, the ARP request is received by the VM, which then
sends a reply to the tenant router's MAC (which I verified with
tcpdump).

* There are four endpoints the reply could go down:

    Bridge br-tun
        Port br-tun
            Interface br-tun
                type: internal
        Port "gre-1"
            Interface "gre-1"
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip="192.168.241.99"}
        Port "gre-4"
            Interface "gre-4"
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip="192.168.239.114"}
        Port "gre-3"
            Interface "gre-3"
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip="192.168.239.110"}
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port "gre-2"
            Interface "gre-2"
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip="192.168.239.99"}
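As a sanity check, the odd one out can be spotted mechanically by
grepping a saved dump of `ovs-vsctl show` for any GRE remote_ip that
is not on the 192.168.239.0/24 VM network (a rough sketch; the helper
name is mine):

```shell
# find_strays DUMPFILE -- print every GRE remote_ip in a saved
# `ovs-vsctl show` dump that is NOT on the 192.168.239.0/24 VM network.
# Anything printed is a stray endpoint like gre-1 above.
find_strays() {
    grep -o 'remote_ip="[0-9.]*"' "$1" \
        | grep -vF '192.168.239.' \
        | tr -d '"' \
        | cut -d= -f2
}
```

Run as, e.g., `ovs-vsctl show > /tmp/dump && find_strays /tmp/dump` on
each node; on mine it prints only 192.168.241.99.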

Here's where I get confused.  Does it know that gre-1 is a different
broadcast domain than the others, or does it see all endpoints as the
same domain?

What happens here?  Is this the cause of my network timeouts on
external connections to the VMs? Does this also explain the sporadic
nature of the timeouts, why they aren't consistent in frequency or
duration?

Finally, what happens when I remove the oddball endpoint from the DB?
Sounds risky!
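If I do go that route, my tentative plan would be something like the
following, run on the network node (untested, so please correct me;
the column name and the Ubuntu service name are guesses on my part):

```shell
# Untested sketch: drop the stale endpoint row, remove the stale port
# from br-tun, and restart the OVS agent so it rebuilds the tunnel mesh.
# (The ip_address column and the service name are my assumptions.)
mysql quantum -e "DELETE FROM ovs_tunnel_endpoints WHERE ip_address = '192.168.241.99';"
ovs-vsctl del-port br-tun gre-1   # repeat on each compute node
service quantum-plugin-openvswitch-agent restart
```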

Thanks for your help
--Greg Chavez

On Fri, May 10, 2013 at 7:17 PM, Darragh O'Reilly
<dara2002-openstack at yahoo.com> wrote:
> I'm not sure how to rectify that. You may have to delete the bad row from the DB and restart the agents:
>
> mysql> use quantum;
> mysql> select * from ovs_tunnel_endpoints;
> ...
>
>On Fri, May 10, 2013 at 6:43 PM, Greg Chavez <greg.chavez at gmail.com> wrote:
>>  I'm refactoring my question once again (see  "A Grizzly arping
>>  failure" and "Failure to arp by quantum router").
>>
>>  Quickly, the problem is in a multi-node Grizzly+Raring setup with a
>>  separate network node and a dedicated VLAN for VM traffic.  External
>>  connections time out within a minute and don't resume until traffic is
>>  initiated from the VM.
>>
>>  I got some rather annoying and hostile assistance just now on IRC and
>>  while it didn't result in a fix, it got me to realize that the problem
>>  is possibly with my GRE setup.
>>
>>  I made a mistake when I originally set this up, assigning the mgmt
>>  interface of the network node (192.168.241.99) as its GRE remote_ip
>>  instead of the vm_config network interface (192.168.239.99).  I
>>  realized my mistake, reconfigured the OVS plugin on the network
>>  node, and moved on.  But now, taking a look at my OVS bridges on the
>>  network node, I see that the old remote IP is still there!
>>
>>      Bridge br-tun
>>  <snip>
>>          Port "gre-1"
>>              Interface "gre-1"
>>                  type: gre
>>                  options: {in_key=flow, out_key=flow, remote_ip="192.168.241.99"}
>>  <snip>
>>
>>  This is also on all the compute nodes.
>>
>>  ( Full ovs-vsctl show output here: http://pastebin.com/xbre1fNV)
>>
>>  What's more, I have this error every time I restart OVS:
>>
>>  2013-05-10 18:21:24    ERROR [quantum.agent.linux.ovs_lib] Unable to
>>  execute ['ovs-vsctl', '--timeout=2', 'add-port', 'br-tun', 'gre-5'].
>>  Exception:
>>  Command: ['sudo', 'quantum-rootwrap', '/etc/quantum/rootwrap.conf',
>>  'ovs-vsctl', '--timeout=2', 'add-port', 'br-tun', 'gre-5']
>>  Exit code: 1
>>  Stdout: ''
>>  Stderr: 'ovs-vsctl: cannot create a port named gre-5 because a port
>>  named gre-5 already exists on bridge br-tun\n'
>>
>>  Could that be because gre-1 is vestigial and possibly fouling up the
>>  works by creating two possible paths for VM traffic?
>>
>>  Is it as simple as removing it with ovs-vsctl or is something else required?
>>
>>  Or is this actually needed for some reason?  Argh... help!
