[Openstack] quantum network node in KVM guest - connectivity issues

Nick Maslov azpekt at gmail.com
Sat Oct 12 12:59:40 UTC 2013


Hi,

I have the following setup:

1) infrastructure node, IP in bond, hosting following KVM guests:
1.1) Postgres KVM guest
1.2) MQ KVM guest
1.3) DNS KVM guest
1.4) Control node with Nova API, Cinder API, Quantum Server, etc.
...
1.8) Quantum network node with quantum agents
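
In case it helps with reproducing: how such a guest hangs off the host's bridge/bond can be inspected like this (just a sketch - "net01-001" is my guest name, and the bridge/bond names will differ per setup):

# on the infrastructure node
virsh domiflist net01-001   # which host bridge the guest NIC is plugged into
brctl show                  # bridges and their member interfaces (bond slaves, vnetX taps)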

The agents on this network node are constantly dying and starting up again:

# quantum agent-list
+--------------------------------------+--------------------+-----------------------------+-------+----------------+
| id                                   | agent_type         | host                        | alive | admin_state_up |
+--------------------------------------+--------------------+-----------------------------+-------+----------------+
| 5656392b-b6fe-4570-802f-97d2154acf31 | L3 agent           | net01-001.int.net.net       | xxx   | True           |
| 1093fb73-6622-448e-8dad-558a36cca306 | DHCP agent         | net01-001.int.net.net       | xxx   | True           |
| 4518830d-e112-439f-a629-7defa7bd29e9 | Open vSwitch agent | net01-001.int.net.net       | xxx   | True           |
| 86ee6d24-2e6a-4f58-addb-290fefc26401 | Open vSwitch agent | nova05                      | :-)   | True           |
| b67697bb-3ec1-49fc-8f3c-7e4e7892e83a | Open vSwitch agent | nova04                      | :-)   | True           |
+--------------------------------------+--------------------+-----------------------------+-------+----------------+

A few minutes later those agents will be up again, then one may die while the others stay up.
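
As far as I understand, "alive" just means quantum-server received the agent's periodic heartbeat over AMQP within agent_down_time, so flaky connectivity between the network node and the MQ guest would look exactly like this. The relevant knobs, as a sketch (the values are only examples):

# /etc/quantum/quantum.conf on the server
[DEFAULT]
agent_down_time = 15   # seconds without a heartbeat before an agent shows as xxx

# /etc/quantum/quantum.conf on the agents
[AGENT]
report_interval = 5    # seconds between heartbeats; keep well below agent_down_time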

ping net01-001
PING net01-001.int.net.net (10.10.146.34) 56(84) bytes of data.
64 bytes from net01-001.int.net.net (10.10.146.34): icmp_req=1 ttl=64 time=0.912 ms
64 bytes from net01-001.int.net.net (10.10.146.34): icmp_req=2 ttl=64 time=0.273 ms
64 bytes from net01-001.int.net.net (10.10.146.34): icmp_req=2 ttl=64 time=0.319 ms (DUP!)
64 bytes from net01-001.int.net.net (10.10.146.34): icmp_req=3 ttl=64 time=0.190 ms
64 bytes from net01-001.int.net.net (10.10.146.34): icmp_req=4 ttl=64 time=0.230 ms
64 bytes from net01-001.int.net.net (10.10.146.34): icmp_req=4 ttl=64 time=0.305 ms (DUP!)
64 bytes from net01-001.int.net.net (10.10.146.34): icmp_req=5 ttl=64 time=0.199 ms
64 bytes from net01-001.int.net.net (10.10.146.34): icmp_req=7 ttl=64 time=0.211 ms
64 bytes from net01-001.int.net.net (10.10.146.34): icmp_req=8 ttl=64 time=0.322 ms
64 bytes from net01-001.int.net.net (10.10.146.34): icmp_req=8 ttl=64 time=0.409 ms (DUP!)
^C
--- net01-001.int.net.net ping statistics ---
8 packets transmitted, 7 received, +3 duplicates, 12% packet loss, time 7017ms
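
The DUPs make me suspect the reply reaches me twice over two paths - e.g. a bridge loop, or the bond duplicating frames (broadcast / balance-rr style modes). A sketch of what can be compared (bond0 is an assumed interface name, adjust accordingly):

# on the infrastructure node (or another guest)
tcpdump -e -n -i bond0 icmp and host 10.10.146.34   # do the duplicate replies come from different source MACs?
cat /proc/net/bonding/bond0                         # bonding mode and slave state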

SSH'ing to the network node is also difficult - constant freezes. There is nothing suspicious in the logs.

Since the DHCP agent may be down, spawning a VM can get stuck in the "waiting for network device" state. Eventually it may get its internal IP and then a floating IP - but accessing it also proves very troublesome, I believe because of the L3 agent flapping.
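
While a VM is stuck like that, the agents' namespaces on the network node can be checked directly (a sketch - the <net-id>/<router-id> parts are placeholders for the real UUIDs):

# on the network node
ip netns                               # expect a qdhcp-<net-id> and a qrouter-<router-id>
ip netns exec qdhcp-<net-id> ip addr   # is the DHCP port up inside the namespace?
ps aux | grep dnsmasq                  # is dnsmasq running for the network at all?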

My OpenStack was set up following this guide - https://github.com/mseknibilel/OpenStack-Grizzly-Install-Guide/blob/OVS_MultiNode/OpenStack_Grizzly_Install_Guide.rst

The only thing I added on top of it is HAProxy/keepalived, balancing API requests across the control nodes. But this shouldn't impact tenant networking...
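
For completeness, a sketch of how to rule keepalived out (bond0 and <VIP> are placeholders for my actual names/addresses):

# on the control nodes
tcpdump -n -i bond0 vrrp    # VRRP advertisements - are they flowing where expected?
ip addr show | grep <VIP>   # the VIP should be active on exactly one node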


Anyone have any thoughts about this?

Cheers,
NM