[openstack-dev] [openstack-ansible] L3HA problem
Assaf Muller
assaf at redhat.com
Wed Jun 22 13:45:12 UTC 2016
On Wed, Jun 22, 2016 at 9:24 AM, fabrice grelaud
<fabrice.grelaud at u-bordeaux.fr> wrote:
> Hi,
>
> we deployed our OpenStack infrastructure with your « exciting » project
> openstack-ansible (mitaka 13.1.2), but we have a problem with L3HA after
> creating a router.
>
> Our infra (close to the docs):
> 3 controllers nodes (with bond0 (br-mgmt, br-storage), bond1 (br-vxlan,
> br-vlan))
> 2 compute nodes (same for network)
>
> We create an external network (vlan type), an internal network (vxlan type)
> and a router connected to both networks.
> And when we launch an instance (cirros), the VM never receives an IP.
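>
> (For context, a sketch of roughly the Mitaka-era neutron CLI steps this
> corresponds to; the network names, physnet label and VLAN id below are
> placeholders, the CIDRs come from the router output further down:
>
> neutron net-create ext-net --router:external \
>     --provider:network_type vlan \
>     --provider:physical_network <physnet> \
>     --provider:segmentation_id <vlan-id>
> neutron subnet-create ext-net 147.210.240.0/23 --disable-dhcp
> neutron net-create private
> neutron subnet-create private 192.168.100.0/24
> neutron router-create router-bim
> neutron router-gateway-set router-bim ext-net
> neutron router-interface-add router-bim <private-subnet-id>
> )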
>
> We have:
>
> root@p-osinfra03-utility-container-783041da:~# neutron l3-agent-list-hosting-router router-bim
> +--------------------------------------+-----------------------------------------------+----------------+-------+----------+
> | id                                   | host                                          | admin_state_up | alive | ha_state |
> +--------------------------------------+-----------------------------------------------+----------------+-------+----------+
> | 3c7918e5-3ad6-4f82-a81b-700790e3c016 | p-osinfra01-neutron-agents-container-f1ab9c14 | True           | :-)   | active   |
> | f2bf385a-f210-4dbc-8d7d-4b7b845c09b0 | p-osinfra02-neutron-agents-container-48142ffe | True           | :-)   | active   |
> | 55350fac-16aa-488e-91fd-a7db38179c62 | p-osinfra03-neutron-agents-container-2f6557f0 | True           | :-)   | active   |
> +--------------------------------------+-----------------------------------------------+----------------+-------+----------+
>
> I know I have a problem here, because I should see :-) active, :-) standby,
> :-) standby… Snif...
>
> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns
> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6
> qdhcp-0ba266fb-15c4-4566-ae88-92d4c8fd2036
>
> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns exec
> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 ip a sh
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group
> default
> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> inet 127.0.0.1/8 scope host lo
> valid_lft forever preferred_lft forever
> inet6 ::1/128 scope host
> valid_lft forever preferred_lft forever
> 2: ha-4a5f0287-91@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc
> pfifo_fast state UP group default qlen 1000
> link/ether fa:16:3e:c2:67:a9 brd ff:ff:ff:ff:ff:ff
> inet 169.254.192.1/18 brd 169.254.255.255 scope global ha-4a5f0287-91
> valid_lft forever preferred_lft forever
> inet 169.254.0.1/24 scope global ha-4a5f0287-91
> valid_lft forever preferred_lft forever
> inet6 fe80::f816:3eff:fec2:67a9/64 scope link
> valid_lft forever preferred_lft forever
> 3: qr-44804d69-88@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc
> pfifo_fast state UP group default qlen 1000
> link/ether fa:16:3e:a5:8c:f2 brd ff:ff:ff:ff:ff:ff
> inet 192.168.100.254/24 scope global qr-44804d69-88
> valid_lft forever preferred_lft forever
> inet6 fe80::f816:3eff:fea5:8cf2/64 scope link
> valid_lft forever preferred_lft forever
> 4: qg-c5c7378e-1d@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
> pfifo_fast state UP group default qlen 1000
> link/ether fa:16:3e:b6:4c:97 brd ff:ff:ff:ff:ff:ff
> inet 147.210.240.11/23 scope global qg-c5c7378e-1d
> valid_lft forever preferred_lft forever
> inet 147.210.240.12/32 scope global qg-c5c7378e-1d
> valid_lft forever preferred_lft forever
> inet6 fe80::f816:3eff:feb6:4c97/64 scope link
> valid_lft forever preferred_lft forever
>
> Same result on infra02 and infra03: the qr and qg interfaces carry the same
> IPs on all three nodes, and every ha interface holds the 169.254.0.1 VIP.
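>
> (To see what each keepalived instance is configured to claim, the config
> the l3 agent generates can be dumped per router; a sketch, assuming
> neutron's default state_path of /var/lib/neutron:
>
> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# cat \
>     /var/lib/neutron/ha_confs/eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6/keepalived.conf
>
> All three copies should share the same vrid and virtual addresses, and
> the addresses should only ever be plumbed on the node that won the
> election.)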
>
> If we stop two of the neutron agent containers (p-osinfra02, p-osinfra03)
> and restart the first (p-osinfra01), we can reboot the instance and it gets
> an IP and a floating IP, and we can reach the VM over ssh from the internet.
> (Note: after a while we lose that connectivity too.)
>
> But if we restart the two other containers, their ha_state goes to
> « standby » until all three become « active », and then we have the problem
> again.
>
> The three router instances on infra01/02/03 each see themselves as master.
>
> If we ping from our instance to the router (internal network:
> 192.168.100.4 to 192.168.100.254), we see repeated ARP requests:
> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
>
> And on the compute node we see all these frames on each interface in the
> chain (tap / vxlan-89 / br-vxlan / bond1.vxlanvlan / bond1 / em2), but
> nothing comes back.
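>
> (At that point it is worth checking how vxlan-89 floods unknown/multicast
> traffic to the other VTEPs; a sketch with plain iproute2 commands, run on
> the compute node (the hostname is a placeholder):
>
> root@<compute-node>:~# ip -d link show vxlan-89     # vxlan details, incl. any multicast group
> root@<compute-node>:~# bridge fdb show dev vxlan-89 # entries pointing at the other VTEPs
>
> If flooding relies on a multicast group, the fabric between the hosts has
> to deliver that group.)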
>
> We also see, on the ha interface of each router, the VRRP communication
> (heartbeat packets over a hidden project network (vxlan 70) that connects
> all the HA routers). A priori, as one would expect in that situation, each
> router thinks it is the master.
>
> root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns exec
> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 tcpdump -nl -i ha-4a5f0287-91
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on ha-4a5f0287-91, link-type EN10MB (Ethernet), capture size 65535
> bytes
> IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50,
> authtype simple, intvl 2s, length 20
> IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50,
> authtype simple, intvl 2s, length 20
> IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50,
> authtype simple, intvl 2s, length 20
> IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50,
> authtype simple, intvl 2s, length 20
>
> root@p-osinfra02-neutron-agents-container-48142ffe:~# ip netns exec
> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 tcpdump -nt -i ha-4ee5f8d0-7f
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on ha-4ee5f8d0-7f, link-type EN10MB (Ethernet), capture size 65535
> bytes
> IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50,
> authtype simple, intvl 2s, length 20
> IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50,
> authtype simple, intvl 2s, length 20
> IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50,
> authtype simple, intvl 2s, length 20
> IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50,
> authtype simple, intvl 2s, length 20
> IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50,
> authtype simple, intvl 2s, length 20
Are you seeing VRRP advertisements crossing nodes though? That tcpdump
only shows advertisements from the local node. If nodes aren't
receiving VRRP messages from other nodes, keepalived will declare
itself as master (as expected). Can you ping the 'ha' interface from
one router namespace to the other?
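For example (a sketch; the namespace, peer address and interface names
are taken from your outputs above):

root@p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns exec \
    qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 ping -c 3 169.254.192.3

and on another node, check whether advertisements from the peers ever
arrive (VRRP is IP protocol 112):

root@p-osinfra02-neutron-agents-container-48142ffe:~# ip netns exec \
    qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 tcpdump -nl -i ha-4ee5f8d0-7f ip proto 112

If the only source you ever see there is the local 169.254.192.3, the HA
network (vxlan 70) isn't carrying traffic between the nodes.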
>
>
> Could someone tell me if they have already encountered this problem?
> The infra and compute nodes are connected to a Nexus 9000 switch.
>
> Thank you in advance for taking the time to study my request.
>
> Fabrice Grelaud
> Université de Bordeaux