[openstack-dev] [openstack-ansible] L3HA problem

fabrice grelaud fabrice.grelaud at u-bordeaux.fr
Wed Jun 22 15:35:37 UTC 2016


> On 22 June 2016 at 15:45, Assaf Muller <assaf at redhat.com> wrote:
> 
> On Wed, Jun 22, 2016 at 9:24 AM, fabrice grelaud
> <fabrice.grelaud at u-bordeaux.fr> wrote:
>> Hi,
>> 
>> we deployed our OpenStack infrastructure with your « exciting » project
>> openstack-ansible (Mitaka 13.1.2), but we have a problem with L3HA after
>> creating a router.
>> 
>> Our infra (closer to the doc):
>> 3 controllers nodes (with bond0 (br-mgmt, br-storage), bond1 (br-vxlan,
>> br-vlan))
>> 2 compute nodes (same for network)
>> 
>> We created an external network (vlan type), an internal network (vxlan type)
>> and a router connected to both networks.
>> When we launch an instance (cirros), the VM never receives an IP address.
>> 
>> We have:
>> 
>> root at p-osinfra03-utility-container-783041da:~# neutron l3-agent-list-hosting-router router-bim
>> +--------------------------------------+-----------------------------------------------+----------------+-------+----------+
>> | id                                   | host                                          | admin_state_up | alive | ha_state |
>> +--------------------------------------+-----------------------------------------------+----------------+-------+----------+
>> | 3c7918e5-3ad6-4f82-a81b-700790e3c016 | p-osinfra01-neutron-agents-container-f1ab9c14 | True           | :-)   | active   |
>> | f2bf385a-f210-4dbc-8d7d-4b7b845c09b0 | p-osinfra02-neutron-agents-container-48142ffe | True           | :-)   | active   |
>> | 55350fac-16aa-488e-91fd-a7db38179c62 | p-osinfra03-neutron-agents-container-2f6557f0 | True           | :-)   | active   |
>> +--------------------------------------+-----------------------------------------------+----------------+-------+----------+
>> 
>> I know, this is already a problem: I should see :-) active, :-) standby,
>> :-) standby… Sigh.
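>> 
>> To cross-check what keepalived itself thinks on each node, I also plan to read the
>> per-router state file written by the L3 agent (path assumed from neutron's default
>> ha_confs_path; adjust if your deployment overrides it):
>> 
>> # run inside each neutron-agents container; "master" on all three nodes
>> # would confirm the split-brain suggested by the table above
>> cat /var/lib/neutron/ha_confs/eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6/state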
>> 
>> root at p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns
>> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6
>> qdhcp-0ba266fb-15c4-4566-ae88-92d4c8fd2036
>> 
>> root at p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns exec
>> qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 ip a sh
>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group
>> default
>>    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>    inet 127.0.0.1/8 scope host lo
>>       valid_lft forever preferred_lft forever
>>    inet6 ::1/128 scope host
>>       valid_lft forever preferred_lft forever
>> 2: ha-4a5f0287-91 at if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc
>> pfifo_fast state UP group default qlen 1000
>>    link/ether fa:16:3e:c2:67:a9 brd ff:ff:ff:ff:ff:ff
>>    inet 169.254.192.1/18 brd 169.254.255.255 scope global ha-4a5f0287-91
>>       valid_lft forever preferred_lft forever
>>    inet 169.254.0.1/24 scope global ha-4a5f0287-91
>>       valid_lft forever preferred_lft forever
>>    inet6 fe80::f816:3eff:fec2:67a9/64 scope link
>>       valid_lft forever preferred_lft forever
>> 3: qr-44804d69-88 at if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc
>> pfifo_fast state UP group default qlen 1000
>>    link/ether fa:16:3e:a5:8c:f2 brd ff:ff:ff:ff:ff:ff
>>    inet 192.168.100.254/24 scope global qr-44804d69-88
>>       valid_lft forever preferred_lft forever
>>    inet6 fe80::f816:3eff:fea5:8cf2/64 scope link
>>       valid_lft forever preferred_lft forever
>> 4: qg-c5c7378e-1d at if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
>> pfifo_fast state UP group default qlen 1000
>>    link/ether fa:16:3e:b6:4c:97 brd ff:ff:ff:ff:ff:ff
>>    inet 147.210.240.11/23 scope global qg-c5c7378e-1d
>>       valid_lft forever preferred_lft forever
>>    inet 147.210.240.12/32 scope global qg-c5c7378e-1d
>>       valid_lft forever preferred_lft forever
>>    inet6 fe80::f816:3eff:feb6:4c97/64 scope link
>>       valid_lft forever preferred_lft forever
>> 
>> I get the same result on infra02 and infra03: the qr and qg interfaces carry the
>> same IP addresses on every node, and each ha interface holds the 169.254.0.1 address.
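>> 
>> For reference, the VRRP definition that neutron generated for this router can be
>> checked in its keepalived.conf (again, path assumed from the default ha_confs_path);
>> it should list 169.254.0.1/24 as the virtual IP that only the elected master is
>> supposed to carry:
>> 
>> grep -A 20 vrrp_instance /var/lib/neutron/ha_confs/eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6/keepalived.conf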
>> 
>> If we stop two of the neutron agent containers (p-osinfra02, p-osinfra03) and
>> restart the first one (p-osinfra01), we can reboot the instance and it gets an IP
>> and a floating IP, and we can reach the VM over SSH from the Internet. (Note:
>> after a while we lose that connectivity too.)
>> 
>> But if we restart the two other containers, their ha_state shows « standby »
>> until all three become « active », and then the problem comes back.
>> 
>> All three routers on infra01/02/03 consider themselves master.
>> 
>> If we ping the router from our instance (internal network, 192.168.100.4 to
>> 192.168.100.254), we see repeated ARP requests:
>> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
>> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
>> ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
>> 
>> On the compute node we see these frames on every interface along the path
>> (tap / vxlan-89 / br-vxlan / bond1.vxlanvlan / bond1 / em2), but nothing comes back.
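>> 
>> (Our next step is to capture inside the router namespace on each network node, to
>> check whether the ARP request even reaches the qr interface there and whether any
>> of the three "masters" answers it, e.g.
>> ip netns exec qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 tcpdump -ne -i qr-44804d69-88 arp
>> on infra01, and the equivalent on infra02/03.)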
>> 
>> On each router's ha interface we also see the VRRP traffic (heartbeat packets
>> over a hidden project network, vxlan 70, that connects all the HA routers).
>> Unsurprisingly, each router believes it is the master.
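>> 
>> (For completeness, the hidden HA network and its ports can be inspected with admin
>> credentials from the utility container; the network name is whatever neutron
>> created automatically for the tenant:
>> 
>> neutron net-list | grep -i "HA network"
>> neutron net-show <HA network id>        # segmentation id, vxlan 70 in our case
>> neutron port-list | grep 169.254.192    # the ha ports of the three router instances
>> )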
>> 
>> root at p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns exec qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 tcpdump -nl -i ha-4a5f0287-91
>> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>> listening on ha-4a5f0287-91, link-type EN10MB (Ethernet), capture size 65535 bytes
>> IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
>> IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
>> IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
>> IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
>> 
>> root at p-osinfra02-neutron-agents-container-48142ffe:~# ip netns exec qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 tcpdump -nt -i ha-4ee5f8d0-7f
>> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>> listening on ha-4ee5f8d0-7f, link-type EN10MB (Ethernet), capture size 65535 bytes
>> IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
>> IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
>> IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
>> IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
>> IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
> 
> Are you seeing VRRP advertisements crossing nodes, though? That tcpdump
> only shows advertisements from the local node. If the nodes aren't
> receiving VRRP messages from the other nodes, keepalived will declare
> itself master (as expected). Can you ping the 'ha' interface from
> one router namespace to the other?

I stopped the three neutron agent containers, then restarted the one on
infra01 followed by the one on infra02.

I can see the VRRP frames from infra01 (169.254.192.1 -> 224.0.0.18) being received on infra02:

root at p-osinfra02:~# tcpdump -nl -i em2 | grep 169.254
tcpdump: WARNING: em2: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on em2, link-type EN10MB (Ethernet), capture size 65535 bytes
IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
….
….
Then, after a while, I only see:
IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20

No more 169.254.192.1 from infra01, only the IP of the ha interface of the router on infra02.

From that point on, no VRRP advertisements cross the nodes at all: on each infra node we see the advertisements from the node itself, but nothing from the others.
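
One hypothesis I still have to verify (only a guess at this point): the advertisements go to the multicast group 224.0.0.18, so an IGMP-snooping bridge somewhere along the path could start filtering that group once the initial membership times out, which would match the "works for a while, then stops" pattern. Something like this, on the hosts and inside the containers, should show whether snooping is enabled on the Linux bridges and which multicast groups they still hold:

# 1 = snooping enabled on that bridge
for f in /sys/class/net/*/bridge/multicast_snooping; do echo "$f: $(cat $f)"; done

# current multicast group database of the bridges
bridge mdb show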

On the other hand, I can ping the ha interface from one router namespace to the other:
root at p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns exec qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 ping 169.254.192.3
PING 169.254.192.3 (169.254.192.3) 56(84) bytes of data.
64 bytes from 169.254.192.3: icmp_seq=1 ttl=64 time=0.297 ms
64 bytes from 169.254.192.3: icmp_seq=2 ttl=64 time=0.239 ms
64 bytes from 169.254.192.3: icmp_seq=3 ttl=64 time=0.264 ms

I'm going to test with another version of keepalived (the current version here is 1.2.7-1 on Ubuntu 14.04).
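
For the record, the version actually installed in each neutron-agents container can be confirmed with:

dpkg-query -W -f='${Package} ${Version}\n' keepalived

(or keepalived -v, which prints the same version string.)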

Thanks for your help.

> 
>> 
>> 
>> Has anyone else already run into this problem?
>> The infra and compute nodes are connected to a Nexus 9000 switch.
>> 
>> Thank you in advance for taking the time to look into this.
>> 
>> Fabrice Grelaud
>> Université de Bordeaux
>> 
>> 
> 