[openstack-dev] [openstack-ansible] L3HA problem

Anna Kamyshnikova akamyshnikova at mirantis.com
Thu Jun 23 07:44:03 UTC 2016


Keepalived version 1.2.13 is reliable.

On Wed, Jun 22, 2016 at 8:40 PM, Assaf Muller <assaf at redhat.com> wrote:

> On Wed, Jun 22, 2016 at 12:02 PM, fabrice grelaud
> <fabrice.grelaud at u-bordeaux.fr> wrote:
> >
> > On 22 June 2016 at 17:35, fabrice grelaud <fabrice.grelaud at u-bordeaux.fr> wrote:
> >
> >
> > On 22 June 2016 at 15:45, Assaf Muller <assaf at redhat.com> wrote:
> >
> > On Wed, Jun 22, 2016 at 9:24 AM, fabrice grelaud
> > <fabrice.grelaud at u-bordeaux.fr> wrote:
> >
> > Hi,
> >
> > we deployed our OpenStack infrastructure with your "exciting" project
> > openstack-ansible (Mitaka 13.1.2), but we have some problems with L3HA
> > after creating a router.
> >
> > Our infra (close to the doc):
> > 3 controller nodes (with bond0 (br-mgmt, br-storage) and bond1 (br-vxlan,
> > br-vlan))
> > 2 compute nodes (same network setup)
> >
> > We created an external network (VLAN type), an internal network (VXLAN
> > type), and a router connected to both networks.
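> >
> > (For reference, the setup above corresponds roughly to the commands below;
> > the physnet label and VLAN ID are placeholders, only the subnets and the
> > router name come from this thread:)
> >
> > neutron net-create ext-net --router:external \
> >     --provider:network_type vlan --provider:physical_network physnet1 \
> >     --provider:segmentation_id 240
> > neutron subnet-create ext-net 147.210.240.0/23 --name ext-subnet --disable-dhcp
> > neutron net-create int-net                # tenant network, VXLAN type
> > neutron subnet-create int-net 192.168.100.0/24 --name int-subnet
> > neutron router-create router-bim --ha True
> > neutron router-gateway-set router-bim ext-net
> > neutron router-interface-add router-bim int-subnet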
> > When we launch an instance (cirros), the VM does not receive an IP.
> >
> > We have:
> >
> > root at p-osinfra03-utility-container-783041da:~# neutron l3-agent-list-hosting-router router-bim
> > +--------------------------------------+-----------------------------------------------+----------------+-------+----------+
> > | id                                   | host                                          | admin_state_up | alive | ha_state |
> > +--------------------------------------+-----------------------------------------------+----------------+-------+----------+
> > | 3c7918e5-3ad6-4f82-a81b-700790e3c016 | p-osinfra01-neutron-agents-container-f1ab9c14 | True           | :-)   | active   |
> > | f2bf385a-f210-4dbc-8d7d-4b7b845c09b0 | p-osinfra02-neutron-agents-container-48142ffe | True           | :-)   | active   |
> > | 55350fac-16aa-488e-91fd-a7db38179c62 | p-osinfra03-neutron-agents-container-2f6557f0 | True           | :-)   | active   |
> > +--------------------------------------+-----------------------------------------------+----------------+-------+----------+
> >
> > I know I already have a problem here, because I should see :-) active,
> > :-) standby, :-) standby… Sniff...
> >
> > root at p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns
> > qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6
> > qdhcp-0ba266fb-15c4-4566-ae88-92d4c8fd2036
> >
> > root at p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns exec qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 ip a sh
> > 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
> >    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> >    inet 127.0.0.1/8 scope host lo
> >       valid_lft forever preferred_lft forever
> >    inet6 ::1/128 scope host
> >       valid_lft forever preferred_lft forever
> > 2: ha-4a5f0287-91 at if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
> >    link/ether fa:16:3e:c2:67:a9 brd ff:ff:ff:ff:ff:ff
> >    inet 169.254.192.1/18 brd 169.254.255.255 scope global ha-4a5f0287-91
> >       valid_lft forever preferred_lft forever
> >    inet 169.254.0.1/24 scope global ha-4a5f0287-91
> >       valid_lft forever preferred_lft forever
> >    inet6 fe80::f816:3eff:fec2:67a9/64 scope link
> >       valid_lft forever preferred_lft forever
> > 3: qr-44804d69-88 at if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
> >    link/ether fa:16:3e:a5:8c:f2 brd ff:ff:ff:ff:ff:ff
> >    inet 192.168.100.254/24 scope global qr-44804d69-88
> >       valid_lft forever preferred_lft forever
> >    inet6 fe80::f816:3eff:fea5:8cf2/64 scope link
> >       valid_lft forever preferred_lft forever
> > 4: qg-c5c7378e-1d at if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
> >    link/ether fa:16:3e:b6:4c:97 brd ff:ff:ff:ff:ff:ff
> >    inet 147.210.240.11/23 scope global qg-c5c7378e-1d
> >       valid_lft forever preferred_lft forever
> >    inet 147.210.240.12/32 scope global qg-c5c7378e-1d
> >       valid_lft forever preferred_lft forever
> >    inet6 fe80::f816:3eff:feb6:4c97/64 scope link
> >       valid_lft forever preferred_lft forever
> >
> > Same result on infra02 and infra03: the qr and qg interfaces have the same
> > IPs, and the ha interfaces all carry the address 169.254.0.1.
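> >
> > (A way to cross-check what each agent thinks is the per-router keepalived
> > state that neutron keeps on disk; the path below assumes the default
> > neutron state_path /var/lib/neutron, adjust if your deployment differs:)
> >
> > # run inside each neutron agents container
> > cat /var/lib/neutron/ha_confs/eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6/state
> > ps aux | grep [k]eepalived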
> >
> > If we stop two of the neutron agent containers (p-osinfra02, p-osinfra03)
> > and restart the first one (p-osinfra01), we can reboot the instance and it
> > gets an IP and a floating IP, and we can reach the VM over SSH from the
> > internet. (Note: after a while, we lose that connectivity too.)
> >
> > But if we restart the two stopped containers, their ha_state goes to
> > "standby" until all three become "active", and then we have the problem
> > again.
> >
> > The three router instances on infra01/02/03 are all reported as master.
> >
> > If we ping from our instance to the router (internal network, 192.168.100.4
> > to 192.168.100.254), we see repeated ARP requests:
> > ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
> > ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
> > ARP, Request who-has 192.168.100.254 tell 192.168.100.4, length 28
> >
> > On the compute node we see all these frames on the various interfaces
> > (tap / vxlan-89 / br-vxlan / bond1.vxlanvlan / bond1 / em2), but nothing
> > comes back.
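> >
> > (A capture on the qr side of each router namespace, using the interface
> > name from the output above, would show whether these ARP requests reach
> > any of the routers at all:)
> >
> > ip netns exec qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 \
> >     tcpdump -nl -i qr-44804d69-88 arp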
> >
> > We also see, on the ha interface of each router, the VRRP traffic
> > (heartbeat packets over a hidden project network that connects all the HA
> > routers, vxlan 70). Apparently each router believes it is the master.
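> >
> > (The VRRP parameters each router uses, interface, vrid and priority, can be
> > checked in the keepalived configuration that neutron generates per router;
> > path again assumes the default neutron state_path:)
> >
> > cat /var/lib/neutron/ha_confs/eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6/keepalived.conf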
> >
> > root at p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns exec qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 tcpdump -nl -i ha-4a5f0287-91
> > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> > listening on ha-4a5f0287-91, link-type EN10MB (Ethernet), capture size 65535 bytes
> > IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
> > IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
> > IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
> > IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
> >
> > root at p-osinfra02-neutron-agents-container-48142ffe:~# ip netns exec qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 tcpdump -nt -i ha-4ee5f8d0-7f
> > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> > listening on ha-4ee5f8d0-7f, link-type EN10MB (Ethernet), capture size 65535 bytes
> > IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
> > IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
> > IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
> > IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
> > IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
> >
> >
> > Are you seeing VRRP advertisements crossing nodes, though? That tcpdump
> > only shows advertisements from the local node. If nodes aren't receiving
> > VRRP messages from the other nodes, keepalived will declare itself master
> > (as expected). Can you ping the 'ha' interface from one router namespace
> > to the other?
> >
> >
> > I stopped the three neutron agent containers, then restarted the one on
> > infra01 and then the one on infra02.
> >
> > I can see VRRP frames from infra01 (169.254.192.1 -> 224.0.0.18) being
> > received by infra02.
> >
> > root at p-osinfra02:~# tcpdump -nl -i em2 | grep 169.254
> > tcpdump: WARNING: em2: no IPv4 address assigned
> > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> > listening on em2, link-type EN10MB (Ethernet), capture size 65535 bytes
> > IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
> > IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
> > ….
> > ….
> > then I see:
> > IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
> >
> > No more 169.254.192.1 advertisements from infra01, only the IP of the HA
> > interface of the router on infra02.
> >
> > And no VRRP advertisements cross between the nodes anymore: on each infra
> > node we see VRRP advertisements from the node itself, but nothing from the
> > others.
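> >
> > (An alternative capture filter that catches VRRP advertisements from any
> > source, instead of grepping for link-local addresses: VRRP is IP
> > protocol 112.)
> >
> > tcpdump -nl -i em2 'ip proto 112'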
> >
> > Otherwise, I can ping the ha interface from one router namespace to the
> > other:
> > root at p-osinfra01-neutron-agents-container-f1ab9c14:~# ip netns exec
> > qrouter-eeb2147a-5cc6-4b5e-b97c-07cfc141e8e6 ping 169.254.192.3
> > PING 169.254.192.3 (169.254.192.3) 56(84) bytes of data.
> > 64 bytes from 169.254.192.3: icmp_seq=1 ttl=64 time=0.297 ms
> > 64 bytes from 169.254.192.3: icmp_seq=2 ttl=64 time=0.239 ms
> > 64 bytes from 169.254.192.3: icmp_seq=3 ttl=64 time=0.264 ms
> >
> > I'm going to test with another version of keepalived (current version here:
> > 1.2.7-1, Ubuntu 14.04).
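> >
> > (Quick way to confirm the installed version inside an agent container:)
> >
> > dpkg -s keepalived | grep '^Version'
> > keepalived --version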
> >
> > Thanks for your help.
> >
> >
> > Note:
> > I said I can ping between the ha interfaces, but not for long. At some
> > point, I can't anymore… :-(
>
> That's the problem. This becomes normal Neutron troubleshooting: why can't
> one port ping the other? This might help:
> https://assafmuller.com/2015/08/31/neutron-troubleshooting/
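>
> (A minimal sketch of data-plane checks on each node or agent container,
> assuming the Linux Bridge/VXLAN setup that openstack-ansible deploys by
> default and the agent's vxlan-<VNI> naming, so vxlan-70 for the HA network
> mentioned above; interface names may differ in your environment:)
>
> brctl show                    # is the HA network's vxlan interface plugged into a bridge?
> bridge fdb show dev vxlan-70  # are the other nodes' VTEPs present as flood entries?
> ip -d link show vxlan-70      # VNI, local VTEP address, multicast group
> ping -I br-vxlan <other-node-VTEP-IP>   # basic VTEP-to-VTEP reachability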
>
> >
> > Could someone tell me whether they have already encountered this problem?
> > The infra and compute nodes are connected to a Nexus 9000 switch.
> >
> > Thank you in advance for taking the time to study my request.
> >
> > Fabrice Grelaud
> > Université de Bordeaux
> >
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>



-- 
Regards,
Ann Kamyshnikova
Mirantis, Inc