[Openstack-operators] Neutron DVR HA

Pedro Sousa pgsousa at gmail.com
Wed Jan 7 19:49:24 UTC 2015


Hi all,

after some more tests it looks like a gratuitous ARP issue, because if I
start a new connection (a ping) from an inside instance to an external host
like Google, it works.

This suggests that the instance's outbound traffic tells the switch that
something has in fact changed, and that the switch should update its ARP
table.
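For anyone following along: a gratuitous ARP is just a broadcast ARP whose
sender IP and target IP are both the address being announced, which is what
prompts upstream switches and hosts to refresh their tables. A minimal sketch
of such a frame (the MAC here is made up; the IP is the floating IP from the
traces below):

```python
import struct

def gratuitous_arp(mac: bytes, ip: bytes) -> bytes:
    """Build a gratuitous ARP frame: a broadcast ARP request whose
    sender IP and target IP are both the address being announced."""
    eth = b"\xff" * 6 + mac + struct.pack("!H", 0x0806)  # dst=broadcast, ethertype=ARP
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)      # Ethernet/IPv4, opcode=request
    arp += mac + ip                                      # sender hw/proto address
    arp += b"\x00" * 6 + ip                              # target hw unset, target IP = sender IP
    return eth + arp

# hypothetical MAC, floating IP 172.16.28.32 from the tcpdump traces
frame = gratuitous_arp(b"\xfa\x16\x3e\x00\x00\x01", bytes([172, 16, 28, 32]))
print(len(frame))  # 42: 14-byte Ethernet header + 28-byte ARP payload
```

If the router does not (re)broadcast a frame like this after a failover, the
switch keeps forwarding to the old port until its ARP/FDB entries age out,
which would match the timeout-like behavior described below.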

Has anyone seen this behavior?

Thanks



On Tue, Dec 30, 2014 at 6:56 PM, Pedro Sousa <pgsousa at gmail.com> wrote:

> Hi,
>
> as I stated, if I ping from an OpenStack instance the request appears on
> *Compute01* and is *OK*:
>
> 18:50:38.721115 IP 10.0.30.23 > 172.16.28.32: ICMP echo request, id 29956, seq 36, length 64
> 18:50:38.721304 IP 172.16.28.32 > 10.0.30.23: ICMP echo reply, id 29956, seq 36, length 64
>
>
> if I ping from my outside network, the request appears on *Compute02*
> and is *NOT OK*:
>
> 18:50:40.104025 ethertype IPv4, IP 192.168.8.4 > 172.16.28.32: ICMP echo request, id 13981, seq 425, length 64
> 18:50:40.104025 ethertype IPv4, IP 192.168.8.4 > 172.16.28.32: ICMP echo request, id 13981, seq 425, length 64
>
>
> I'd appreciate it if someone could help me with this.
>
> Thanks.
>
>
>
>
>
>
>
> On Tue, Dec 30, 2014 at 3:17 PM, Pedro Sousa <pgsousa at gmail.com> wrote:
>
>> Hi Assaf,
>>
>> another update: if I ping the floating IP from my instance it works. If I
>> ping from the outside/provider network, from my PC, it doesn't.
>>
>> Thanks
>>
>> On Tue, Dec 30, 2014 at 11:35 AM, Pedro Sousa <pgsousa at gmail.com> wrote:
>>
>>> Hi Assaf,
>>>
>>> Following your instructions, I can confirm that I have l2pop disabled.
>>>
>>> Meanwhile, I've made another test: yesterday when I left the office this
>>> wasn't working, but when I arrived this morning it was pinging again, and I
>>> hadn't changed or touched anything. So my interpretation is that this is
>>> some sort of timeout issue.
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Dec 30, 2014 at 11:27 AM, Assaf Muller <amuller at redhat.com>
>>> wrote:
>>>
>>>> Sorry, I can't open zip files in this email. You need l2pop to not appear
>>>> in the ML2 mechanism drivers list in neutron.conf on the Neutron server,
>>>> and you need l2_population = False in each OVS agent.
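>>>>
>>>> For illustration, the two settings would look roughly like this (exact
>>>> file paths vary by packaging; the mechanism driver list often lives in
>>>> ml2_conf.ini rather than neutron.conf):
>>>>
>>>> ```ini
>>>> # ML2 plugin config on the Neutron server
>>>> [ml2]
>>>> mechanism_drivers = openvswitch   # no l2population entry here
>>>>
>>>> # OVS agent config on each compute/network node
>>>> [agent]
>>>> l2_population = False
>>>> ```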
>>>>
>>>> ----- Original Message -----
>>>> >
>>>> > Hi Asaf,
>>>> >
>>>> > I think I disabled it, but maybe you can check my conf files? I've
>>>> attached
>>>> > the zip.
>>>> >
>>>> > Thanks
>>>> >
>>>> > On Tue, Dec 30, 2014 at 8:27 AM, Assaf Muller < amuller at redhat.com >
>>>> wrote:
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > ----- Original Message -----
>>>> > > Hi Britt,
>>>> > >
>>>> > > some update on this after running tcpdump:
>>>> > >
>>>> > > I have the keepalived master running on controller01. If I reboot this
>>>> > > server it fails over to controller02, which becomes the keepalived
>>>> > > master, and then I see ping packets arriving at controller02; this is
>>>> > > good.
>>>> > >
>>>> > > However, when controller01 comes back online, I see that ping requests
>>>> > > stop being forwarded to controller02 and start being sent to
>>>> > > controller01, which is now in the backup state, so it stops working.
>>>> > >
>>>> >
>>>> > If traffic is being forwarded to a backup node, that sounds like
>>>> L2pop is on.
>>>> > Is that true by chance?
>>>> >
>>>> > > Any hint for this?
>>>> > >
>>>> > > Thanks
>>>> > >
>>>> > >
>>>> > >
>>>> > > On Mon, Dec 29, 2014 at 11:06 AM, Pedro Sousa < pgsousa at gmail.com
>>>> > wrote:
>>>> > >
>>>> > >
>>>> > >
>>>> > > Yes,
>>>> > >
>>>> > > I was using l2pop, disabled it, but the issue remains.
>>>> > >
>>>> > > I also stopped the "bogus VRRP" messages by configuring an auth
>>>> > > type/password for keepalived, but when I reboot the servers, I see the
>>>> > > keepalived process running on them but I can no longer ping the virtual
>>>> > > router IP address.
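>>>> > >
>>>> > > (For reference, I mean something like the l3 agent's VRRP auth options,
>>>> > > which Neutron copies into the keepalived.conf it generates; the
>>>> > > password value below is made up:)
>>>> > >
>>>> > > ```ini
>>>> > > # l3_agent.ini on each node running the L3 agent
>>>> > > [DEFAULT]
>>>> > > ha_vrrp_auth_type = PASS
>>>> > > ha_vrrp_auth_password = some-shared-secret
>>>> > > ```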
>>>> > >
>>>> > > So I rebooted the node that is running keepalived as master and it
>>>> > > starts pinging again, but when that node comes back online, everything
>>>> > > stops working. Has anyone experienced this?
>>>> > >
>>>> > > Thanks
>>>> > >
>>>> > >
>>>> > > On Tue, Dec 23, 2014 at 5:03 PM, David Martin < dmartls1 at gmail.com
>>>> > wrote:
>>>> > >
>>>> > >
>>>> > >
>>>> > > Are you using l2pop? Until
>>>> https://bugs.launchpad.net/neutron/+bug/1365476
>>>> > > is
>>>> > > fixed it's pretty broken.
>>>> > >
>>>> > > On Tue, Dec 23, 2014 at 10:48 AM, Britt Houser (bhouser) <
>>>> > > bhouser at cisco.com
>>>> > > > wrote:
>>>> > >
>>>> > >
>>>> > >
>>>> > > Unfortunately I've not had a chance yet to play with neutron router HA,
>>>> > > so no hints from me. =( Can you give a little more detail about "it
>>>> > > stops working"? I.e., do you see packets dropped while controller1 is
>>>> > > down? Do packets begin flowing before controller1 comes back online?
>>>> > > Does controller1 come back online successfully? Do packets begin to
>>>> > > flow after controller1 comes back online? Perhaps that will help.
>>>> > >
>>>> > > Thx,
>>>> > > britt
>>>> > >
>>>> > > From: Pedro Sousa < pgsousa at gmail.com >
>>>> > > Date: Tuesday, December 23, 2014 at 11:14 AM
>>>> > > To: Britt Houser < bhouser at cisco.com >
>>>> > > Cc: " OpenStack-operators at lists.openstack.org " <
>>>> > > OpenStack-operators at lists.openstack.org >
>>>> > > Subject: Re: [Openstack-operators] Neutron DVR HA
>>>> > >
>>>> > > I understand Britt, thanks.
>>>> > >
>>>> > > So I disabled DVR and tried to test L3_HA, but it's not working
>>>> > > properly; it seems to be a keepalived issue. I see that it's running on
>>>> > > 3 nodes:
>>>> > >
>>>> > > [root at controller01 keepalived]# neutron
>>>> l3-agent-list-hosting-router
>>>> > > harouter
>>>> > >
>>>> +--------------------------------------+--------------+----------------+-------+
>>>> > > | id | host | admin_state_up | alive |
>>>> > >
>>>> +--------------------------------------+--------------+----------------+-------+
>>>> > > | 09cfad44-2bb2-4683-a803-ed70f3a46a6a | controller01 | True | :-) |
>>>> > > | 58ff7c42-7e71-4750-9f05-61ad5fbc5776 | compute03 | True | :-) |
>>>> > > | 8d778c6a-94df-40b7-a2d6-120668e699ca | compute02 | True | :-) |
>>>> > >
>>>> +--------------------------------------+--------------+----------------+-------+
>>>> > >
>>>> > > However if I reboot one of the l3-agent nodes it stops working. I
>>>> see this
>>>> > > in
>>>> > > the logs:
>>>> > >
>>>> > > Dec 23 16:12:28 Compute02 Keepalived_vrrp[18928]: ip address
>>>> associated
>>>> > > with
>>>> > > VRID not present in received packet : 172.16.28.20
>>>> > > Dec 23 16:12:28 Compute02 Keepalived_vrrp[18928]: one or more VIP
>>>> > > associated
>>>> > > with VRID mismatch actual MASTER advert
>>>> > > Dec 23 16:12:28 Compute02 Keepalived_vrrp[18928]: bogus VRRP packet
>>>> > > received
>>>> > > on ha-a509de81-1c !!!
>>>> > > Dec 23 16:12:28 Compute02 Keepalived_vrrp[18928]:
>>>> VRRP_Instance(VR_1)
>>>> > > ignoring received advertisment...
>>>> > >
>>>> > > Dec 23 16:13:10 Compute03 Keepalived_vrrp[12501]:
>>>> VRRP_Instance(VR_1)
>>>> > > ignoring received advertisment...
>>>> > > Dec 23 16:13:12 Compute03 Keepalived_vrrp[12501]: ip address
>>>> associated
>>>> > > with
>>>> > > VRID not present in received packet : 172.16.28.20
>>>> > > Dec 23 16:13:12 Compute03 Keepalived_vrrp[12501]: one or more VIP
>>>> > > associated
>>>> > > with VRID mismatch actual MASTER advert
>>>> > > Dec 23 16:13:12 Compute03 Keepalived_vrrp[12501]: bogus VRRP packet
>>>> > > received
>>>> > > on ha-d5718741-ef !!!
>>>> > > Dec 23 16:13:12 Compute03 Keepalived_vrrp[12501]:
>>>> VRRP_Instance(VR_1)
>>>> > > ignoring received advertisment...
>>>> > >
>>>> > > Any hint?
>>>> > >
>>>> > > Thanks
>>>> > >
>>>> > >
>>>> > >
>>>> > > On Tue, Dec 23, 2014 at 3:17 PM, Britt Houser (bhouser) <
>>>> bhouser at cisco.com
>>>> > > >
>>>> > > wrote:
>>>> > >
>>>> > >
>>>> > >
>>>> > > Currently HA and DVR are mutually exclusive features.
>>>> > >
>>>> > > From: Pedro Sousa < pgsousa at gmail.com >
>>>> > > Date: Tuesday, December 23, 2014 at 9:42 AM
>>>> > > To: " OpenStack-operators at lists.openstack.org " <
>>>> > > OpenStack-operators at lists.openstack.org >
>>>> > > Subject: [Openstack-operators] Neutron DVR HA
>>>> > >
>>>> > > Hi all,
>>>> > >
>>>> > > I've been trying Neutron DVR with 2 controllers + 2 computes. When I
>>>> > > create a router I can see that it is running on all the servers:
>>>> > >
>>>> > > [root at controller01 ~]# neutron l3-agent-list-hosting-router router
>>>> > >
>>>> +--------------------------------------+--------------+----------------+-------+
>>>> > > | id | host | admin_state_up | alive |
>>>> > >
>>>> +--------------------------------------+--------------+----------------+-------+
>>>> > > | 09cfad44-2bb2-4683-a803-ed70f3a46a6a | controller01 | True | :-) |
>>>> > > | 0ca01d56-b6dd-483d-9c49-cc7209da2a5a | controller02 | True | :-) |
>>>> > > | 52379f0f-9046-4b73-9d87-bab7f96be5e7 | compute01 | True | :-) |
>>>> > > | 8d778c6a-94df-40b7-a2d6-120668e699ca | compute02 | True | :-) |
>>>> > >
>>>> +--------------------------------------+--------------+----------------+-------+
>>>> > >
>>>> > > However, if the controller01 server dies I cannot ping the external
>>>> > > gateway IP anymore. Is this the expected behavior? Shouldn't it fail
>>>> > > over to the other controller node?
>>>> > >
>>>> > > Thanks
>>>> > >
>>>> > >
>>>> > > _______________________________________________
>>>> > > OpenStack-operators mailing list
>>>> > > OpenStack-operators at lists.openstack.org
>>>> > >
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> >
>>>> >
>>>>
>>>
>>>
>>
>

