[Openstack-operators] Neutron DVR HA

Pedro Sousa pgsousa at gmail.com
Tue Dec 30 18:56:45 UTC 2014


Hi,

As I stated, if I ping from an OpenStack instance, the request appears on
Compute01 and is OK:

18:50:38.721115 IP 10.0.30.23 > 172.16.28.32: ICMP echo request, id 29956, seq 36, length 64
18:50:38.721304 IP 172.16.28.32 > 10.0.30.23: ICMP echo reply, id 29956, seq 36, length 64


If I ping from my outside/provider network, the request appears on Compute02
and is NOT OK (only echo requests are seen, never a reply):

18:50:40.104025 ethertype IPv4, IP 192.168.8.4 > 172.16.28.32: ICMP echo request, id 13981, seq 425, length 64
18:50:40.104025 ethertype IPv4, IP 192.168.8.4 > 172.16.28.32: ICMP echo request, id 13981, seq 425, length 64
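Captures like the ones above can be triaged with a small helper that pairs each ICMP echo request with its reply; any request left unmatched points at the broken return path. This is just a sketch for eyeballing tcpdump output, and it assumes the default tcpdump line format shown above:

```python
import re

def find_unanswered(lines):
    """Return (src, dst, id, seq) tuples of ICMP echo requests
    that never saw a matching echo reply in the capture."""
    pat = re.compile(
        r'IP (\S+) > (\S+): ICMP echo (request|reply), id (\d+), seq (\d+)')
    requests, replies = set(), set()
    for line in lines:
        m = pat.search(line)
        if not m:
            continue
        src, dst, kind, icmp_id, seq = m.groups()
        if kind == 'request':
            requests.add((src, dst, icmp_id, seq))
        else:
            # a reply travels in the reverse direction of its request
            replies.add((dst, src, icmp_id, seq))
    return sorted(requests - replies)
```

Feeding it the two captures above would flag the 192.168.8.4 pings as unanswered while the instance-to-instance pings pair up cleanly.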


I'd appreciate it if someone could help me with this.

Thanks.

On Tue, Dec 30, 2014 at 3:17 PM, Pedro Sousa <pgsousa at gmail.com> wrote:

> Hi Assaf,
>
> another update: if I ping the floating IP from my instance it works. If I
> ping from the outside/provider network, from my PC, it doesn't.
>
> Thanks
>
> On Tue, Dec 30, 2014 at 11:35 AM, Pedro Sousa <pgsousa at gmail.com> wrote:
>
>> Hi Assaf,
>>
>> Following your instructions, I can confirm that I have l2pop disabled.
>>
>> Meanwhile, I've run another test: yesterday when I left the office this
>> wasn't working, but when I arrived this morning it was pinging again, and I
>> hadn't changed or touched anything. So my interpretation is that there is
>> some sort of timeout issue.
>>
>> Thanks
>>
>> On Tue, Dec 30, 2014 at 11:27 AM, Assaf Muller <amuller at redhat.com>
>> wrote:
>>
>>> Sorry, I can't open zip files in this email. You need l2pop to be absent
>>> from the ML2 mechanism drivers list in neutron.conf on the node where the
>>> Neutron server runs, and you need l2population = False in each OVS agent.
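For reference, the two settings would look something like the fragment below; the file paths vary by distribution, and the exact option names are worth double-checking against your Neutron version:

```ini
# ML2 config on the Neutron server
# (commonly /etc/neutron/plugins/ml2/ml2_conf.ini):
[ml2]
# note: no "l2population" entry in this list
mechanism_drivers = openvswitch

# OVS agent config on each network/compute node:
[agent]
l2_population = False
```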
>>>
>>> ----- Original Message -----
>>> >
>>> > Hi Assaf,
>>> >
>>> > I think I disabled it, but maybe you can check my conf files? I've
>>> attached
>>> > the zip.
>>> >
>>> > Thanks
>>> >
>>> > On Tue, Dec 30, 2014 at 8:27 AM, Assaf Muller < amuller at redhat.com >
>>> wrote:
>>> >
>>> >
>>> >
>>> >
>>> > ----- Original Message -----
>>> > > Hi Britt,
>>> > >
>>> > > some update on this after running tcpdump:
>>> > >
>>> > > I have the keepalived master running on controller01. If I reboot
>>> > > this server, it fails over to controller02, which becomes the new
>>> > > keepalived master, and I then see ping packets arriving at
>>> > > controller02. This is good.
>>> > >
>>> > > However, when controller01 comes back online, ping requests stop
>>> > > being forwarded to controller02 and start being sent to
>>> > > controller01, which is now in the backup state, so it stops working.
>>> > >
>>> >
>>> > If traffic is being forwarded to a backup node, that sounds like L2pop
>>> is on.
>>> > Is that true by chance?
>>> >
>>> > > Any hint for this?
>>> > >
>>> > > Thanks
>>> > >
>>> > >
>>> > >
>>> > > On Mon, Dec 29, 2014 at 11:06 AM, Pedro Sousa < pgsousa at gmail.com >
>>> wrote:
>>> > >
>>> > >
>>> > >
>>> > > Yes,
>>> > >
>>> > > I was using l2pop; I disabled it, but the issue remains.
>>> > >
>>> > > I also stopped the "bogus VRRP" messages by configuring a
>>> > > user/password for keepalived, but when I reboot the servers, I see
>>> > > the keepalived process running on them, yet I can no longer ping the
>>> > > virtual router IP address.
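The VRRP authentication mentioned above is normally set through the l3 agent, which renders it into the keepalived config it generates; presumably something like the fragment below (the password is a placeholder, and the path varies by distribution):

```ini
# l3_agent.ini on each node running an l3 agent:
[DEFAULT]
ha_vrrp_auth_type = PASS
ha_vrrp_auth_password = s3cret  ; placeholder, use your own
```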
>>> > >
>>> > > So I rebooted the node that was running keepalived as master, and
>>> > > pinging starts again, but when that node comes back online,
>>> > > everything stops working. Has anyone experienced this?
>>> > >
>>> > > Thanks
>>> > >
>>> > >
>>> > > On Tue, Dec 23, 2014 at 5:03 PM, David Martin < dmartls1 at gmail.com
>>> > wrote:
>>> > >
>>> > >
>>> > >
>>> > > Are you using l2pop? Until
>>> https://bugs.launchpad.net/neutron/+bug/1365476
>>> > > is
>>> > > fixed it's pretty broken.
>>> > >
>>> > > On Tue, Dec 23, 2014 at 10:48 AM, Britt Houser (bhouser) <
>>> > > bhouser at cisco.com
>>> > > > wrote:
>>> > >
>>> > >
>>> > >
>>> > > Unfortunately I've not had a chance yet to play with Neutron router
>>> > > HA, so no hints from me. =( Can you give a few more details about
>>> > > "it stops working"? That is: do you see packets dropped while
>>> > > controller01 is down? Do packets begin flowing before controller01
>>> > > comes back online? Does controller01 come back online successfully?
>>> > > Do packets begin to flow after controller01 comes back online?
>>> > > Perhaps that will help.
>>> > >
>>> > > Thx,
>>> > > britt
>>> > >
>>> > > From: Pedro Sousa < pgsousa at gmail.com >
>>> > > Date: Tuesday, December 23, 2014 at 11:14 AM
>>> > > To: Britt Houser < bhouser at cisco.com >
>>> > > Cc: " OpenStack-operators at lists.openstack.org " <
>>> > > OpenStack-operators at lists.openstack.org >
>>> > > Subject: Re: [Openstack-operators] Neutron DVR HA
>>> > >
>>> > > I understand Britt, thanks.
>>> > >
>>> > > So I disabled DVR and tried to test L3_HA, but it's not working
>>> > > properly; it seems to be a keepalived issue. I can see that it's
>>> > > running on 3 nodes:
>>> > >
>>> > > [root at controller01 keepalived]# neutron l3-agent-list-hosting-router
>>> > > harouter
>>> > >
>>> > > +--------------------------------------+--------------+----------------+-------+
>>> > > | id                                   | host         | admin_state_up | alive |
>>> > > +--------------------------------------+--------------+----------------+-------+
>>> > > | 09cfad44-2bb2-4683-a803-ed70f3a46a6a | controller01 | True           | :-)   |
>>> > > | 58ff7c42-7e71-4750-9f05-61ad5fbc5776 | compute03    | True           | :-)   |
>>> > > | 8d778c6a-94df-40b7-a2d6-120668e699ca | compute02    | True           | :-)   |
>>> > > +--------------------------------------+--------------+----------------+-------+
>>> > >
>>> > > However, if I reboot one of the l3-agent nodes it stops working. I
>>> > > see this in the logs:
>>> > >
>>> > > Dec 23 16:12:28 Compute02 Keepalived_vrrp[18928]: ip address associated with VRID not present in received packet : 172.16.28.20
>>> > > Dec 23 16:12:28 Compute02 Keepalived_vrrp[18928]: one or more VIP associated with VRID mismatch actual MASTER advert
>>> > > Dec 23 16:12:28 Compute02 Keepalived_vrrp[18928]: bogus VRRP packet received on ha-a509de81-1c !!!
>>> > > Dec 23 16:12:28 Compute02 Keepalived_vrrp[18928]: VRRP_Instance(VR_1) ignoring received advertisment...
>>> > >
>>> > > Dec 23 16:13:10 Compute03 Keepalived_vrrp[12501]: VRRP_Instance(VR_1) ignoring received advertisment...
>>> > > Dec 23 16:13:12 Compute03 Keepalived_vrrp[12501]: ip address associated with VRID not present in received packet : 172.16.28.20
>>> > > Dec 23 16:13:12 Compute03 Keepalived_vrrp[12501]: one or more VIP associated with VRID mismatch actual MASTER advert
>>> > > Dec 23 16:13:12 Compute03 Keepalived_vrrp[12501]: bogus VRRP packet received on ha-d5718741-ef !!!
>>> > > Dec 23 16:13:12 Compute03 Keepalived_vrrp[12501]: VRRP_Instance(VR_1) ignoring received advertisment...
>>> > >
>>> > > Any hint?
>>> > >
>>> > > Thanks
>>> > >
>>> > >
>>> > >
>>> > > On Tue, Dec 23, 2014 at 3:17 PM, Britt Houser (bhouser) <
>>> bhouser at cisco.com
>>> > > >
>>> > > wrote:
>>> > >
>>> > >
>>> > >
>>> > > Currently HA and DVR are mutually exclusive features.
>>> > >
>>> > > From: Pedro Sousa < pgsousa at gmail.com >
>>> > > Date: Tuesday, December 23, 2014 at 9:42 AM
>>> > > To: " OpenStack-operators at lists.openstack.org " <
>>> > > OpenStack-operators at lists.openstack.org >
>>> > > Subject: [Openstack-operators] Neutron DVR HA
>>> > >
>>> > > Hi all,
>>> > >
>>> > > I've been trying Neutron DVR with 2 controllers + 2 computes. When
>>> > > I create a router, I can see that it is running on all the servers:
>>> > >
>>> > > [root at controller01 ~]# neutron l3-agent-list-hosting-router router
>>> > >
>>> > > +--------------------------------------+--------------+----------------+-------+
>>> > > | id                                   | host         | admin_state_up | alive |
>>> > > +--------------------------------------+--------------+----------------+-------+
>>> > > | 09cfad44-2bb2-4683-a803-ed70f3a46a6a | controller01 | True           | :-)   |
>>> > > | 0ca01d56-b6dd-483d-9c49-cc7209da2a5a | controller02 | True           | :-)   |
>>> > > | 52379f0f-9046-4b73-9d87-bab7f96be5e7 | compute01    | True           | :-)   |
>>> > > | 8d778c6a-94df-40b7-a2d6-120668e699ca | compute02    | True           | :-)   |
>>> > > +--------------------------------------+--------------+----------------+-------+
>>> > >
>>> > > However, if the controller01 server dies I cannot ping the external
>>> > > gateway IP anymore. Is this the expected behavior? Shouldn't it
>>> > > fail over to another controller node?
>>> > >
>>> > > Thanks
>>> > >
>>> > >
>>> > > _______________________________________________
>>> > > OpenStack-operators mailing list
>>> > > OpenStack-operators at lists.openstack.org
>>> > >
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> >
>>> >
>>>
>>
>>
>