[Openstack-operators] Fwd: HAPROXY 504 errors in HA conf

Kris G. Lindgren klindgren at godaddy.com
Tue Jan 13 20:39:01 UTC 2015


It's been a while since I used keepalived.  However, can you confirm that on failover the new master sends out a GARP (gratuitous ARP) for the VIP that it took over?  That GARP should update the switches' ARP tables, which is essentially what your outbound connection from a VM -> Google is doing for you.

This is controlled by garp_master_delay; if I remember right, the default is 5 seconds.
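For reference, a minimal sketch of where that lives in keepalived.conf -- the interface name, router id and priority below are placeholders, not taken from your setup:

    vrrp_instance VI_1 {
        state MASTER
        interface eth0            # placeholder interface
        virtual_router_id 51      # placeholder
        priority 150              # placeholder
        advert_int 1
        # send gratuitous ARPs 5 seconds after taking over (the default),
        # then refresh them periodically so the switch keeps the new MAC
        garp_master_delay 5
        garp_master_refresh 10
        virtual_ipaddress {
            172.16.21.20/24       # the VIP from this thread
        }
    }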

Also, you should double check that you don't have port security enabled on the switch port.  At least on older Cisco IOS devices, if you had port security enabled to permit a certain number of dynamic MAC addresses, it kept track of the allowed MAC addresses by adding static MAC entries to the switch, which will completely break the type of clustering you are doing.
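If you want to check, something along these lines on IOS (the interface name is just an example) will show whether port security is active and how many MACs it allows; either raise the maximum or turn it off on the ports facing the cluster:

    show port-security interface GigabitEthernet0/1
    show port-security address

    conf t
    interface GigabitEthernet0/1
      no switchport port-security
      ! or, if you have to keep it, allow enough MACs for the VIP to move:
      ! switchport port-security maximum 5
    end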
____________________________________________

Kris Lindgren
Senior Linux Systems Engineer
GoDaddy, LLC.


From: Pedro Sousa <pgsousa at gmail.com>
Date: Tuesday, January 13, 2015 at 1:18 PM
To: Jesse Keating <jlk at bluebox.net>
Cc: "OpenStack-operators at lists.openstack.org" <openstack-operators at lists.openstack.org>
Subject: Re: [Openstack-operators] Fwd: HAPROXY 504 errors in HA conf

As expected, if I reboot the keepalived MASTER node I get timeouts again, so my understanding is that this happens when the VIP fails over to another node. Does anyone have an explanation for this?

Thanks

On Tue, Jan 13, 2015 at 8:08 PM, Pedro Sousa <pgsousa at gmail.com> wrote:
Hi,

I think I found the issue: I had all 3 nodes running keepalived as MASTER, so when I rebooted one of the servers, one of the VIPs failed over to it, causing the timeout issues. I have now left only one server as MASTER and the other 2 as BACKUP, and if I reboot the BACKUP servers everything works fine.
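For the archives, a rough sketch of that layout (interface, router id and priorities below are illustrative, not my actual config). keepalived elects on priority, so the BACKUP nodes just need lower values; nopreempt (which requires the initial state to be BACKUP) additionally keeps the VIP from bouncing back the moment a rebooted node returns:

    # node1 - preferred holder of the VIP
    vrrp_instance VI_1 {
        state MASTER
        interface eth0
        virtual_router_id 51
        priority 150
        virtual_ipaddress {
            172.16.21.20/24
        }
    }

    # node2 and node3 - same block, BACKUP state and lower priority
    vrrp_instance VI_1 {
        state BACKUP
        interface eth0
        virtual_router_id 51
        priority 100              # e.g. 90 on node3
        virtual_ipaddress {
            172.16.21.20/24
        }
    }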

As an aside, I don't know if this is some ARP issue, because I have a similar problem with Neutron L3 running in HA mode. If I reboot the server that is running as MASTER, I lose connectivity to my floating IPs because the switch doesn't yet know that the MAC address has changed. To get everything working again I have to ping an outside host, like Google, from an instance.
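I suppose the equivalent, without involving an instance, would be to send a gratuitous ARP for the address from the new master node (the interface name and router UUID below are placeholders):

    # keepalived VIP held directly on the host
    arping -U -I eth0 -c 3 172.16.21.20

    # Neutron L3 HA router: same thing, but from inside the qrouter namespace,
    # out of the external (qg-) interface, for the floating IP
    ip netns exec qrouter-<router-uuid> arping -U -I qg-<id> -c 3 <floating-ip>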

Maybe someone could share some experience on this,

Thank you for your help.




On Tue, Jan 13, 2015 at 7:18 PM, Pedro Sousa <pgsousa at gmail.com> wrote:
Jesse,

I see a lot of these messages in glance-api:

2015-01-13 19:16:29.084 29269 DEBUG glance.api.middleware.version_negotiation [29d94a9a-135b-4bf2-a97b-f23b0704ee15 eb7ff2b5f0f34f51ac9ea0f75b60065d 2524b02b63994749ad1fed6f3a825c15 - - -] Unknown version. Returning version choices. process_request /usr/lib/python2.7/site-packages/glance/api/middleware/version_negotiation.py:64

While running openstack-status (glance image-list)

== Glance images ==
Error finding address for http://172.16.21.20:9292/v1/images/detail?sort_key=name&sort_dir=asc&limit=20: HTTPConnectionPool(host='172.16.21.20', port=9292): Max retries exceeded with url: /v1/images/detail?sort_key=name&sort_dir=asc&limit=20 (Caused by <class 'httplib.BadStatusLine'>: '')
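As a sanity check (the backend address below is a placeholder), hitting the API both through the VIP and directly on one glance-api backend should show which side stops answering; a bare GET on / normally returns an HTTP 300 with the list of API versions:

    # through the VIP
    curl -i http://172.16.21.20:9292/

    # directly against one glance-api backend (replace with a real backend address)
    curl -i http://<backend-ip>:9292/

An empty reply only on the VIP would point at the haproxy/keepalived side rather than at glance itself.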


Thanks


On Tue, Jan 13, 2015 at 6:52 PM, Jesse Keating <jlk at bluebox.net> wrote:
On 1/13/15 10:42 AM, Pedro Sousa wrote:
Hi


    I've changed some haproxy confs, now I'm getting a different error:

    == Nova networks ==
    ERROR (ConnectionError): HTTPConnectionPool(host='172.16.21.20',
    port=8774): Max retries exceeded with url:
    /v2/2524b02b63994749ad1fed6f3a825c15/os-networks (Caused by <class
    'httplib.BadStatusLine'>: '')
    == Nova instance flavors ==

If I restart my OpenStack services, everything starts working again.

    I'm attaching my new haproxy conf.


Thanks


Sounds like your services are losing access to something, like rabbit or the database. What do your service logs show prior to restart? Are they throwing any errors?
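Roughly the kind of thing I mean (log paths and hostnames below are placeholders, adjust for your layout):

    # connection-type errors in the API logs just before things went wrong
    grep -iE 'error|timeout|refused|unreachable' /var/log/glance/api.log /var/log/nova/api.log | tail -n 50

    # can the controllers still reach rabbit and the database on whatever
    # address (VIP or otherwise) the services are configured to use?
    nc -zv <rabbit-host-or-vip> 5672
    nc -zv <db-host-or-vip> 3306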


--
-jlk


_______________________________________________
OpenStack-operators mailing list
OpenStack-operators at lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators




