[openstack-dev] [octavia] Sometimes amphoras are not re-created if they are not reached for more than heartbeat_timeout

Michael Johnson johnsomor at gmail.com
Fri Apr 27 17:24:27 UTC 2018


Hi Mihaela,

I am sorry to hear you are having trouble with the Queens release of
Octavia.  It is true that a lot of work has gone into the failover
capability, specifically working around a Python threading issue and
making it more resistant to certain Neutron failure situations
(missing ports, etc.).

I know of one open bug against the failover flows,
https://storyboard.openstack.org/#!/story/2001481, "failover breaks in
Active/Standby mode if both amphroae are down".

Unfortunately the log snippet you included does not give me enough
information about the problem to help with this issue. From the
snippet it looks like the failovers were initiated, but the
controllers are unable to reach the amphora-agent on the replacement
amphora. They will continue those retry attempts, but will eventually
fail the amphora into ERROR if they don't succeed.
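
To illustrate the retry behavior described above, here is a minimal
Python sketch. It is not the actual Octavia code: the names
connect_with_retries, failover_status and AmphoraUnreachable, the
parameter names, and the "OK" status string are all hypothetical and
only model "retry a fixed number of times, then mark the amphora
ERROR".

```python
import time


class AmphoraUnreachable(Exception):
    """Raised when every connection attempt has failed (hypothetical name)."""


def connect_with_retries(connect, max_retries=3, retry_interval=0.0,
                         sleep=time.sleep):
    """Call connect() until it succeeds or max_retries attempts have failed."""
    for attempt in range(1, max_retries + 1):
        try:
            return connect()
        except ConnectionError as exc:
            # Mirrors the WARNING lines in the log: warn and try again.
            print("Could not connect to instance. Retrying. "
                  "(attempt %d/%d): %s" % (attempt, max_retries, exc))
            if attempt < max_retries:
                sleep(retry_interval)
    raise AmphoraUnreachable("gave up after %d attempts" % max_retries)


def failover_status(connect, **kwargs):
    """Return an illustrative status for the replacement amphora."""
    try:
        connect_with_retries(connect, **kwargs)
        return "OK"      # illustrative placeholder, not an Octavia status
    except AmphoraUnreachable:
        return "ERROR"   # the retry budget is exhausted
```

While the retry budget is not yet exhausted, the only visible symptom
is the repeated "Could not connect to instance. Retrying." warning,
which is consistent with the log excerpt in the original message.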

One thought I have is that if you created your amphora image in the
last two weeks, you may have built it using the master branch of
octavia, which had a bug that impacted active/standby images. The bug
was introduced while working around the new pip 10 issues.  It has
been fixed: https://review.openstack.org/#/c/564371/

If neither of these situations matches your environment, please open a
story (https://storyboard.openstack.org/#!/dashboard/stories) for us
and include the health manager logs from the point you delete the
amphora up until it starts these connection attempts.  We will dig
through those logs to see what the issue might be.

Michael (johnsom)

On Wed, Apr 25, 2018 at 4:07 AM,  <mihaela.balas at orange.com> wrote:
> Hello,
>
> I am testing Octavia Queens and I see that the failover behavior is very
> much different than the one in Ocata (this is the version we are currently
> running in production).
>
> One example of such behavior is:
>
> I create 4 load balancers and, after the creation is successful, I shut off
> all 8 amphoras. Sometimes, even though the health-manager agent cannot reach
> the amphoras, they are not deleted and re-created. The logs look like those
> shown below even when the heartbeat timeout has long passed. Sometimes the
> amphoras are deleted and re-created. Sometimes they are only partially
> re-created: some of them remain shut off.
>
> Heartbeat_timeout is set to 60 seconds.
>
> [octavia-health-manager-3662231220-nxnt3] 2018-04-25 10:57:26.244 11 WARNING
> octavia.amphorae.drivers.haproxy.rest_api_driver
> [req-339b54a7-ab0c-422a-832f-a444cd710497 - a5f15235c0714365b98a50a11ec956e7
> - - -] Could not connect to instance. Retrying.: ConnectionError:
> HTTPSConnectionPool(host='192.168.0.15', port=9443): Max retries exceeded
> with url:
> /0.5/listeners/285ad342-5582-423e-b654-1f0b50d91fb2/certificates/octaviasrv2.orange.com.pem
> (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection
> object at 0x7f559862c710>: Failed to establish a new connection: [Errno 113]
> No route to host',))
>
> [octavia-health-manager-3662231220-3lssd] 2018-04-25 10:57:26.464 13 WARNING
> octavia.amphorae.drivers.haproxy.rest_api_driver
> [req-a63b795a-4b4f-4b90-a201-a4c9f49ac68b - a5f15235c0714365b98a50a11ec956e7
> - - -] Could not connect to instance. Retrying.: ConnectionError:
> HTTPSConnectionPool(host='192.168.0.14', port=9443): Max retries exceeded
> with url:
> /0.5/listeners/a45bdef3-e7da-4a18-9f1f-53d5651efe0f/1615c1ec-249e-4fa8-9d73-2397e281712c/haproxy
> (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection
> object at 0x7f8a0de95e10>: Failed to establish a new connection: [Errno 113]
> No route to host',))
>
> [octavia-health-manager-3662231220-nxnt3] 2018-04-25 10:57:27.772 11 WARNING
> octavia.amphorae.drivers.haproxy.rest_api_driver
> [req-10febb10-85ea-4082-9df7-daa48894b004 - a5f15235c0714365b98a50a11ec956e7
> - - -] Could not connect to instance. Retrying.: ConnectionError:
> HTTPSConnectionPool(host='192.168.0.19', port=9443): Max retries exceeded
> with url:
> /0.5/listeners/96ce5862-d944-46cb-8809-e1e328268a66/fc5b7940-3527-4e9b-b93f-1da3957a5b71/haproxy
> (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection
> object at 0x7f5598491c90>: Failed to establish a new connection: [Errno 113]
> No route to host',))
>
> [octavia-health-manager-3662231220-nxnt3] 2018-04-25 10:57:34.252 11 WARNING
> octavia.amphorae.drivers.haproxy.rest_api_driver
> [req-339b54a7-ab0c-422a-832f-a444cd710497 - a5f15235c0714365b98a50a11ec956e7
> - - -] Could not connect to instance. Retrying.: ConnectionError:
> HTTPSConnectionPool(host='192.168.0.15', port=9443): Max retries exceeded
> with url:
> /0.5/listeners/285ad342-5582-423e-b654-1f0b50d91fb2/certificates/octaviasrv2.orange.com.pem
> (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection
> object at 0x7f5598520790>: Failed to establish a new connection: [Errno 113]
> No route to host',))
>
> [octavia-health-manager-3662231220-3lssd] 2018-04-25 10:57:34.476 13 WARNING
> octavia.amphorae.drivers.haproxy.rest_api_driver
> [req-a63b795a-4b4f-4b90-a201-a4c9f49ac68b - a5f15235c0714365b98a50a11ec956e7
> - - -] Could not connect to instance. Retrying.: ConnectionError:
> HTTPSConnectionPool(host='192.168.0.14', port=9443): Max retries exceeded
> with url:
> /0.5/listeners/a45bdef3-e7da-4a18-9f1f-53d5651efe0f/1615c1ec-249e-4fa8-9d73-2397e281712c/haproxy
> (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection
> object at 0x7f8a0de953d0>: Failed to establish a new connection: [Errno 113]
> No route to host',))
>
> [octavia-health-manager-3662231220-nxnt3] 2018-04-25 10:57:35.780 11 WARNING
> octavia.amphorae.drivers.haproxy.rest_api_driver
> [req-10febb10-85ea-4082-9df7-daa48894b004 - a5f15235c0714365b98a50a11ec956e7
> - - -] Could not connect to instance. Retrying.: ConnectionError:
> HTTPSConnectionPool(host='192.168.0.19', port=9443): Max retries exceeded
> with url:
> /0.5/listeners/96ce5862-d944-46cb-8809-e1e328268a66/fc5b7940-3527-4e9b-b93f-1da3957a5b71/haproxy
> (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection
> object at 0x7f55984e2050>: Failed to establish a new connection: [Errno 113]
> No route to host',))
>
> Thank you,
>
> Mihaela Balas