[openstack-dev] [octavia] Sometimes amphoras are not re-created if they are not reached for more than heartbeat_timeout

mihaela.balas at orange.com mihaela.balas at orange.com
Thu May 3 08:51:50 UTC 2018


Hi Michael,

I built a new amphora image with the latest patches and reproduced two different bugs that I see in my environment. One of them is similar to the one initially described in this thread. I opened two stories as you advised:

https://storyboard.openstack.org/#!/story/2001960
https://storyboard.openstack.org/#!/story/2001955

Meanwhile, could you recommend values for the following parameters (perhaps in relation to the number of workers, cores, compute nodes, etc.)?

[health_manager]
failover_threads
status_update_threads

[haproxy_amphora]
build_rate_limit
build_active_retries

[controller_worker]
workers
amp_active_retries
amp_active_wait_sec

[task_flow]
max_workers
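
For reference, a minimal Python sketch for collecting the current values of these options from a configuration file, so they can be attached to a story alongside the logs. It uses only the standard-library configparser. The helper name read_tuning_options and the sample text are hypothetical, the numbers in the sample are placeholders rather than recommendations, and Octavia itself loads its configuration through oslo.config, not this way.

```python
import configparser

# Options asked about in this thread, grouped by config section.
TUNING_OPTIONS = {
    "health_manager": ["failover_threads", "status_update_threads"],
    "haproxy_amphora": ["build_rate_limit", "build_active_retries"],
    "controller_worker": ["workers", "amp_active_retries", "amp_active_wait_sec"],
    "task_flow": ["max_workers"],
}


def read_tuning_options(conf_text):
    """Return {section: {option: value or None}} for the options above."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    result = {}
    for section, options in TUNING_OPTIONS.items():
        # fallback=None also covers a section that is missing entirely.
        result[section] = {
            opt: parser.get(section, opt, fallback=None) for opt in options
        }
    return result


# Hypothetical sample; the values are placeholders, not recommendations.
SAMPLE = """
[health_manager]
failover_threads = 10
heartbeat_timeout = 60

[controller_worker]
workers = 2
"""

if __name__ == "__main__":
    for section, values in read_tuning_options(SAMPLE).items():
        for name, value in values.items():
            shown = value if value is not None else "<unset>"
            print(f"[{section}] {name} = {shown}")
```

Options that are not set in the file are reported as unset, which means the service falls back to its built-in default for them.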

Thank you for your help,
Mihaela Balas

-----Original Message-----
From: Michael Johnson [mailto:johnsomor at gmail.com] 
Sent: Friday, April 27, 2018 8:24 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [octavia] Sometimes amphoras are not re-created if they are not reached for more than heartbeat_timeout

Hi Mihaela,

I am sorry to hear you are having trouble with the Queens release of Octavia. It is true that a lot of work has gone into the failover capability, specifically working around a Python threading issue and making it more resistant to certain Neutron failure situations (missing ports, etc.).

I know of one open bug against the failover flows, https://storyboard.openstack.org/#!/story/2001481, "failover breaks in Active/Standby mode if both amphroae are down".

Unfortunately, the log snippet you posted does not give me enough information about the problem to help with this issue. From the snippet it looks like the failovers were initiated, but the controllers are unable to reach the amphora-agent on the replacement amphora. The controller will continue those retry attempts, but will eventually fail the amphora into ERROR if they do not succeed.
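
To illustrate the retry behaviour described here, a minimal Python sketch follows. It is not Octavia's actual code: it only shows the general pattern of retrying a connection, waiting between attempts, and reporting ERROR once the attempts are exhausted. The function name wait_for_agent, the connect callable, and the parameters max_retries and retry_interval are hypothetical names chosen for illustration.

```python
import time


def wait_for_agent(connect, max_retries, retry_interval, sleep=time.sleep):
    """Call connect() until it succeeds or max_retries attempts have failed.

    Returns "ACTIVE" on success and "ERROR" when every attempt failed.
    connect is any callable that raises ConnectionError on failure.
    """
    for attempt in range(1, max_retries + 1):
        try:
            connect()
            return "ACTIVE"
        except ConnectionError as exc:
            print(f"Could not connect to instance "
                  f"(attempt {attempt}/{max_retries}): {exc}")
            # Wait before the next attempt, but not after the last one.
            if attempt < max_retries:
                sleep(retry_interval)
    return "ERROR"


def unreachable():
    """Stand-in for an agent that can never be reached."""
    raise ConnectionError("[Errno 113] No route to host")


if __name__ == "__main__":
    print(wait_for_agent(unreachable, max_retries=3, retry_interval=0))
```

With a retry count and interval chosen this way, the total time before the amphora is marked ERROR is bounded by roughly max_retries times retry_interval plus the connection timeouts.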

One thought: if you created your amphora image in the last two weeks, you may have built it from the master branch of Octavia, which had a bug that impacted active/standby images. The bug was introduced while working around the new pip 10 issues. It has been fixed: https://review.openstack.org/#/c/564371/

If neither of these situations matches your environment, please open a story (https://storyboard.openstack.org/#!/dashboard/stories) for us and include the health manager logs from the point you delete the amphora up until it starts these connection attempts. We will dig through those logs to see what the issue might be.

Michael (johnsom)

On Wed, Apr 25, 2018 at 4:07 AM,  <mihaela.balas at orange.com> wrote:
> Hello,
>
>
>
> I am testing Octavia Queens and I see that the failover behavior is
> very different from the one in Ocata (this is the version we are
> currently running in production).
>
> One example of such behavior is:
>
>
>
> I create 4 load balancers and, after the creation is successful, I shut
> off all 8 amphoras. Sometimes, even though the health-manager agent
> cannot reach the amphoras, they are not deleted and re-created. The logs
> look like those shown below even when the heartbeat timeout has long
> passed. Sometimes the amphoras are deleted and re-created. Sometimes they
> are only partially re-created: some of them remain shut off.
>
> heartbeat_timeout is set to 60 seconds.
>
> [octavia-health-manager-3662231220-nxnt3] 2018-04-25 10:57:26.244 11 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [req-339b54a7-ab0c-422a-832f-a444cd710497 - a5f15235c0714365b98a50a11ec956e7 - - -] Could not connect to instance. Retrying.: ConnectionError: HTTPSConnectionPool(host='192.168.0.15', port=9443): Max retries exceeded with url: /0.5/listeners/285ad342-5582-423e-b654-1f0b50d91fb2/certificates/octaviasrv2.orange.com.pem (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f559862c710>: Failed to establish a new connection: [Errno 113] No route to host',))
>
> [octavia-health-manager-3662231220-3lssd] 2018-04-25 10:57:26.464 13 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [req-a63b795a-4b4f-4b90-a201-a4c9f49ac68b - a5f15235c0714365b98a50a11ec956e7 - - -] Could not connect to instance. Retrying.: ConnectionError: HTTPSConnectionPool(host='192.168.0.14', port=9443): Max retries exceeded with url: /0.5/listeners/a45bdef3-e7da-4a18-9f1f-53d5651efe0f/1615c1ec-249e-4fa8-9d73-2397e281712c/haproxy (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f8a0de95e10>: Failed to establish a new connection: [Errno 113] No route to host',))
>
> [octavia-health-manager-3662231220-nxnt3] 2018-04-25 10:57:27.772 11 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [req-10febb10-85ea-4082-9df7-daa48894b004 - a5f15235c0714365b98a50a11ec956e7 - - -] Could not connect to instance. Retrying.: ConnectionError: HTTPSConnectionPool(host='192.168.0.19', port=9443): Max retries exceeded with url: /0.5/listeners/96ce5862-d944-46cb-8809-e1e328268a66/fc5b7940-3527-4e9b-b93f-1da3957a5b71/haproxy (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f5598491c90>: Failed to establish a new connection: [Errno 113] No route to host',))
>
> [octavia-health-manager-3662231220-nxnt3] 2018-04-25 10:57:34.252 11 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [req-339b54a7-ab0c-422a-832f-a444cd710497 - a5f15235c0714365b98a50a11ec956e7 - - -] Could not connect to instance. Retrying.: ConnectionError: HTTPSConnectionPool(host='192.168.0.15', port=9443): Max retries exceeded with url: /0.5/listeners/285ad342-5582-423e-b654-1f0b50d91fb2/certificates/octaviasrv2.orange.com.pem (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f5598520790>: Failed to establish a new connection: [Errno 113] No route to host',))
>
> [octavia-health-manager-3662231220-3lssd] 2018-04-25 10:57:34.476 13 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [req-a63b795a-4b4f-4b90-a201-a4c9f49ac68b - a5f15235c0714365b98a50a11ec956e7 - - -] Could not connect to instance. Retrying.: ConnectionError: HTTPSConnectionPool(host='192.168.0.14', port=9443): Max retries exceeded with url: /0.5/listeners/a45bdef3-e7da-4a18-9f1f-53d5651efe0f/1615c1ec-249e-4fa8-9d73-2397e281712c/haproxy (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f8a0de953d0>: Failed to establish a new connection: [Errno 113] No route to host',))
>
> [octavia-health-manager-3662231220-nxnt3] 2018-04-25 10:57:35.780 11 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [req-10febb10-85ea-4082-9df7-daa48894b004 - a5f15235c0714365b98a50a11ec956e7 - - -] Could not connect to instance. Retrying.: ConnectionError: HTTPSConnectionPool(host='192.168.0.19', port=9443): Max retries exceeded with url: /0.5/listeners/96ce5862-d944-46cb-8809-e1e328268a66/fc5b7940-3527-4e9b-b93f-1da3957a5b71/haproxy (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f55984e2050>: Failed to establish a new connection: [Errno 113] No route to host',))
>
>
>
> Thank you,
>
> Mihaela Balas
>
> ______________________________________________________________________
> ___________________________________________________
>
> Ce message et ses pieces jointes peuvent contenir des informations 
> confidentielles ou privilegiees et ne doivent donc pas etre diffuses, 
> exploites ou copies sans autorisation. Si vous avez recu ce message 
> par erreur, veuillez le signaler a l'expediteur et le detruire ainsi 
> que les pieces jointes. Les messages electroniques etant susceptibles 
> d'alteration, Orange decline toute responsabilite si ce message a ete 
> altere, deforme ou falsifie. Merci.
>
> This message and its attachments may contain confidential or 
> privileged information that may be protected by law; they should not 
> be distributed, used or copied without authorisation.
> If you have received this email in error, please notify the sender and 
> delete this message and its attachments.
> As emails may be altered, Orange is not liable for messages that have 
> been modified, changed or falsified.
> Thank you.
>
>
> ______________________________________________________________________
> ____ OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: 
> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
