RE: [octavia] Timeouts during building of lb? But then successful

florian at datalounges.com
Wed Nov 11 14:45:51 UTC 2020


Hi, just as an update, because I think it's rude to ask for help and not provide the solution, however stupid it may be.

We managed to get this working. Because we run a lot of worker threads across the OpenStack services, MySQL 8 ran out of connections, so obviously we increased max_connections.

This, however, just ended up closing the connections with no explanation, which was the original problem. It turns out that in the past we had set open_files_limit = -1, which in MySQL 8 signifies 10000; as described in a Red Hat Bugzilla article, that seems to be not enough. As soon as we increased it to 65000 (our Linux limit is much higher than that), everything went perfectly...
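
For reference, this is roughly what we ended up with on the MySQL side (values are illustrative, not a recommendation; tune max_connections and the file limit for your own deployment):

    # /etc/mysql/my.cnf (or a conf.d drop-in)
    [mysqld]
    max_connections  = 4096
    open_files_limit = 65000

    # verify what the running server actually picked up
    mysql> SHOW VARIABLES LIKE 'max_connections';
    mysql> SHOW VARIABLES LIKE 'open_files_limit';

Note that on systemd-based distributions the effective value is also capped by LimitNOFILE in the mysqld service unit, so raising open_files_limit in my.cnf alone may not be enough.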

Octavia now deploys within 1 minute. And even through hosted Kubernetes we deploy an LB via Octavia in under 3 minutes.

Thank you again, Michael, for pointing me in the right direction.
//F

-----Original Message-----
From: Michael Johnson <johnsomor at gmail.com> 
Sent: Monday, 9 November 2020 18.41
To: Florian Rommel <florian at datalounges.com>
Cc: openstack-discuss <openstack-discuss at lists.openstack.org>
Subject: Re: [octavia] Timeouts during building of lb? But then successful

Hi Florian,

That is very unusual. It typically takes less than 30 seconds for a load balancer to be provisioned. It definitely sounds like the MySQL instance is having trouble. This can also cause longer-term issues if the query response time climbs to 10 seconds or more (0.001 is normal), which could trigger unnecessary failovers.

In Octavia there are layers of "retries" to attempt to handle clouds that are having trouble. It sounds like database issues are triggering one or more of these retries.
There are a few retries that will be in play for database transactions:
- MySQL internal retries/timeouts, such as lock timeouts (logged on the MySQL side)
- oslo.db automatic retries (typically not logged without configuration file settings)
- Octavia tenacity and flow retries (typically logged if the configuration file has debug = True enabled; see the snippet below)
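
For example, to surface the tenacity/flow retry logging, debug needs to be enabled in octavia.conf (standard oslo.log option), then restart the Octavia services:

    [DEFAULT]
    debug = True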

This may also be a general network connection issue. The default REST timeout (used when we connect to the amphora agents) is 600 seconds, so I'm wondering if the lb-mgmt-network is also having an issue.
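
If you want to rule that out, the amphora REST retry/timeout knobs live in the [haproxy_amphora] section of octavia.conf; from memory the defaults look roughly like this (double-check the option names and values against your release's documentation):

    [haproxy_amphora]
    # 120 retries x 5 second interval ~= the 600 second waits described above
    connection_max_retries    = 120
    connection_retry_interval = 5
    rest_request_conn_timeout = 10
    rest_request_read_timeout = 60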

Please check your health manager log files. If there are database query time issues logged, it would point specifically to a MySQL issue. In the past we have seen badly configured MySQL clustering setups that caused performance issues (flipping primary instance, lock contention between the instances, etc.). You should not be seeing any log messages saying the MySQL database went away; that is not normal.
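
One quick way to confirm it on the MySQL side is to enable the slow query log for a while and see what shows up (adjust the log file path for your setup):

    mysql> SET GLOBAL slow_query_log = 'ON';
    mysql> SET GLOBAL long_query_time = 1;  -- log anything slower than 1 second
    mysql> SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';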

Michael

On Sun, Nov 8, 2020 at 7:06 AM Florian Rommel <florian at datalounges.com> wrote:
>
> Hi, so we have a fully functioning setup of Octavia on Ussuri and it works nicely, when it completes.
> So here is what happens:
> From the Octavia API to the Octavia worker, it takes 20 seconds for the job to be initiated.
> The load balancer gets built quickly, and then we get a "MySQL server has gone away" error; the listener gets built, and then a member (that works too); then the MySQL error about a query taking too long to execute comes up.
> Now this is where it gets weird. This is all within the first 2 - 3 minutes.
> At this point it hangs and takes 10 minutes (600 seconds) for the next step to complete and then another 10 minutes and another 10 until it’s completed.
> It seems there is a timeout somewhere, but even with debug on we do not see what is going on. Does anyone have MySQL 8 running with Octavia executing fine? Could you send me your redacted Octavia or MySQL conf files? We didn't touch them, but it seems that there is something off...
> especially since it then completes and works extremely nicely.
> I would highly appreciate it , even off list.
> Best regards,
> //f
>
>
