Nova causes MySQL timeouts
It looks like nova is keeping mysql connections open until they time out. How are others responding to this issue? Do you just ignore the mysql errors, or is it possible to change configuration so that nova closes and reopens connections before they time out? Or is there a way to stop mysql from logging these aborted connections without hiding real issues?
Aborted connection 10726 to db: 'nova' user: 'nova' host: 'asdf' (Got timeout reading communication packets)
We’re not setting max_pool_size nor max_overflow option presently. I googled around and found this document:
https://docs.openstack.org/keystone/stein/configuration/config-options.html
Document says:
[api_database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. max_overflow = None (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool.
[database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. min_pool_size = 1 (Integer) Minimum number of SQL connections to keep open in a pool. max_overflow = 50 (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool.
If min_pool_size is >0, would that cause at least 1 connection to remain open until it times out? What are the recommended values for these, to allow unused connections to close before they time out? Is “min_pool_size = 0” an acceptable setting?
My settings are default:
[api_database]: #connection_recycle_time = 3600 #max_overflow = <None> #max_pool_size = <None>
[database]: #connection_recycle_time = 3600 #min_pool_size = 1 #max_overflow = 50 #max_pool_size = 5
It’s not obvious what max_overflow does. Where can I find a document that explains more about these settings?
From: Gaëtan Trellu gaetan.trellu@incloudus.com Sent: Tuesday, September 3, 2019 1:37 PM To: Albert Braden albertb@synopsys.com Cc: openstack-discuss@lists.openstack.org Subject: Re: Nova causes MySQL timeouts
Hi Albert,
It is a configuration issue, have a look to max_pool_size and max_overflow options under [database] section.
Keep in mind than more workers you will have more connections will be opened on the database.
Gaetan (goldyfruit)
On Sep 3, 2019 4:31 PM, Albert Braden <Albert.Braden@synopsys.commailto:Albert.Braden@synopsys.com> wrote:
It looks like nova is keeping mysql connections open until they time out. How are others responding to this issue? Do you just ignore the mysql errors, or is it possible to change configuration so that nova closes and reopens connections before they time out? Or is there a way to stop mysql from logging these aborted connections without hiding real issues?
Aborted connection 10726 to db: 'nova' user: 'nova' host: 'asdf' (Got timeout reading communication packets)
After more googling it appears that max_pool_size is a maximum limit on the number of connections that can stay open, and max_overflow is a maximum limit on the number of connections that can be temporarily opened when the pool has been consumed. It looks like the defaults are 5 and 10 which would keep 5 connections open all the time and allow 10 temp.
Do I need to set max_pool_size to 0 and max_overflow to the number of connections that I want to allow? Is that a reasonable and correct configuration? Intuitively that doesn't seem right, to have a pool size of 0, but if the "pool" is a group of connections that will remain open until they time out, then maybe 0 is correct?
From: Albert Braden Albert.Braden@synopsys.com Sent: Wednesday, September 4, 2019 10:19 AM To: openstack-discuss@lists.openstack.org Cc: Gaëtan Trellu gaetan.trellu@incloudus.com Subject: RE: Nova causes MySQL timeouts
We’re not setting max_pool_size nor max_overflow option presently. I googled around and found this document:
https://docs.openstack.org/keystone/stein/configuration/config-options.htmlhttps://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keystone_stein_configuration_config-2Doptions.html&d=DwMGaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=3eF4Bv1HRQW6gl7II12rTTSKj_A9_LDISS6hU0nP-R0&s=0EGWx9qW60G1cxoPFCIv_G1-iXX20jKcC5-AwlCWk8g&e=
Document says:
[api_database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. max_overflow = None (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool.
[database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. min_pool_size = 1 (Integer) Minimum number of SQL connections to keep open in a pool. max_overflow = 50 (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool.
If min_pool_size is >0, would that cause at least 1 connection to remain open until it times out? What are the recommended values for these, to allow unused connections to close before they time out? Is “min_pool_size = 0” an acceptable setting?
My settings are default:
[api_database]: #connection_recycle_time = 3600 #max_overflow = <None> #max_pool_size = <None>
[database]: #connection_recycle_time = 3600 #min_pool_size = 1 #max_overflow = 50 #max_pool_size = 5
It’s not obvious what max_overflow does. Where can I find a document that explains more about these settings?
From: Gaëtan Trellu <gaetan.trellu@incloudus.commailto:gaetan.trellu@incloudus.com> Sent: Tuesday, September 3, 2019 1:37 PM To: Albert Braden <albertb@synopsys.commailto:albertb@synopsys.com> Cc: openstack-discuss@lists.openstack.orgmailto:openstack-discuss@lists.openstack.org Subject: Re: Nova causes MySQL timeouts
Hi Albert,
It is a configuration issue, have a look to max_pool_size and max_overflow options under [database] section.
Keep in mind than more workers you will have more connections will be opened on the database.
Gaetan (goldyfruit)
On Sep 3, 2019 4:31 PM, Albert Braden <Albert.Braden@synopsys.commailto:Albert.Braden@synopsys.com> wrote:
It looks like nova is keeping mysql connections open until they time out. How are others responding to this issue? Do you just ignore the mysql errors, or is it possible to change configuration so that nova closes and reopens connections before they time out? Or is there a way to stop mysql from logging these aborted connections without hiding real issues?
Aborted connection 10726 to db: 'nova' user: 'nova' host: 'asdf' (Got timeout reading communication packets)
Tagging with oslo as this sounds related to oslo.db.
On 9/5/19 7:37 PM, Albert Braden wrote:
After more googling it appears that max_pool_size is a maximum limit on the number of connections that can stay open, and max_overflow is a maximum limit on the number of connections that can be temporarily opened when the pool has been consumed. It looks like the defaults are 5 and 10 which would keep 5 connections open all the time and allow 10 temp.
Do I need to set max_pool_size to 0 and max_overflow to the number of connections that I want to allow? Is that a reasonable and correct configuration? Intuitively that doesn't seem right, to have a pool size of 0, but if the "pool" is a group of connections that will remain open until they time out, then maybe 0 is correct?
I don't think so. According to [0] and [1], a pool_size of 0 means unlimited. You could probably set it to 1 to minimize the number of connections kept open, but then I expect you'll have overhead from having to re-open connections frequently.
It sounds like you could use a NullPool to eliminate connection pooling entirely, but I don't think we support that in oslo.db. Based on the error message you're seeing, I would take a look at connection_recycle_time[2]. I seem to recall seeing a comment that the recycle time needs to be shorter than any of the timeouts in the path between the service and the db (so anything like haproxy or mysql itself). Shortening that, or lengthening intervening timeouts, might get rid of these disconnection messages.
0: https://docs.openstack.org/oslo.db/stein/reference/opts.html#database.max_po... 1: https://docs.sqlalchemy.org/en/13/core/pooling.html#sqlalchemy.pool.QueuePoo... 2: https://docs.openstack.org/oslo.db/stein/reference/opts.html#database.connec...
*From:* Albert Braden Albert.Braden@synopsys.com *Sent:* Wednesday, September 4, 2019 10:19 AM *To:* openstack-discuss@lists.openstack.org *Cc:* Gaëtan Trellu gaetan.trellu@incloudus.com *Subject:* RE: Nova causes MySQL timeouts
We’re not setting max_pool_size nor max_overflow option presently. I googled around and found this document:
https://docs.openstack.org/keystone/stein/configuration/config-options.html https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keystone_stein_configuration_config-2Doptions.html&d=DwMGaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=3eF4Bv1HRQW6gl7II12rTTSKj_A9_LDISS6hU0nP-R0&s=0EGWx9qW60G1cxoPFCIv_G1-iXX20jKcC5-AwlCWk8g&e=
Document says:
[api_database]
connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped.
max_overflow = None (Integer) If set, use this value for max_overflow with SQLAlchemy.
max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool.
[database]
connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped.
min_pool_size = 1 (Integer) Minimum number of SQL connections to keep open in a pool.
max_overflow = 50 (Integer) If set, use this value for max_overflow with SQLAlchemy.
max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool.
If min_pool_size is >0, would that cause at least 1 connection to remain open until it times out? What are the recommended values for these, to allow unused connections to close before they time out? Is “min_pool_size = 0” an acceptable setting?
My settings are default:
[api_database]:
#connection_recycle_time = 3600
#max_overflow = <None>
#max_pool_size = <None>
[database]:
#connection_recycle_time = 3600
#min_pool_size = 1
#max_overflow = 50
#max_pool_size = 5
It’s not obvious what max_overflow does. Where can I find a document that explains more about these settings?
*From:* Gaëtan Trellu <gaetan.trellu@incloudus.com mailto:gaetan.trellu@incloudus.com> *Sent:* Tuesday, September 3, 2019 1:37 PM *To:* Albert Braden <albertb@synopsys.com mailto:albertb@synopsys.com> *Cc:* openstack-discuss@lists.openstack.org mailto:openstack-discuss@lists.openstack.org *Subject:* Re: Nova causes MySQL timeouts
Hi Albert,
It is a configuration issue, have a look to max_pool_size and max_overflow options under [database] section.
Keep in mind than more workers you will have more connections will be opened on the database.
Gaetan (goldyfruit)
On Sep 3, 2019 4:31 PM, Albert Braden <Albert.Braden@synopsys.com mailto:Albert.Braden@synopsys.com> wrote:
It looks like nova is keeping mysql connections open until they time out. How are others responding to this issue? Do you just ignore the mysql errors, or is it possible to change configuration so that nova closes and reopens connections before they time out? Or is there a way to stop mysql from logging these aborted connections without hiding real issues? Aborted connection 10726 to db: 'nova' user: 'nova' host: 'asdf' (Got timeout reading communication packets)
In my personal experience, running Nova on a four core machine without limiting the number of database connections will easily exhaust the available connections to MySQL/MariaDB. Keep in mind that the limit applies to every instance of a service, so if Nova starts 'm' services replicated for 'n' cores with 'd' possible connections you'll be up to ‘m x n x d' connections. It gets big fast.
The default setting of '0' (that is, unlimited) does not make for a good first-run experience, IMO.
This issue comes up every few years or so, and the consensus previously is that 200-2000 connections is recommended based on your needs. Your database has to be configured to handle the load and looking at the configuration value across all your services and setting them consistently and appropriately is important.
http://lists.openstack.org/pipermail/openstack-dev/2015-April/061808.html
On Sep 6, 2019, at 7:34 AM, Ben Nemec openstack@nemebean.com wrote:
Tagging with oslo as this sounds related to oslo.db.
On 9/5/19 7:37 PM, Albert Braden wrote:
After more googling it appears that max_pool_size is a maximum limit on the number of connections that can stay open, and max_overflow is a maximum limit on the number of connections that can be temporarily opened when the pool has been consumed. It looks like the defaults are 5 and 10 which would keep 5 connections open all the time and allow 10 temp. Do I need to set max_pool_size to 0 and max_overflow to the number of connections that I want to allow? Is that a reasonable and correct configuration? Intuitively that doesn't seem right, to have a pool size of 0, but if the "pool" is a group of connections that will remain open until they time out, then maybe 0 is correct?
I don't think so. According to [0] and [1], a pool_size of 0 means unlimited. You could probably set it to 1 to minimize the number of connections kept open, but then I expect you'll have overhead from having to re-open connections frequently.
It sounds like you could use a NullPool to eliminate connection pooling entirely, but I don't think we support that in oslo.db. Based on the error message you're seeing, I would take a look at connection_recycle_time[2]. I seem to recall seeing a comment that the recycle time needs to be shorter than any of the timeouts in the path between the service and the db (so anything like haproxy or mysql itself). Shortening that, or lengthening intervening timeouts, might get rid of these disconnection messages.
0: https://docs.openstack.org/oslo.db/stein/reference/opts.html#database.max_po... 1: https://docs.sqlalchemy.org/en/13/core/pooling.html#sqlalchemy.pool.QueuePoo... 2: https://docs.openstack.org/oslo.db/stein/reference/opts.html#database.connec...
*From:* Albert Braden Albert.Braden@synopsys.com *Sent:* Wednesday, September 4, 2019 10:19 AM *To:* openstack-discuss@lists.openstack.org *Cc:* Gaëtan Trellu gaetan.trellu@incloudus.com *Subject:* RE: Nova causes MySQL timeouts We’re not setting max_pool_size nor max_overflow option presently. I googled around and found this document: https://docs.openstack.org/keystone/stein/configuration/config-options.html https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keystone_stein_configuration_config-2Doptions.html&d=DwMGaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=3eF4Bv1HRQW6gl7II12rTTSKj_A9_LDISS6hU0nP-R0&s=0EGWx9qW60G1cxoPFCIv_G1-iXX20jKcC5-AwlCWk8g&e= Document says: [api_database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. max_overflow = None (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. [database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. min_pool_size = 1 (Integer) Minimum number of SQL connections to keep open in a pool. max_overflow = 50 (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. If min_pool_size is >0, would that cause at least 1 connection to remain open until it times out? What are the recommended values for these, to allow unused connections to close before they time out? Is “min_pool_size = 0” an acceptable setting? My settings are default: [api_database]: #connection_recycle_time = 3600 #max_overflow = <None> #max_pool_size = <None> [database]: #connection_recycle_time = 3600 #min_pool_size = 1 #max_overflow = 50 #max_pool_size = 5 It’s not obvious what max_overflow does. Where can I find a document that explains more about these settings? *From:* Gaëtan Trellu <gaetan.trellu@incloudus.com mailto:gaetan.trellu@incloudus.com> *Sent:* Tuesday, September 3, 2019 1:37 PM *To:* Albert Braden <albertb@synopsys.com mailto:albertb@synopsys.com> *Cc:* openstack-discuss@lists.openstack.org mailto:openstack-discuss@lists.openstack.org *Subject:* Re: Nova causes MySQL timeouts Hi Albert, It is a configuration issue, have a look to max_pool_size and max_overflow options under [database] section. Keep in mind than more workers you will have more connections will be opened on the database. Gaetan (goldyfruit) On Sep 3, 2019 4:31 PM, Albert Braden <Albert.Braden@synopsys.com mailto:Albert.Braden@synopsys.com> wrote: It looks like nova is keeping mysql connections open until they time out. How are others responding to this issue? Do you just ignore the mysql errors, or is it possible to change configuration so that nova closes and reopens connections before they time out? Or is there a way to stop mysql from logging these aborted connections without hiding real issues? Aborted connection 10726 to db: 'nova' user: 'nova' host: 'asdf' (Got timeout reading communication packets)
On 9/9/19 11:38 AM, Chris Hoge wrote:
In my personal experience, running Nova on a four core machine without limiting the number of database connections will easily exhaust the available connections to MySQL/MariaDB. Keep in mind that the limit applies to every instance of a service, so if Nova starts 'm' services replicated for 'n' cores with 'd' possible connections you'll be up to ‘m x n x d' connections. It gets big fast.
The default setting of '0' (that is, unlimited) does not make for a good first-run experience, IMO.
We don't default to 0. We default to 5: https://docs.openstack.org/oslo.db/stein/reference/opts.html#database.max_po...
This issue comes up every few years or so, and the consensus previously is that 200-2000 connections is recommended based on your needs. Your database has to be configured to handle the load and looking at the configuration value across all your services and setting them consistently and appropriately is important.
http://lists.openstack.org/pipermail/openstack-dev/2015-April/061808.html
Thanks, I did not recall that discussion.
If I'm reading it correctly, Jay is suggesting that for MySQL we should just disable connection pooling. As I noted earlier, I don't think we expose the ability to do that in oslo.db (patches welcome!), but setting max_pool_size to 1 would get you pretty close. Maybe we should add that to the help text for the option in oslo.db?
On Sep 6, 2019, at 7:34 AM, Ben Nemec openstack@nemebean.com wrote:
Tagging with oslo as this sounds related to oslo.db.
On 9/5/19 7:37 PM, Albert Braden wrote:
After more googling it appears that max_pool_size is a maximum limit on the number of connections that can stay open, and max_overflow is a maximum limit on the number of connections that can be temporarily opened when the pool has been consumed. It looks like the defaults are 5 and 10 which would keep 5 connections open all the time and allow 10 temp. Do I need to set max_pool_size to 0 and max_overflow to the number of connections that I want to allow? Is that a reasonable and correct configuration? Intuitively that doesn't seem right, to have a pool size of 0, but if the "pool" is a group of connections that will remain open until they time out, then maybe 0 is correct?
I don't think so. According to [0] and [1], a pool_size of 0 means unlimited. You could probably set it to 1 to minimize the number of connections kept open, but then I expect you'll have overhead from having to re-open connections frequently.
It sounds like you could use a NullPool to eliminate connection pooling entirely, but I don't think we support that in oslo.db. Based on the error message you're seeing, I would take a look at connection_recycle_time[2]. I seem to recall seeing a comment that the recycle time needs to be shorter than any of the timeouts in the path between the service and the db (so anything like haproxy or mysql itself). Shortening that, or lengthening intervening timeouts, might get rid of these disconnection messages.
0: https://docs.openstack.org/oslo.db/stein/reference/opts.html#database.max_po... 1: https://docs.sqlalchemy.org/en/13/core/pooling.html#sqlalchemy.pool.QueuePoo... 2: https://docs.openstack.org/oslo.db/stein/reference/opts.html#database.connec...
*From:* Albert Braden Albert.Braden@synopsys.com *Sent:* Wednesday, September 4, 2019 10:19 AM *To:* openstack-discuss@lists.openstack.org *Cc:* Gaëtan Trellu gaetan.trellu@incloudus.com *Subject:* RE: Nova causes MySQL timeouts We’re not setting max_pool_size nor max_overflow option presently. I googled around and found this document: https://docs.openstack.org/keystone/stein/configuration/config-options.html https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keystone_stein_configuration_config-2Doptions.html&d=DwMGaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=3eF4Bv1HRQW6gl7II12rTTSKj_A9_LDISS6hU0nP-R0&s=0EGWx9qW60G1cxoPFCIv_G1-iXX20jKcC5-AwlCWk8g&e= Document says: [api_database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. max_overflow = None (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. [database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. min_pool_size = 1 (Integer) Minimum number of SQL connections to keep open in a pool. max_overflow = 50 (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. If min_pool_size is >0, would that cause at least 1 connection to remain open until it times out? What are the recommended values for these, to allow unused connections to close before they time out? Is “min_pool_size = 0” an acceptable setting? My settings are default: [api_database]: #connection_recycle_time = 3600 #max_overflow = <None> #max_pool_size = <None> [database]: #connection_recycle_time = 3600 #min_pool_size = 1 #max_overflow = 50 #max_pool_size = 5 It’s not obvious what max_overflow does. Where can I find a document that explains more about these settings? *From:* Gaëtan Trellu <gaetan.trellu@incloudus.com mailto:gaetan.trellu@incloudus.com> *Sent:* Tuesday, September 3, 2019 1:37 PM *To:* Albert Braden <albertb@synopsys.com mailto:albertb@synopsys.com> *Cc:* openstack-discuss@lists.openstack.org mailto:openstack-discuss@lists.openstack.org *Subject:* Re: Nova causes MySQL timeouts Hi Albert, It is a configuration issue, have a look to max_pool_size and max_overflow options under [database] section. Keep in mind than more workers you will have more connections will be opened on the database. Gaetan (goldyfruit) On Sep 3, 2019 4:31 PM, Albert Braden <Albert.Braden@synopsys.com mailto:Albert.Braden@synopsys.com> wrote: It looks like nova is keeping mysql connections open until they time out. How are others responding to this issue? Do you just ignore the mysql errors, or is it possible to change configuration so that nova closes and reopens connections before they time out? Or is there a way to stop mysql from logging these aborted connections without hiding real issues? Aborted connection 10726 to db: 'nova' user: 'nova' host: 'asdf' (Got timeout reading communication packets)
On 9/9/2019 11:49 AM, Ben Nemec wrote:
Maybe we should add that to the help text for the option in oslo.db?
I was going to reply to Chris's email with something like this - sounds like the config option help could use some more details around how to calculate the value that's appropriate, what to look out for when it's miscalculated, things to try, etc. Lots of the DB tuning options suffer from the same kind of lack of info. I know I know patches welcome, I'm not helping by piling on, but I'm also not deep in this area.
I thought I had figured out that the solution was to increase the MySQL wait_timeout so that it is longer than the nova (and glance, neutron, etc.) connection_recycle_time (3600). I increased my MySQL wait_timeout to 6000:
root@us01odc-qa-ctrl1:~# mysqladmin variables|grep wait_timeout|grep -v _wait | wait_timeout | 6000
But I still see the MySQL errors. There's no LB; we are pointing to a single MySQL host.
Sep 11 14:59:56 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:56 8016 [Warning] Aborted connection 8016 to db: 'nova' user: 'nova' host: 'us01odc-qa-ctrl2.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 14:59:57 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:57 8019 [Warning] Aborted connection 8019 to db: 'glance' user: 'glance' host: 'us01odc-qa-ctrl1.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 14:59:57 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:57 8018 [Warning] Aborted connection 8018 to db: 'nova_api' user: 'nova' host: 'us01odc-qa-ctrl2.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 15:00:50 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 15:00:50 8022 [Warning] Aborted connection 8022 to db: 'nova_api' user: 'nova' host: 'us01odc-qa-ctrl1.internal.synopsys.com' (Got timeout reading communication packets)
The errors come from nova, neutron, glance and keystone; it appears that all default to 3600. So it appears that, even with wait_timeout > connection_recycle_time we still see mysql timeout errors.
Just for fun I tried setting the MySQL wait_timeout to 86400 and restarting MySQL. I expected that this would pause the "Aborted connection" errors for 24 hours, but they started again after an hour. So it looks like my original assumption was incorrect. I thought nova was keeping connections open until the MySQL server timed them out, but now it appears that something else is happening.
Has anyone successfully stopped these MySQL error messages?
-----Original Message----- From: Ben Nemec openstack@nemebean.com Sent: Monday, September 9, 2019 9:50 AM To: Chris Hoge chris@openstack.org; openstack-discuss@lists.openstack.org Subject: Re: [oslo][nova] Nova causes MySQL timeouts
On 9/9/19 11:38 AM, Chris Hoge wrote:
In my personal experience, running Nova on a four core machine without limiting the number of database connections will easily exhaust the available connections to MySQL/MariaDB. Keep in mind that the limit applies to every instance of a service, so if Nova starts 'm' services replicated for 'n' cores with 'd' possible connections you'll be up to ‘m x n x d' connections. It gets big fast.
The default setting of '0' (that is, unlimited) does not make for a good first-run experience, IMO.
We don't default to 0. We default to 5: https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_oslo...
This issue comes up every few years or so, and the consensus previously is that 200-2000 connections is recommended based on your needs. Your database has to be configured to handle the load and looking at the configuration value across all your services and setting them consistently and appropriately is important.
https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.openstack.org_pipe...
Thanks, I did not recall that discussion.
If I'm reading it correctly, Jay is suggesting that for MySQL we should just disable connection pooling. As I noted earlier, I don't think we expose the ability to do that in oslo.db (patches welcome!), but setting max_pool_size to 1 would get you pretty close. Maybe we should add that to the help text for the option in oslo.db?
On Sep 6, 2019, at 7:34 AM, Ben Nemec openstack@nemebean.com wrote:
Tagging with oslo as this sounds related to oslo.db.
On 9/5/19 7:37 PM, Albert Braden wrote:
After more googling it appears that max_pool_size is a maximum limit on the number of connections that can stay open, and max_overflow is a maximum limit on the number of connections that can be temporarily opened when the pool has been consumed. It looks like the defaults are 5 and 10 which would keep 5 connections open all the time and allow 10 temp. Do I need to set max_pool_size to 0 and max_overflow to the number of connections that I want to allow? Is that a reasonable and correct configuration? Intuitively that doesn't seem right, to have a pool size of 0, but if the "pool" is a group of connections that will remain open until they time out, then maybe 0 is correct?
I don't think so. According to [0] and [1], a pool_size of 0 means unlimited. You could probably set it to 1 to minimize the number of connections kept open, but then I expect you'll have overhead from having to re-open connections frequently.
It sounds like you could use a NullPool to eliminate connection pooling entirely, but I don't think we support that in oslo.db. Based on the error message you're seeing, I would take a look at connection_recycle_time[2]. I seem to recall seeing a comment that the recycle time needs to be shorter than any of the timeouts in the path between the service and the db (so anything like haproxy or mysql itself). Shortening that, or lengthening intervening timeouts, might get rid of these disconnection messages.
0: https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_oslo... 1: https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.sqlalchemy.org_en_... 2: https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_oslo...
*From:* Albert Braden Albert.Braden@synopsys.com *Sent:* Wednesday, September 4, 2019 10:19 AM *To:* openstack-discuss@lists.openstack.org *Cc:* Gaëtan Trellu gaetan.trellu@incloudus.com *Subject:* RE: Nova causes MySQL timeouts We’re not setting max_pool_size nor max_overflow option presently. I googled around and found this document: https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keys... https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keystone_stein_configuration_config-2Doptions.html&d=DwMGaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=3eF4Bv1HRQW6gl7II12rTTSKj_A9_LDISS6hU0nP-R0&s=0EGWx9qW60G1cxoPFCIv_G1-iXX20jKcC5-AwlCWk8g&e= Document says: [api_database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. max_overflow = None (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. [database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. min_pool_size = 1 (Integer) Minimum number of SQL connections to keep open in a pool. max_overflow = 50 (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. If min_pool_size is >0, would that cause at least 1 connection to remain open until it times out? What are the recommended values for these, to allow unused connections to close before they time out? Is “min_pool_size = 0” an acceptable setting? My settings are default: [api_database]: #connection_recycle_time = 3600 #max_overflow = <None> #max_pool_size = <None> [database]: #connection_recycle_time = 3600 #min_pool_size = 1 #max_overflow = 50 #max_pool_size = 5 It’s not obvious what max_overflow does. Where can I find a document that explains more about these settings? *From:* Gaëtan Trellu <gaetan.trellu@incloudus.com mailto:gaetan.trellu@incloudus.com> *Sent:* Tuesday, September 3, 2019 1:37 PM *To:* Albert Braden <albertb@synopsys.com mailto:albertb@synopsys.com> *Cc:* openstack-discuss@lists.openstack.org mailto:openstack-discuss@lists.openstack.org *Subject:* Re: Nova causes MySQL timeouts Hi Albert, It is a configuration issue, have a look to max_pool_size and max_overflow options under [database] section. Keep in mind than more workers you will have more connections will be opened on the database. Gaetan (goldyfruit) On Sep 3, 2019 4:31 PM, Albert Braden <Albert.Braden@synopsys.com mailto:Albert.Braden@synopsys.com> wrote: It looks like nova is keeping mysql connections open until they time out. How are others responding to this issue? Do you just ignore the mysql errors, or is it possible to change configuration so that nova closes and reopens connections before they time out? Or is there a way to stop mysql from logging these aborted connections without hiding real issues? Aborted connection 10726 to db: 'nova' user: 'nova' host: 'asdf' (Got timeout reading communication packets)
On Tue, 2019-09-17 at 16:36 +0000, Albert Braden wrote:
I thought I had figured out that the solution was to increase the MySQL wait_timeout so that it is longer than the nova (and glance, neutron, etc.) connection_recycle_time (3600). I increased my MySQL wait_timeout to 6000:
root@us01odc-qa-ctrl1:~# mysqladmin variables|grep wait_timeout|grep -v _wait
wait_timeout | 6000
But I still see the MySQL errors. There's no LB; we are pointing to a single MySQL host.
Sep 11 14:59:56 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:56 8016 [Warning] Aborted connection 8016 to db: 'nova' user: 'nova' host: 'us01odc-qa-ctrl2.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 14:59:57 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:57 8019 [Warning] Aborted connection 8019 to db: 'glance' user: 'glance' host: 'us01odc-qa-ctrl1.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 14:59:57 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:57 8018 [Warning] Aborted connection 8018 to db: 'nova_api' user: 'nova' host: 'us01odc-qa-ctrl2.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 15:00:50 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 15:00:50 8022 [Warning] Aborted connection 8022 to db: 'nova_api' user: 'nova' host: 'us01odc-qa-ctrl1.internal.synopsys.com' (Got timeout reading communication packets)
The errors come from nova, neutron, glance and keystone; it appears that all default to 3600. So it appears that, even with wait_timeout > connection_recycle_time we still see mysql timeout errors.
Just for fun I tried setting the MySQL wait_timeout to 86400 and restarting MySQL. I expected that this would pause the "Aborted connection" errors for 24 hours, but they started again after an hour. So it looks like my original assumption was incorrect. I thought nova was keeping connections open until the MySQL server timed them out, but now it appears that something else is happening.
Has anyone successfully stopped these MySQL error messages?
could this be related to the eventlet heartbeat issue we see for rabbitmq when running the api under mod_wsgi/uwsgi?
e.g. hav eyou confirmed that you wsgi serer is configure to use 1 thread and multiple processes for concurancy multiple thread in one process might have issues.
-----Original Message----- From: Ben Nemec openstack@nemebean.com Sent: Monday, September 9, 2019 9:50 AM To: Chris Hoge chris@openstack.org; openstack-discuss@lists.openstack.org Subject: Re: [oslo][nova] Nova causes MySQL timeouts
On 9/9/19 11:38 AM, Chris Hoge wrote:
In my personal experience, running Nova on a four core machine without limiting the number of database connections will easily exhaust the available connections to MySQL/MariaDB. Keep in mind that the limit applies to every instance of a service, so if Nova starts 'm' services replicated for 'n' cores with 'd' possible connections you'll be up to ‘m x n x d' connections. It gets big fast.
The default setting of '0' (that is, unlimited) does not make for a good first-run experience, IMO.
We don't default to 0. We default to 5:
https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_oslo...
This issue comes up every few years or so, and the consensus previously is that 200-2000 connections is recommended based on your needs. Your database has to be configured to handle the load and looking at the configuration value across all your services and setting them consistently and appropriately is important.
https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.openstack.org_pipe...
Thanks, I did not recall that discussion.
If I'm reading it correctly, Jay is suggesting that for MySQL we should just disable connection pooling. As I noted earlier, I don't think we expose the ability to do that in oslo.db (patches welcome!), but setting max_pool_size to 1 would get you pretty close. Maybe we should add that to the help text for the option in oslo.db?
On Sep 6, 2019, at 7:34 AM, Ben Nemec openstack@nemebean.com wrote:
Tagging with oslo as this sounds related to oslo.db.
On 9/5/19 7:37 PM, Albert Braden wrote:
After more googling it appears that max_pool_size is a maximum limit on the number of connections that can stay open, and max_overflow is a maximum limit on the number of connections that can be temporarily opened when the pool has been consumed. It looks like the defaults are 5 and 10 which would keep 5 connections open all the time and allow 10 temp. Do I need to set max_pool_size to 0 and max_overflow to the number of connections that I want to allow? Is that a reasonable and correct configuration? Intuitively that doesn't seem right, to have a pool size of 0, but if the "pool" is a group of connections that will remain open until they time out, then maybe 0 is correct?
I don't think so. According to [0] and [1], a pool_size of 0 means unlimited. You could probably set it to 1 to minimize the number of connections kept open, but then I expect you'll have overhead from having to re-open connections frequently.
It sounds like you could use a NullPool to eliminate connection pooling entirely, but I don't think we support that in oslo.db. Based on the error message you're seeing, I would take a look at connection_recycle_time[2]. I seem to recall seeing a comment that the recycle time needs to be shorter than any of the timeouts in the path between the service and the db (so anything like haproxy or mysql itself). Shortening that, or lengthening intervening timeouts, might get rid of these disconnection messages.
0: https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_oslo...
1: https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.sqlalchemy.org_en_...
2: https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_oslo...
*From:* Albert Braden Albert.Braden@synopsys.com *Sent:* Wednesday, September 4, 2019 10:19 AM *To:* openstack-discuss@lists.openstack.org *Cc:* Gaëtan Trellu gaetan.trellu@incloudus.com *Subject:* RE: Nova causes MySQL timeouts We’re not setting max_pool_size nor max_overflow option presently. I googled around and found this document:
https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keys...
= https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keystone_stein_configuration_config- 2Doptions.html&d=DwMGaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=3eF4Bv1HRQW6gl7 II12rTTSKj_A9_LDISS6hU0nP-R0&s=0EGWx9qW60G1cxoPFCIv_G1-iXX20jKcC5-AwlCWk8g&e= Document says: [api_database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. max_overflow = None (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. [database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. min_pool_size = 1 (Integer) Minimum number of SQL connections to keep open in a pool. max_overflow = 50 (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. If min_pool_size is >0, would that cause at least 1 connection to remain open until it times out? What are the recommended values for these, to allow unused connections to close before they time out? Is “min_pool_size = 0” an acceptable setting? My settings are default: [api_database]: #connection_recycle_time = 3600 #max_overflow = <None> #max_pool_size = <None> [database]: #connection_recycle_time = 3600 #min_pool_size = 1 #max_overflow = 50 #max_pool_size = 5 It’s not obvious what max_overflow does. Where can I find a document that explains more about these settings? *From:* Gaëtan Trellu <gaetan.trellu@incloudus.com mailto:gaetan.trellu@incloudus.com> *Sent:* Tuesday, September 3, 2019 1:37 PM *To:* Albert Braden <albertb@synopsys.com mailto:albertb@synopsys.com> *Cc:* openstack-discuss@lists.openstack.org mailto:openstack-discuss@lists.openstack.org *Subject:* Re: Nova causes MySQL timeouts Hi Albert, It is a configuration issue, have a look to max_pool_size and max_overflow options under [database] section. Keep in mind than more workers you will have more connections will be opened on the database. Gaetan (goldyfruit) On Sep 3, 2019 4:31 PM, Albert Braden <Albert.Braden@synopsys.com mailto:Albert.Braden@synopsys.com> wrote: It looks like nova is keeping mysql connections open until they time out. How are others responding to this issue? Do you just ignore the mysql errors, or is it possible to change configuration so that nova closes and reopens connections before they time out? Or is there a way to stop mysql from logging these aborted connections without hiding real issues? Aborted connection 10726 to db: 'nova' user: 'nova' host: 'asdf' (Got timeout reading communication packets)
Coincidentally, I'm trying [1] via [2] based on advice from zzzeek.
efried
[1] https://dba.stackexchange.com/questions/19135/mysql-error-reading-communicat... [2] https://review.opendev.org/#/c/682661/
I was hopeful that this might be our issue, but we already have max_allowed_packet = 256M
root@us01odc-dev2-ctrl1:~# mysqladmin variables|grep allowed | max_allowed_packet | 268435456
-----Original Message----- From: Eric Fried openstack@fried.cc Sent: Tuesday, September 17, 2019 10:21 AM To: openstack-discuss@lists.openstack.org Subject: Re: [oslo][nova] Nova causes MySQL timeouts
Coincidentally, I'm trying [1] via [2] based on advice from zzzeek.
efried
[1] https://urldefense.proofpoint.com/v2/url?u=https-3A__dba.stackexchange.com_q... [2] https://urldefense.proofpoint.com/v2/url?u=https-3A__review.opendev.org_-23_...
I don't want to boil the ocean but i had similar problem my nove was loosing mysql db connection and we should culprit was Load-balancer (BigIP F5) it has different tcp-timeout compare to whatever openstack provide. after adjusting timeout on F5 my issue got resolved.
On Tue, Sep 17, 2019 at 12:56 PM Sean Mooney smooney@redhat.com wrote:
On Tue, 2019-09-17 at 16:36 +0000, Albert Braden wrote:
I thought I had figured out that the solution was to increase the MySQL wait_timeout so that it is longer than the nova (and glance, neutron, etc.) connection_recycle_time (3600). I increased my MySQL wait_timeout to 6000:
root@us01odc-qa-ctrl1:~# mysqladmin variables|grep wait_timeout|grep -v _wait
wait_timeout | 6000
But I still see the MySQL errors. There's no LB; we are pointing to a single MySQL host.
Sep 11 14:59:56 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:56 8016 [Warning] Aborted connection 8016 to db: 'nova' user: 'nova' host: 'us01odc-qa-ctrl2.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 14:59:57 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:57 8019 [Warning] Aborted connection 8019 to db: 'glance' user: 'glance' host: 'us01odc-qa-ctrl1.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 14:59:57 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:57 8018 [Warning] Aborted connection 8018 to db: 'nova_api' user: 'nova' host: 'us01odc-qa-ctrl2.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 15:00:50 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 15:00:50 8022 [Warning] Aborted connection 8022 to db: 'nova_api' user: 'nova' host: 'us01odc-qa-ctrl1.internal.synopsys.com' (Got timeout reading communication packets)
The errors come from nova, neutron, glance and keystone; it appears that all default to 3600. So it appears that, even with wait_timeout > connection_recycle_time we still see mysql timeout errors.
Just for fun I tried setting the MySQL wait_timeout to 86400 and restarting MySQL. I expected that this would pause the "Aborted connection" errors for 24 hours, but they started again after an hour. So it looks like my original assumption was incorrect. I thought nova was keeping connections open until the MySQL server timed them out, but now it appears that something else is happening.
Has anyone successfully stopped these MySQL error messages?
could this be related to the eventlet heartbeat issue we see for rabbitmq when running the api under mod_wsgi/uwsgi?
e.g. hav eyou confirmed that you wsgi serer is configure to use 1 thread and multiple processes for concurancy multiple thread in one process might have issues.
-----Original Message----- From: Ben Nemec openstack@nemebean.com Sent: Monday, September 9, 2019 9:50 AM To: Chris Hoge chris@openstack.org; openstack-discuss@lists.openstack.org Subject: Re: [oslo][nova] Nova causes MySQL timeouts
On 9/9/19 11:38 AM, Chris Hoge wrote:
In my personal experience, running Nova on a four core machine without limiting the number of database connections will easily exhaust the available connections to MySQL/MariaDB. Keep in mind that the limit applies to every instance of a service, so if Nova starts 'm' services replicated for 'n' cores with 'd' possible connections you'll be up to ‘m x n x d' connections. It gets big fast.
The default setting of '0' (that is, unlimited) does not make for a good first-run experience, IMO.
We don't default to 0. We default to 5:
https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_oslo...
This issue comes up every few years or so, and the consensus previously is that 200-2000 connections is recommended based on your needs. Your database has to be configured to handle the load and looking at the configuration value across all your services and setting them consistently and appropriately is important.
https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.openstack.org_pipe...
Thanks, I did not recall that discussion.
If I'm reading it correctly, Jay is suggesting that for MySQL we should just disable connection pooling. As I noted earlier, I don't think we expose the ability to do that in oslo.db (patches welcome!), but setting max_pool_size to 1 would get you pretty close. Maybe we should add that to the help text for the option in oslo.db?
On Sep 6, 2019, at 7:34 AM, Ben Nemec openstack@nemebean.com wrote:
Tagging with oslo as this sounds related to oslo.db.
On 9/5/19 7:37 PM, Albert Braden wrote:
After more googling it appears that max_pool_size is a maximum limit on the number of connections that can stay open, and max_overflow is a maximum limit on the number of connections that can be temporarily opened when the pool has been consumed. It looks like the defaults are 5 and 10 which would keep 5 connections open all the time and allow 10 temp. Do I need to set max_pool_size to 0 and max_overflow to the number of connections that I want to allow? Is that a reasonable and correct configuration? Intuitively that doesn't seem right, to have a pool size of 0, but if the "pool" is a group of connections that will remain open until they time out, then maybe 0 is correct?
I don't think so. According to [0] and [1], a pool_size of 0 means unlimited. You could probably set it to 1 to minimize the number of connections kept open, but then I expect you'll have overhead from having to re-open connections frequently.
It sounds like you could use a NullPool to eliminate connection pooling entirely, but I don't think we support that in oslo.db. Based on the error message you're seeing, I would take a look at connection_recycle_time[2]. I seem to recall seeing a comment that the recycle time needs to be shorter than any of the timeouts in the path between the service and the db (so anything like haproxy or mysql itself). Shortening that, or lengthening intervening timeouts, might get rid of these disconnection messages.
0: https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_oslo...
1: https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.sqlalchemy.org_en_...
2: https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_oslo...
*From:* Albert Braden Albert.Braden@synopsys.com *Sent:* Wednesday, September 4, 2019 10:19 AM *To:* openstack-discuss@lists.openstack.org *Cc:* Gaëtan Trellu gaetan.trellu@incloudus.com *Subject:* RE: Nova causes MySQL timeouts We’re not setting max_pool_size nor max_overflow option presently. I googled around and found this document:
https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keys...
= https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keystone_stein_configuration_config- 2Doptions.html&d=DwMGaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=3eF4Bv1HRQW6gl7 II12rTTSKj_A9_LDISS6hU0nP-R0&s=0EGWx9qW60G1cxoPFCIv_G1-iXX20jKcC5-AwlCWk8g&e= Document says: [api_database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. max_overflow = None (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. [database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. min_pool_size = 1 (Integer) Minimum number of SQL connections to keep open in a pool. max_overflow = 50 (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. If min_pool_size is >0, would that cause at least 1 connection to remain open until it times out? What are the recommended values for these, to allow unused connections to close before they time out? Is “min_pool_size = 0” an acceptable setting? My settings are default: [api_database]: #connection_recycle_time = 3600 #max_overflow = <None> #max_pool_size = <None> [database]: #connection_recycle_time = 3600 #min_pool_size = 1 #max_overflow = 50 #max_pool_size = 5 It’s not obvious what max_overflow does. Where can I find a document that explains more about these settings? *From:* Gaëtan Trellu <gaetan.trellu@incloudus.com mailto:gaetan.trellu@incloudus.com> *Sent:* Tuesday, September 3, 2019 1:37 PM *To:* Albert Braden <albertb@synopsys.com mailto:albertb@synopsys.com> *Cc:* openstack-discuss@lists.openstack.org mailto:openstack-discuss@lists.openstack.org *Subject:* Re: Nova causes MySQL timeouts Hi Albert, It is a configuration issue, have a look to max_pool_size and max_overflow options under [database] section. Keep in mind than more workers you will have more connections will be opened on the database. Gaetan (goldyfruit) On Sep 3, 2019 4:31 PM, Albert Braden <Albert.Braden@synopsys.com mailto:Albert.Braden@synopsys.com> wrote: It looks like nova is keeping mysql connections open until they time out. How are others responding to this issue? Do you just ignore the mysql errors, or is it possible to change configuration so that nova closes and reopens connections before they time out? Or is there a way to stop mysql from logging these aborted connections without hiding real issues? Aborted connection 10726 to db: 'nova' user: 'nova' host: 'asdf' (Got timeout reading communication packets)
I had not heard about the eventlet heartbeat issue. Where can I read more about it?
The [wsgi] section of my nova.conf is default; nothing is uncommented.
-----Original Message----- From: Sean Mooney smooney@redhat.com Sent: Tuesday, September 17, 2019 9:50 AM To: Albert Braden albertb@synopsys.com; openstack-discuss@lists.openstack.org Cc: Ben Nemec openstack@nemebean.com; Chris Hoge chris@openstack.org Subject: Re: [oslo][nova] Nova causes MySQL timeouts
On Tue, 2019-09-17 at 16:36 +0000, Albert Braden wrote:
I thought I had figured out that the solution was to increase the MySQL wait_timeout so that it is longer than the nova (and glance, neutron, etc.) connection_recycle_time (3600). I increased my MySQL wait_timeout to 6000:
root@us01odc-qa-ctrl1:~# mysqladmin variables|grep wait_timeout|grep -v _wait
wait_timeout | 6000
But I still see the MySQL errors. There's no LB; we are pointing to a single MySQL host.
Sep 11 14:59:56 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:56 8016 [Warning] Aborted connection 8016 to db: 'nova' user: 'nova' host: 'us01odc-qa-ctrl2.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 14:59:57 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:57 8019 [Warning] Aborted connection 8019 to db: 'glance' user: 'glance' host: 'us01odc-qa-ctrl1.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 14:59:57 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:57 8018 [Warning] Aborted connection 8018 to db: 'nova_api' user: 'nova' host: 'us01odc-qa-ctrl2.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 15:00:50 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 15:00:50 8022 [Warning] Aborted connection 8022 to db: 'nova_api' user: 'nova' host: 'us01odc-qa-ctrl1.internal.synopsys.com' (Got timeout reading communication packets)
The errors come from nova, neutron, glance and keystone; it appears that all default to 3600. So it appears that, even with wait_timeout > connection_recycle_time we still see mysql timeout errors.
Just for fun I tried setting the MySQL wait_timeout to 86400 and restarting MySQL. I expected that this would pause the "Aborted connection" errors for 24 hours, but they started again after an hour. So it looks like my original assumption was incorrect. I thought nova was keeping connections open until the MySQL server timed them out, but now it appears that something else is happening.
Has anyone successfully stopped these MySQL error messages?
could this be related to the eventlet heartbeat issue we see for rabbitmq when running the api under mod_wsgi/uwsgi?
e.g. hav eyou confirmed that you wsgi serer is configure to use 1 thread and multiple processes for concurancy multiple thread in one process might have issues.
-----Original Message----- From: Ben Nemec openstack@nemebean.com Sent: Monday, September 9, 2019 9:50 AM To: Chris Hoge chris@openstack.org; openstack-discuss@lists.openstack.org Subject: Re: [oslo][nova] Nova causes MySQL timeouts
On 9/9/19 11:38 AM, Chris Hoge wrote:
In my personal experience, running Nova on a four core machine without limiting the number of database connections will easily exhaust the available connections to MySQL/MariaDB. Keep in mind that the limit applies to every instance of a service, so if Nova starts 'm' services replicated for 'n' cores with 'd' possible connections you'll be up to ‘m x n x d' connections. It gets big fast.
The default setting of '0' (that is, unlimited) does not make for a good first-run experience, IMO.
We don't default to 0. We default to 5:
https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_oslo...
This issue comes up every few years or so, and the consensus previously is that 200-2000 connections is recommended based on your needs. Your database has to be configured to handle the load and looking at the configuration value across all your services and setting them consistently and appropriately is important.
https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.openstack.org_pipe...
Thanks, I did not recall that discussion.
If I'm reading it correctly, Jay is suggesting that for MySQL we should just disable connection pooling. As I noted earlier, I don't think we expose the ability to do that in oslo.db (patches welcome!), but setting max_pool_size to 1 would get you pretty close. Maybe we should add that to the help text for the option in oslo.db?
On Sep 6, 2019, at 7:34 AM, Ben Nemec openstack@nemebean.com wrote:
Tagging with oslo as this sounds related to oslo.db.
On 9/5/19 7:37 PM, Albert Braden wrote:
After more googling it appears that max_pool_size is a maximum limit on the number of connections that can stay open, and max_overflow is a maximum limit on the number of connections that can be temporarily opened when the pool has been consumed. It looks like the defaults are 5 and 10 which would keep 5 connections open all the time and allow 10 temp. Do I need to set max_pool_size to 0 and max_overflow to the number of connections that I want to allow? Is that a reasonable and correct configuration? Intuitively that doesn't seem right, to have a pool size of 0, but if the "pool" is a group of connections that will remain open until they time out, then maybe 0 is correct?
I don't think so. According to [0] and [1], a pool_size of 0 means unlimited. You could probably set it to 1 to minimize the number of connections kept open, but then I expect you'll have overhead from having to re-open connections frequently.
It sounds like you could use a NullPool to eliminate connection pooling entirely, but I don't think we support that in oslo.db. Based on the error message you're seeing, I would take a look at connection_recycle_time[2]. I seem to recall seeing a comment that the recycle time needs to be shorter than any of the timeouts in the path between the service and the db (so anything like haproxy or mysql itself). Shortening that, or lengthening intervening timeouts, might get rid of these disconnection messages.
0: https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_oslo...
1: https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.sqlalchemy.org_en_...
2: https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_oslo...
*From:* Albert Braden Albert.Braden@synopsys.com *Sent:* Wednesday, September 4, 2019 10:19 AM *To:* openstack-discuss@lists.openstack.org *Cc:* Gaëtan Trellu gaetan.trellu@incloudus.com *Subject:* RE: Nova causes MySQL timeouts We’re not setting max_pool_size nor max_overflow option presently. I googled around and found this document:
https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keys...
= https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keystone_stein_configuration_config- 2Doptions.html&d=DwMGaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=3eF4Bv1HRQW6gl7 II12rTTSKj_A9_LDISS6hU0nP-R0&s=0EGWx9qW60G1cxoPFCIv_G1-iXX20jKcC5-AwlCWk8g&e= Document says: [api_database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. max_overflow = None (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. [database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. min_pool_size = 1 (Integer) Minimum number of SQL connections to keep open in a pool. max_overflow = 50 (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. If min_pool_size is >0, would that cause at least 1 connection to remain open until it times out? What are the recommended values for these, to allow unused connections to close before they time out? Is “min_pool_size = 0” an acceptable setting? My settings are default: [api_database]: #connection_recycle_time = 3600 #max_overflow = <None> #max_pool_size = <None> [database]: #connection_recycle_time = 3600 #min_pool_size = 1 #max_overflow = 50 #max_pool_size = 5 It’s not obvious what max_overflow does. Where can I find a document that explains more about these settings? *From:* Gaëtan Trellu <gaetan.trellu@incloudus.com mailto:gaetan.trellu@incloudus.com> *Sent:* Tuesday, September 3, 2019 1:37 PM *To:* Albert Braden <albertb@synopsys.com mailto:albertb@synopsys.com> *Cc:* openstack-discuss@lists.openstack.org mailto:openstack-discuss@lists.openstack.org *Subject:* Re: Nova causes MySQL timeouts Hi Albert, It is a configuration issue, have a look to max_pool_size and max_overflow options under [database] section. Keep in mind than more workers you will have more connections will be opened on the database. Gaetan (goldyfruit) On Sep 3, 2019 4:31 PM, Albert Braden <Albert.Braden@synopsys.com mailto:Albert.Braden@synopsys.com> wrote: It looks like nova is keeping mysql connections open until they time out. How are others responding to this issue? Do you just ignore the mysql errors, or is it possible to change configuration so that nova closes and reopens connections before they time out? Or is there a way to stop mysql from logging these aborted connections without hiding real issues? Aborted connection 10726 to db: 'nova' user: 'nova' host: 'asdf' (Got timeout reading communication packets)
Le mar. 17 sept. 2019 à 19:55, Albert Braden Albert.Braden@synopsys.com a écrit :
I had not heard about the eventlet heartbeat issue. Where can I read more about it?
Under apache and mod_wsgi eventlet green thread doesn't work properly. Nova faced this issue few months ago through the use of oslo.messaging and especially through the heartbeat's rabbitmq driver.
The heartbeat was runned by using a green thread under apache and mod_wsgi, so after few secondes/minutes the heartbeat thread became idle and so the connection with the rabbitmq server was closed and re-opened etc... Hence, that introduced a lot of connections opened and closed between the client and the server.
You can find more discuss about there: - http://lists.openstack.org/pipermail/openstack-discuss/2019-May/005822.html
And the oslo.messaging fix related to this issue : - https://github.com/openstack/oslo.messaging/commit/22f240b82fffbd62be8568a7d...
The [wsgi] section of my nova.conf is default; nothing is uncommented.
-----Original Message----- From: Sean Mooney smooney@redhat.com Sent: Tuesday, September 17, 2019 9:50 AM To: Albert Braden albertb@synopsys.com; openstack-discuss@lists.openstack.org Cc: Ben Nemec openstack@nemebean.com; Chris Hoge chris@openstack.org Subject: Re: [oslo][nova] Nova causes MySQL timeouts
On Tue, 2019-09-17 at 16:36 +0000, Albert Braden wrote:
I thought I had figured out that the solution was to increase the MySQL
wait_timeout so that it is longer than the
nova (and glance, neutron, etc.) connection_recycle_time (3600). I
increased my MySQL wait_timeout to 6000:
root@us01odc-qa-ctrl1:~# mysqladmin variables|grep wait_timeout|grep -v
_wait
wait_timeout | 6000
But I still see the MySQL errors. There's no LB; we are pointing to a
single MySQL host.
Sep 11 14:59:56 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:56
8016 [Warning] Aborted connection 8016 to db:
'nova' user: 'nova' host: 'us01odc-qa-ctrl2.internal.synopsys.com' (Got
timeout reading communication packets)
Sep 11 14:59:57 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:57
8019 [Warning] Aborted connection 8019 to db:
'glance' user: 'glance' host: 'us01odc-qa-ctrl1.internal.synopsys.com'
(Got timeout reading communication packets)
Sep 11 14:59:57 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:57
8018 [Warning] Aborted connection 8018 to db:
'nova_api' user: 'nova' host: 'us01odc-qa-ctrl2.internal.synopsys.com'
(Got timeout reading communication packets)
Sep 11 15:00:50 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 15:00:50
8022 [Warning] Aborted connection 8022 to db:
'nova_api' user: 'nova' host: 'us01odc-qa-ctrl1.internal.synopsys.com'
(Got timeout reading communication packets)
The errors come from nova, neutron, glance and keystone; it appears that
all default to 3600. So it appears that, even
with wait_timeout > connection_recycle_time we still see mysql timeout
errors.
Just for fun I tried setting the MySQL wait_timeout to 86400 and
restarting MySQL. I expected that this would pause
the "Aborted connection" errors for 24 hours, but they started again
after an hour. So it looks like my original
assumption was incorrect. I thought nova was keeping connections open
until the MySQL server timed them out, but now
it appears that something else is happening.
Has anyone successfully stopped these MySQL error messages?
could this be related to the eventlet heartbeat issue we see for rabbitmq when running the api under mod_wsgi/uwsgi?
e.g. hav eyou confirmed that you wsgi serer is configure to use 1 thread and multiple processes for concurancy multiple thread in one process might have issues.
-----Original Message----- From: Ben Nemec openstack@nemebean.com Sent: Monday, September 9, 2019 9:50 AM To: Chris Hoge chris@openstack.org;
openstack-discuss@lists.openstack.org
Subject: Re: [oslo][nova] Nova causes MySQL timeouts
On 9/9/19 11:38 AM, Chris Hoge wrote:
In my personal experience, running Nova on a four core machine without limiting the number of database connections will easily exhaust the available connections to MySQL/MariaDB. Keep in mind that the limit applies to every instance of a service, so if Nova starts 'm' services replicated for 'n' cores with 'd' possible connections you'll be up to ‘m x n x d' connections. It gets big fast.
The default setting of '0' (that is, unlimited) does not make for a
good
first-run experience, IMO.
We don't default to 0. We default to 5:
https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_oslo...
This issue comes up every few years or so, and the consensus previously is that 200-2000 connections is recommended based on your needs. Your database has to be configured to handle the load and looking at the configuration value across all your services and setting them consistently and appropriately is important.
https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.openstack.org_pipe...
Thanks, I did not recall that discussion.
If I'm reading it correctly, Jay is suggesting that for MySQL we should just disable connection pooling. As I noted earlier, I don't think we expose the ability to do that in oslo.db (patches welcome!), but setting max_pool_size to 1 would get you pretty close. Maybe we should add that to the help text for the option in oslo.db?
On Sep 6, 2019, at 7:34 AM, Ben Nemec openstack@nemebean.com
wrote:
Tagging with oslo as this sounds related to oslo.db.
On 9/5/19 7:37 PM, Albert Braden wrote:
After more googling it appears that max_pool_size is a maximum
limit on the number of connections that can stay
open, and max_overflow is a maximum limit on the number of
connections that can be temporarily opened when the
pool has been consumed. It looks like the defaults are 5 and 10
which would keep 5 connections open all the time
and allow 10 temp. Do I need to set max_pool_size to 0 and max_overflow to the number
of connections that I want to allow? Is that
a reasonable and correct configuration? Intuitively that doesn't
seem right, to have a pool size of 0, but if
the "pool" is a group of connections that will remain open until
they time out, then maybe 0 is correct?
I don't think so. According to [0] and [1], a pool_size of 0 means
unlimited. You could probably set it to 1 to
minimize the number of connections kept open, but then I expect
you'll have overhead from having to re-open
connections frequently.
It sounds like you could use a NullPool to eliminate connection
pooling entirely, but I don't think we support
that in oslo.db. Based on the error message you're seeing, I would
take a look at connection_recycle_time[2]. I
seem to recall seeing a comment that the recycle time needs to be
shorter than any of the timeouts in the path
between the service and the db (so anything like haproxy or mysql
itself). Shortening that, or lengthening
intervening timeouts, might get rid of these disconnection messages.
0:
https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_oslo...
1:
https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.sqlalchemy.org_en_...
2:
https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_oslo...
*From:* Albert Braden Albert.Braden@synopsys.com *Sent:* Wednesday, September 4, 2019 10:19 AM *To:* openstack-discuss@lists.openstack.org *Cc:* Gaëtan Trellu gaetan.trellu@incloudus.com *Subject:* RE: Nova causes MySQL timeouts We’re not setting max_pool_size nor max_overflow option presently.
I googled around and found this document:
https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keys...
= <
https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keys...
2Doptions.html&d=DwMGaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=3eF4Bv1HRQW6gl7
II12rTTSKj_A9_LDISS6hU0nP-R0&s=0EGWx9qW60G1cxoPFCIv_G1-iXX20jKcC5-AwlCWk8g&e=>
Document says: [api_database] connection_recycle_time = 3600 (Integer) Timeout
before idle SQL connections are reaped.
max_overflow = None (Integer) If
set, use this value for max_overflow with
SQLAlchemy. max_pool_size = None (Integer)
Maximum number of SQL connections to keep open
in a pool. [database] connection_recycle_time = 3600 (Integer) Timeout
before idle SQL connections are reaped.
min_pool_size = 1
(Integer) Minimum number of SQL connections to keep
open in a pool. max_overflow = 50
(Integer) If set, use this value for max_overflow
with SQLAlchemy. max_pool_size = None (Integer)
Maximum number of SQL connections to keep open
in a pool. If min_pool_size is >0, would that cause at least 1 connection to
remain open until it times out? What are the
recommended values for these, to allow unused connections to close
before they time out? Is “min_pool_size = 0”
an acceptable setting? My settings are default: [api_database]: #connection_recycle_time = 3600 #max_overflow = <None> #max_pool_size = <None> [database]: #connection_recycle_time = 3600 #min_pool_size = 1 #max_overflow = 50 #max_pool_size = 5 It’s not obvious what max_overflow does. Where can I find a
document that explains more about these settings?
*From:* Gaëtan Trellu <gaetan.trellu@incloudus.com <mailto:
gaetan.trellu@incloudus.com>>
*Sent:* Tuesday, September 3, 2019 1:37 PM *To:* Albert Braden <albertb@synopsys.com <mailto:
albertb@synopsys.com>>
*Cc:* openstack-discuss@lists.openstack.org <mailto:
openstack-discuss@lists.openstack.org>
*Subject:* Re: Nova causes MySQL timeouts Hi Albert, It is a configuration issue, have a look to max_pool_size and
max_overflow options under [database] section.
Keep in mind than more workers you will have more connections will
be opened on the database.
Gaetan (goldyfruit) On Sep 3, 2019 4:31 PM, Albert Braden <Albert.Braden@synopsys.com
mailto:Albert.Braden@synopsys.com> wrote:
It looks like nova is keeping mysql connections open until
they time
out. How are others responding to this issue? Do you just
ignore the
mysql errors, or is it possible to change configuration so
that nova
closes and reopens connections before they time out? Or is
there a
way to stop mysql from logging these aborted connections
without
hiding real issues? Aborted connection 10726 to db: 'nova' user: 'nova' host:
'asdf'
(Got timeout reading communication packets)
participants (9)
-
Albert Braden
-
Ben Nemec
-
Chris Hoge
-
Eric Fried
-
Gaëtan Trellu
-
Herve Beraud
-
Matt Riedemann
-
Satish Patel
-
Sean Mooney