Nova causes MySQL timeouts
It looks like nova is keeping mysql connections open until they time out. How are others responding to this issue? Do you just ignore the mysql errors, or is it possible to change configuration so that nova closes and reopens connections before they time out? Or is there a way to stop mysql from logging these aborted connections without hiding real issues? Aborted connection 10726 to db: 'nova' user: 'nova' host: 'asdf' (Got timeout reading communication packets)
We’re not setting max_pool_size nor max_overflow option presently. I googled around and found this document: https://docs.openstack.org/keystone/stein/configuration/config-options.html Document says: [api_database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. max_overflow = None (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. [database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. min_pool_size = 1 (Integer) Minimum number of SQL connections to keep open in a pool. max_overflow = 50 (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. If min_pool_size is >0, would that cause at least 1 connection to remain open until it times out? What are the recommended values for these, to allow unused connections to close before they time out? Is “min_pool_size = 0” an acceptable setting? My settings are default: [api_database]: #connection_recycle_time = 3600 #max_overflow = <None> #max_pool_size = <None> [database]: #connection_recycle_time = 3600 #min_pool_size = 1 #max_overflow = 50 #max_pool_size = 5 It’s not obvious what max_overflow does. Where can I find a document that explains more about these settings? From: Gaëtan Trellu <gaetan.trellu@incloudus.com> Sent: Tuesday, September 3, 2019 1:37 PM To: Albert Braden <albertb@synopsys.com> Cc: openstack-discuss@lists.openstack.org Subject: Re: Nova causes MySQL timeouts Hi Albert, It is a configuration issue, have a look to max_pool_size and max_overflow options under [database] section. Keep in mind than more workers you will have more connections will be opened on the database. Gaetan (goldyfruit) On Sep 3, 2019 4:31 PM, Albert Braden <Albert.Braden@synopsys.com<mailto:Albert.Braden@synopsys.com>> wrote: It looks like nova is keeping mysql connections open until they time out. How are others responding to this issue? Do you just ignore the mysql errors, or is it possible to change configuration so that nova closes and reopens connections before they time out? Or is there a way to stop mysql from logging these aborted connections without hiding real issues? Aborted connection 10726 to db: 'nova' user: 'nova' host: 'asdf' (Got timeout reading communication packets)
After more googling it appears that max_pool_size is a maximum limit on the number of connections that can stay open, and max_overflow is a maximum limit on the number of connections that can be temporarily opened when the pool has been consumed. It looks like the defaults are 5 and 10 which would keep 5 connections open all the time and allow 10 temp. Do I need to set max_pool_size to 0 and max_overflow to the number of connections that I want to allow? Is that a reasonable and correct configuration? Intuitively that doesn't seem right, to have a pool size of 0, but if the "pool" is a group of connections that will remain open until they time out, then maybe 0 is correct? From: Albert Braden <Albert.Braden@synopsys.com> Sent: Wednesday, September 4, 2019 10:19 AM To: openstack-discuss@lists.openstack.org Cc: Gaëtan Trellu <gaetan.trellu@incloudus.com> Subject: RE: Nova causes MySQL timeouts We’re not setting max_pool_size nor max_overflow option presently. I googled around and found this document: https://docs.openstack.org/keystone/stein/configuration/config-options.html<https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keystone_stein_configuration_config-2Doptions.html&d=DwMGaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=3eF4Bv1HRQW6gl7II12rTTSKj_A9_LDISS6hU0nP-R0&s=0EGWx9qW60G1cxoPFCIv_G1-iXX20jKcC5-AwlCWk8g&e=> Document says: [api_database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. max_overflow = None (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. [database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. min_pool_size = 1 (Integer) Minimum number of SQL connections to keep open in a pool. max_overflow = 50 (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. If min_pool_size is >0, would that cause at least 1 connection to remain open until it times out? What are the recommended values for these, to allow unused connections to close before they time out? Is “min_pool_size = 0” an acceptable setting? My settings are default: [api_database]: #connection_recycle_time = 3600 #max_overflow = <None> #max_pool_size = <None> [database]: #connection_recycle_time = 3600 #min_pool_size = 1 #max_overflow = 50 #max_pool_size = 5 It’s not obvious what max_overflow does. Where can I find a document that explains more about these settings? From: Gaëtan Trellu <gaetan.trellu@incloudus.com<mailto:gaetan.trellu@incloudus.com>> Sent: Tuesday, September 3, 2019 1:37 PM To: Albert Braden <albertb@synopsys.com<mailto:albertb@synopsys.com>> Cc: openstack-discuss@lists.openstack.org<mailto:openstack-discuss@lists.openstack.org> Subject: Re: Nova causes MySQL timeouts Hi Albert, It is a configuration issue, have a look to max_pool_size and max_overflow options under [database] section. Keep in mind than more workers you will have more connections will be opened on the database. Gaetan (goldyfruit) On Sep 3, 2019 4:31 PM, Albert Braden <Albert.Braden@synopsys.com<mailto:Albert.Braden@synopsys.com>> wrote: It looks like nova is keeping mysql connections open until they time out. How are others responding to this issue? Do you just ignore the mysql errors, or is it possible to change configuration so that nova closes and reopens connections before they time out? Or is there a way to stop mysql from logging these aborted connections without hiding real issues? Aborted connection 10726 to db: 'nova' user: 'nova' host: 'asdf' (Got timeout reading communication packets)
Tagging with oslo as this sounds related to oslo.db. On 9/5/19 7:37 PM, Albert Braden wrote:
After more googling it appears that max_pool_size is a maximum limit on the number of connections that can stay open, and max_overflow is a maximum limit on the number of connections that can be temporarily opened when the pool has been consumed. It looks like the defaults are 5 and 10 which would keep 5 connections open all the time and allow 10 temp.
Do I need to set max_pool_size to 0 and max_overflow to the number of connections that I want to allow? Is that a reasonable and correct configuration? Intuitively that doesn't seem right, to have a pool size of 0, but if the "pool" is a group of connections that will remain open until they time out, then maybe 0 is correct?
I don't think so. According to [0] and [1], a pool_size of 0 means unlimited. You could probably set it to 1 to minimize the number of connections kept open, but then I expect you'll have overhead from having to re-open connections frequently. It sounds like you could use a NullPool to eliminate connection pooling entirely, but I don't think we support that in oslo.db. Based on the error message you're seeing, I would take a look at connection_recycle_time[2]. I seem to recall seeing a comment that the recycle time needs to be shorter than any of the timeouts in the path between the service and the db (so anything like haproxy or mysql itself). Shortening that, or lengthening intervening timeouts, might get rid of these disconnection messages. 0: https://docs.openstack.org/oslo.db/stein/reference/opts.html#database.max_po... 1: https://docs.sqlalchemy.org/en/13/core/pooling.html#sqlalchemy.pool.QueuePoo... 2: https://docs.openstack.org/oslo.db/stein/reference/opts.html#database.connec...
*From:* Albert Braden <Albert.Braden@synopsys.com> *Sent:* Wednesday, September 4, 2019 10:19 AM *To:* openstack-discuss@lists.openstack.org *Cc:* Gaëtan Trellu <gaetan.trellu@incloudus.com> *Subject:* RE: Nova causes MySQL timeouts
We’re not setting max_pool_size nor max_overflow option presently. I googled around and found this document:
https://docs.openstack.org/keystone/stein/configuration/config-options.html <https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keystone_stein_configuration_config-2Doptions.html&d=DwMGaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=3eF4Bv1HRQW6gl7II12rTTSKj_A9_LDISS6hU0nP-R0&s=0EGWx9qW60G1cxoPFCIv_G1-iXX20jKcC5-AwlCWk8g&e=>
Document says:
[api_database]
connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped.
max_overflow = None (Integer) If set, use this value for max_overflow with SQLAlchemy.
max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool.
[database]
connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped.
min_pool_size = 1 (Integer) Minimum number of SQL connections to keep open in a pool.
max_overflow = 50 (Integer) If set, use this value for max_overflow with SQLAlchemy.
max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool.
If min_pool_size is >0, would that cause at least 1 connection to remain open until it times out? What are the recommended values for these, to allow unused connections to close before they time out? Is “min_pool_size = 0” an acceptable setting?
My settings are default:
[api_database]:
#connection_recycle_time = 3600
#max_overflow = <None>
#max_pool_size = <None>
[database]:
#connection_recycle_time = 3600
#min_pool_size = 1
#max_overflow = 50
#max_pool_size = 5
It’s not obvious what max_overflow does. Where can I find a document that explains more about these settings?
*From:* Gaëtan Trellu <gaetan.trellu@incloudus.com <mailto:gaetan.trellu@incloudus.com>> *Sent:* Tuesday, September 3, 2019 1:37 PM *To:* Albert Braden <albertb@synopsys.com <mailto:albertb@synopsys.com>> *Cc:* openstack-discuss@lists.openstack.org <mailto:openstack-discuss@lists.openstack.org> *Subject:* Re: Nova causes MySQL timeouts
Hi Albert,
It is a configuration issue, have a look to max_pool_size and max_overflow options under [database] section.
Keep in mind than more workers you will have more connections will be opened on the database.
Gaetan (goldyfruit)
On Sep 3, 2019 4:31 PM, Albert Braden <Albert.Braden@synopsys.com <mailto:Albert.Braden@synopsys.com>> wrote:
It looks like nova is keeping mysql connections open until they time out. How are others responding to this issue? Do you just ignore the mysql errors, or is it possible to change configuration so that nova closes and reopens connections before they time out? Or is there a way to stop mysql from logging these aborted connections without hiding real issues?
Aborted connection 10726 to db: 'nova' user: 'nova' host: 'asdf' (Got timeout reading communication packets)
In my personal experience, running Nova on a four core machine without limiting the number of database connections will easily exhaust the available connections to MySQL/MariaDB. Keep in mind that the limit applies to every instance of a service, so if Nova starts 'm' services replicated for 'n' cores with 'd' possible connections you'll be up to ‘m x n x d' connections. It gets big fast. The default setting of '0' (that is, unlimited) does not make for a good first-run experience, IMO. This issue comes up every few years or so, and the consensus previously is that 200-2000 connections is recommended based on your needs. Your database has to be configured to handle the load and looking at the configuration value across all your services and setting them consistently and appropriately is important. http://lists.openstack.org/pipermail/openstack-dev/2015-April/061808.html
On Sep 6, 2019, at 7:34 AM, Ben Nemec <openstack@nemebean.com> wrote:
Tagging with oslo as this sounds related to oslo.db.
On 9/5/19 7:37 PM, Albert Braden wrote:
After more googling it appears that max_pool_size is a maximum limit on the number of connections that can stay open, and max_overflow is a maximum limit on the number of connections that can be temporarily opened when the pool has been consumed. It looks like the defaults are 5 and 10 which would keep 5 connections open all the time and allow 10 temp. Do I need to set max_pool_size to 0 and max_overflow to the number of connections that I want to allow? Is that a reasonable and correct configuration? Intuitively that doesn't seem right, to have a pool size of 0, but if the "pool" is a group of connections that will remain open until they time out, then maybe 0 is correct?
I don't think so. According to [0] and [1], a pool_size of 0 means unlimited. You could probably set it to 1 to minimize the number of connections kept open, but then I expect you'll have overhead from having to re-open connections frequently.
It sounds like you could use a NullPool to eliminate connection pooling entirely, but I don't think we support that in oslo.db. Based on the error message you're seeing, I would take a look at connection_recycle_time[2]. I seem to recall seeing a comment that the recycle time needs to be shorter than any of the timeouts in the path between the service and the db (so anything like haproxy or mysql itself). Shortening that, or lengthening intervening timeouts, might get rid of these disconnection messages.
0: https://docs.openstack.org/oslo.db/stein/reference/opts.html#database.max_po... 1: https://docs.sqlalchemy.org/en/13/core/pooling.html#sqlalchemy.pool.QueuePoo... 2: https://docs.openstack.org/oslo.db/stein/reference/opts.html#database.connec...
*From:* Albert Braden <Albert.Braden@synopsys.com> *Sent:* Wednesday, September 4, 2019 10:19 AM *To:* openstack-discuss@lists.openstack.org *Cc:* Gaëtan Trellu <gaetan.trellu@incloudus.com> *Subject:* RE: Nova causes MySQL timeouts We’re not setting max_pool_size nor max_overflow option presently. I googled around and found this document: https://docs.openstack.org/keystone/stein/configuration/config-options.html <https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keystone_stein_configuration_config-2Doptions.html&d=DwMGaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=3eF4Bv1HRQW6gl7II12rTTSKj_A9_LDISS6hU0nP-R0&s=0EGWx9qW60G1cxoPFCIv_G1-iXX20jKcC5-AwlCWk8g&e=> Document says: [api_database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. max_overflow = None (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. [database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. min_pool_size = 1 (Integer) Minimum number of SQL connections to keep open in a pool. max_overflow = 50 (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. If min_pool_size is >0, would that cause at least 1 connection to remain open until it times out? What are the recommended values for these, to allow unused connections to close before they time out? Is “min_pool_size = 0” an acceptable setting? My settings are default: [api_database]: #connection_recycle_time = 3600 #max_overflow = <None> #max_pool_size = <None> [database]: #connection_recycle_time = 3600 #min_pool_size = 1 #max_overflow = 50 #max_pool_size = 5 It’s not obvious what max_overflow does. Where can I find a document that explains more about these settings? *From:* Gaëtan Trellu <gaetan.trellu@incloudus.com <mailto:gaetan.trellu@incloudus.com>> *Sent:* Tuesday, September 3, 2019 1:37 PM *To:* Albert Braden <albertb@synopsys.com <mailto:albertb@synopsys.com>> *Cc:* openstack-discuss@lists.openstack.org <mailto:openstack-discuss@lists.openstack.org> *Subject:* Re: Nova causes MySQL timeouts Hi Albert, It is a configuration issue, have a look to max_pool_size and max_overflow options under [database] section. Keep in mind than more workers you will have more connections will be opened on the database. Gaetan (goldyfruit) On Sep 3, 2019 4:31 PM, Albert Braden <Albert.Braden@synopsys.com <mailto:Albert.Braden@synopsys.com>> wrote: It looks like nova is keeping mysql connections open until they time out. How are others responding to this issue? Do you just ignore the mysql errors, or is it possible to change configuration so that nova closes and reopens connections before they time out? Or is there a way to stop mysql from logging these aborted connections without hiding real issues? Aborted connection 10726 to db: 'nova' user: 'nova' host: 'asdf' (Got timeout reading communication packets)
On 9/9/19 11:38 AM, Chris Hoge wrote:
In my personal experience, running Nova on a four core machine without limiting the number of database connections will easily exhaust the available connections to MySQL/MariaDB. Keep in mind that the limit applies to every instance of a service, so if Nova starts 'm' services replicated for 'n' cores with 'd' possible connections you'll be up to ‘m x n x d' connections. It gets big fast.
The default setting of '0' (that is, unlimited) does not make for a good first-run experience, IMO.
We don't default to 0. We default to 5: https://docs.openstack.org/oslo.db/stein/reference/opts.html#database.max_po...
This issue comes up every few years or so, and the consensus previously is that 200-2000 connections is recommended based on your needs. Your database has to be configured to handle the load and looking at the configuration value across all your services and setting them consistently and appropriately is important.
http://lists.openstack.org/pipermail/openstack-dev/2015-April/061808.html
Thanks, I did not recall that discussion. If I'm reading it correctly, Jay is suggesting that for MySQL we should just disable connection pooling. As I noted earlier, I don't think we expose the ability to do that in oslo.db (patches welcome!), but setting max_pool_size to 1 would get you pretty close. Maybe we should add that to the help text for the option in oslo.db?
On Sep 6, 2019, at 7:34 AM, Ben Nemec <openstack@nemebean.com> wrote:
Tagging with oslo as this sounds related to oslo.db.
On 9/5/19 7:37 PM, Albert Braden wrote:
After more googling it appears that max_pool_size is a maximum limit on the number of connections that can stay open, and max_overflow is a maximum limit on the number of connections that can be temporarily opened when the pool has been consumed. It looks like the defaults are 5 and 10 which would keep 5 connections open all the time and allow 10 temp. Do I need to set max_pool_size to 0 and max_overflow to the number of connections that I want to allow? Is that a reasonable and correct configuration? Intuitively that doesn't seem right, to have a pool size of 0, but if the "pool" is a group of connections that will remain open until they time out, then maybe 0 is correct?
I don't think so. According to [0] and [1], a pool_size of 0 means unlimited. You could probably set it to 1 to minimize the number of connections kept open, but then I expect you'll have overhead from having to re-open connections frequently.
It sounds like you could use a NullPool to eliminate connection pooling entirely, but I don't think we support that in oslo.db. Based on the error message you're seeing, I would take a look at connection_recycle_time[2]. I seem to recall seeing a comment that the recycle time needs to be shorter than any of the timeouts in the path between the service and the db (so anything like haproxy or mysql itself). Shortening that, or lengthening intervening timeouts, might get rid of these disconnection messages.
0: https://docs.openstack.org/oslo.db/stein/reference/opts.html#database.max_po... 1: https://docs.sqlalchemy.org/en/13/core/pooling.html#sqlalchemy.pool.QueuePoo... 2: https://docs.openstack.org/oslo.db/stein/reference/opts.html#database.connec...
*From:* Albert Braden <Albert.Braden@synopsys.com> *Sent:* Wednesday, September 4, 2019 10:19 AM *To:* openstack-discuss@lists.openstack.org *Cc:* Gaëtan Trellu <gaetan.trellu@incloudus.com> *Subject:* RE: Nova causes MySQL timeouts We’re not setting max_pool_size nor max_overflow option presently. I googled around and found this document: https://docs.openstack.org/keystone/stein/configuration/config-options.html <https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keystone_stein_configuration_config-2Doptions.html&d=DwMGaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=3eF4Bv1HRQW6gl7II12rTTSKj_A9_LDISS6hU0nP-R0&s=0EGWx9qW60G1cxoPFCIv_G1-iXX20jKcC5-AwlCWk8g&e=> Document says: [api_database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. max_overflow = None (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. [database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. min_pool_size = 1 (Integer) Minimum number of SQL connections to keep open in a pool. max_overflow = 50 (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. If min_pool_size is >0, would that cause at least 1 connection to remain open until it times out? What are the recommended values for these, to allow unused connections to close before they time out? Is “min_pool_size = 0” an acceptable setting? My settings are default: [api_database]: #connection_recycle_time = 3600 #max_overflow = <None> #max_pool_size = <None> [database]: #connection_recycle_time = 3600 #min_pool_size = 1 #max_overflow = 50 #max_pool_size = 5 It’s not obvious what max_overflow does. Where can I find a document that explains more about these settings? *From:* Gaëtan Trellu <gaetan.trellu@incloudus.com <mailto:gaetan.trellu@incloudus.com>> *Sent:* Tuesday, September 3, 2019 1:37 PM *To:* Albert Braden <albertb@synopsys.com <mailto:albertb@synopsys.com>> *Cc:* openstack-discuss@lists.openstack.org <mailto:openstack-discuss@lists.openstack.org> *Subject:* Re: Nova causes MySQL timeouts Hi Albert, It is a configuration issue, have a look to max_pool_size and max_overflow options under [database] section. Keep in mind than more workers you will have more connections will be opened on the database. Gaetan (goldyfruit) On Sep 3, 2019 4:31 PM, Albert Braden <Albert.Braden@synopsys.com <mailto:Albert.Braden@synopsys.com>> wrote: It looks like nova is keeping mysql connections open until they time out. How are others responding to this issue? Do you just ignore the mysql errors, or is it possible to change configuration so that nova closes and reopens connections before they time out? Or is there a way to stop mysql from logging these aborted connections without hiding real issues? Aborted connection 10726 to db: 'nova' user: 'nova' host: 'asdf' (Got timeout reading communication packets)
On 9/9/2019 11:49 AM, Ben Nemec wrote:
Maybe we should add that to the help text for the option in oslo.db?
I was going to reply to Chris's email with something like this - sounds like the config option help could use some more details around how to calculate the value that's appropriate, what to look out for when it's miscalculated, things to try, etc. Lots of the DB tuning options suffer from the same kind of lack of info. I know I know patches welcome, I'm not helping by piling on, but I'm also not deep in this area. -- Thanks, Matt
I thought I had figured out that the solution was to increase the MySQL wait_timeout so that it is longer than the nova (and glance, neutron, etc.) connection_recycle_time (3600). I increased my MySQL wait_timeout to 6000: root@us01odc-qa-ctrl1:~# mysqladmin variables|grep wait_timeout|grep -v _wait | wait_timeout | 6000 But I still see the MySQL errors. There's no LB; we are pointing to a single MySQL host. Sep 11 14:59:56 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:56 8016 [Warning] Aborted connection 8016 to db: 'nova' user: 'nova' host: 'us01odc-qa-ctrl2.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 14:59:57 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:57 8019 [Warning] Aborted connection 8019 to db: 'glance' user: 'glance' host: 'us01odc-qa-ctrl1.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 14:59:57 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:57 8018 [Warning] Aborted connection 8018 to db: 'nova_api' user: 'nova' host: 'us01odc-qa-ctrl2.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 15:00:50 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 15:00:50 8022 [Warning] Aborted connection 8022 to db: 'nova_api' user: 'nova' host: 'us01odc-qa-ctrl1.internal.synopsys.com' (Got timeout reading communication packets) The errors come from nova, neutron, glance and keystone; it appears that all default to 3600. So it appears that, even with wait_timeout > connection_recycle_time we still see mysql timeout errors. Just for fun I tried setting the MySQL wait_timeout to 86400 and restarting MySQL. I expected that this would pause the "Aborted connection" errors for 24 hours, but they started again after an hour. So it looks like my original assumption was incorrect. I thought nova was keeping connections open until the MySQL server timed them out, but now it appears that something else is happening. Has anyone successfully stopped these MySQL error messages? -----Original Message----- From: Ben Nemec <openstack@nemebean.com> Sent: Monday, September 9, 2019 9:50 AM To: Chris Hoge <chris@openstack.org>; openstack-discuss@lists.openstack.org Subject: Re: [oslo][nova] Nova causes MySQL timeouts On 9/9/19 11:38 AM, Chris Hoge wrote:
In my personal experience, running Nova on a four core machine without limiting the number of database connections will easily exhaust the available connections to MySQL/MariaDB. Keep in mind that the limit applies to every instance of a service, so if Nova starts 'm' services replicated for 'n' cores with 'd' possible connections you'll be up to ‘m x n x d' connections. It gets big fast.
The default setting of '0' (that is, unlimited) does not make for a good first-run experience, IMO.
We don't default to 0. We default to 5: https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_oslo.db_stein_reference_opts.html-23database.max-5Fpool-5Fsize&d=DwIDaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=W7apBhYbgfvGgB46HWLe-By9d_MYg6RB_eU3C2mARRY&s=p7bBYcuhnDR_J08MWFBj8XLiRUUV8JfruAIcl0zF234&e=
This issue comes up every few years or so, and the consensus previously is that 200-2000 connections is recommended based on your needs. Your database has to be configured to handle the load and looking at the configuration value across all your services and setting them consistently and appropriately is important.
Thanks, I did not recall that discussion. If I'm reading it correctly, Jay is suggesting that for MySQL we should just disable connection pooling. As I noted earlier, I don't think we expose the ability to do that in oslo.db (patches welcome!), but setting max_pool_size to 1 would get you pretty close. Maybe we should add that to the help text for the option in oslo.db?
On Sep 6, 2019, at 7:34 AM, Ben Nemec <openstack@nemebean.com> wrote:
Tagging with oslo as this sounds related to oslo.db.
On 9/5/19 7:37 PM, Albert Braden wrote:
After more googling it appears that max_pool_size is a maximum limit on the number of connections that can stay open, and max_overflow is a maximum limit on the number of connections that can be temporarily opened when the pool has been consumed. It looks like the defaults are 5 and 10 which would keep 5 connections open all the time and allow 10 temp. Do I need to set max_pool_size to 0 and max_overflow to the number of connections that I want to allow? Is that a reasonable and correct configuration? Intuitively that doesn't seem right, to have a pool size of 0, but if the "pool" is a group of connections that will remain open until they time out, then maybe 0 is correct?
I don't think so. According to [0] and [1], a pool_size of 0 means unlimited. You could probably set it to 1 to minimize the number of connections kept open, but then I expect you'll have overhead from having to re-open connections frequently.
It sounds like you could use a NullPool to eliminate connection pooling entirely, but I don't think we support that in oslo.db. Based on the error message you're seeing, I would take a look at connection_recycle_time[2]. I seem to recall seeing a comment that the recycle time needs to be shorter than any of the timeouts in the path between the service and the db (so anything like haproxy or mysql itself). Shortening that, or lengthening intervening timeouts, might get rid of these disconnection messages.
0: https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_oslo.db_stein_reference_opts.html-23database.max-5Fpool-5Fsize&d=DwIDaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=W7apBhYbgfvGgB46HWLe-By9d_MYg6RB_eU3C2mARRY&s=p7bBYcuhnDR_J08MWFBj8XLiRUUV8JfruAIcl0zF234&e= 1: https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.sqlalchemy.org_en_13_core_pooling.html-23sqlalchemy.pool.QueuePool.-5F-5Finit-5F-5F&d=DwIDaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=W7apBhYbgfvGgB46HWLe-By9d_MYg6RB_eU3C2mARRY&s=_EIhQyyj1gSM0PrX7de3yJr8hNi7tD8-tnfPo2VV_LU&e= 2: https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_oslo.db_stein_reference_opts.html-23database.connection-5Frecycle-5Ftime&d=DwIDaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=W7apBhYbgfvGgB46HWLe-By9d_MYg6RB_eU3C2mARRY&s=xDnj80EQrxXwenOLgmKEaJbF3VRIylapDgqyMs81pSY&e=
*From:* Albert Braden <Albert.Braden@synopsys.com> *Sent:* Wednesday, September 4, 2019 10:19 AM *To:* openstack-discuss@lists.openstack.org *Cc:* Gaëtan Trellu <gaetan.trellu@incloudus.com> *Subject:* RE: Nova causes MySQL timeouts We’re not setting max_pool_size nor max_overflow option presently. I googled around and found this document: https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keystone_stein_configuration_config-2Doptions.html&d=DwIDaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=W7apBhYbgfvGgB46HWLe-By9d_MYg6RB_eU3C2mARRY&s=NXcUpNTYGd6ZP-1oOUaQXsF7rHQ0mAt4e9uL8zzd0KA&e= <https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keystone_stein_configuration_config-2Doptions.html&d=DwMGaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=3eF4Bv1HRQW6gl7II12rTTSKj_A9_LDISS6hU0nP-R0&s=0EGWx9qW60G1cxoPFCIv_G1-iXX20jKcC5-AwlCWk8g&e=> Document says: [api_database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. max_overflow = None (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. [database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. min_pool_size = 1 (Integer) Minimum number of SQL connections to keep open in a pool. max_overflow = 50 (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. If min_pool_size is >0, would that cause at least 1 connection to remain open until it times out? What are the recommended values for these, to allow unused connections to close before they time out? Is “min_pool_size = 0” an acceptable setting? My settings are default: [api_database]: #connection_recycle_time = 3600 #max_overflow = <None> #max_pool_size = <None> [database]: #connection_recycle_time = 3600 #min_pool_size = 1 #max_overflow = 50 #max_pool_size = 5 It’s not obvious what max_overflow does. Where can I find a document that explains more about these settings? *From:* Gaëtan Trellu <gaetan.trellu@incloudus.com <mailto:gaetan.trellu@incloudus.com>> *Sent:* Tuesday, September 3, 2019 1:37 PM *To:* Albert Braden <albertb@synopsys.com <mailto:albertb@synopsys.com>> *Cc:* openstack-discuss@lists.openstack.org <mailto:openstack-discuss@lists.openstack.org> *Subject:* Re: Nova causes MySQL timeouts Hi Albert, It is a configuration issue, have a look to max_pool_size and max_overflow options under [database] section. Keep in mind than more workers you will have more connections will be opened on the database. Gaetan (goldyfruit) On Sep 3, 2019 4:31 PM, Albert Braden <Albert.Braden@synopsys.com <mailto:Albert.Braden@synopsys.com>> wrote: It looks like nova is keeping mysql connections open until they time out. How are others responding to this issue? Do you just ignore the mysql errors, or is it possible to change configuration so that nova closes and reopens connections before they time out? Or is there a way to stop mysql from logging these aborted connections without hiding real issues? Aborted connection 10726 to db: 'nova' user: 'nova' host: 'asdf' (Got timeout reading communication packets)
On Tue, 2019-09-17 at 16:36 +0000, Albert Braden wrote:
I thought I had figured out that the solution was to increase the MySQL wait_timeout so that it is longer than the nova (and glance, neutron, etc.) connection_recycle_time (3600). I increased my MySQL wait_timeout to 6000:
root@us01odc-qa-ctrl1:~# mysqladmin variables|grep wait_timeout|grep -v _wait
wait_timeout | 6000
But I still see the MySQL errors. There's no LB; we are pointing to a single MySQL host.
Sep 11 14:59:56 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:56 8016 [Warning] Aborted connection 8016 to db: 'nova' user: 'nova' host: 'us01odc-qa-ctrl2.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 14:59:57 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:57 8019 [Warning] Aborted connection 8019 to db: 'glance' user: 'glance' host: 'us01odc-qa-ctrl1.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 14:59:57 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:57 8018 [Warning] Aborted connection 8018 to db: 'nova_api' user: 'nova' host: 'us01odc-qa-ctrl2.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 15:00:50 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 15:00:50 8022 [Warning] Aborted connection 8022 to db: 'nova_api' user: 'nova' host: 'us01odc-qa-ctrl1.internal.synopsys.com' (Got timeout reading communication packets)
The errors come from nova, neutron, glance and keystone; it appears that all default to 3600. So it appears that, even with wait_timeout > connection_recycle_time we still see mysql timeout errors.
Just for fun I tried setting the MySQL wait_timeout to 86400 and restarting MySQL. I expected that this would pause the "Aborted connection" errors for 24 hours, but they started again after an hour. So it looks like my original assumption was incorrect. I thought nova was keeping connections open until the MySQL server timed them out, but now it appears that something else is happening.
Has anyone successfully stopped these MySQL error messages?
could this be related to the eventlet heartbeat issue we see for rabbitmq when running the api under mod_wsgi/uwsgi? e.g. hav eyou confirmed that you wsgi serer is configure to use 1 thread and multiple processes for concurancy multiple thread in one process might have issues.
-----Original Message----- From: Ben Nemec <openstack@nemebean.com> Sent: Monday, September 9, 2019 9:50 AM To: Chris Hoge <chris@openstack.org>; openstack-discuss@lists.openstack.org Subject: Re: [oslo][nova] Nova causes MySQL timeouts
On 9/9/19 11:38 AM, Chris Hoge wrote:
In my personal experience, running Nova on a four core machine without limiting the number of database connections will easily exhaust the available connections to MySQL/MariaDB. Keep in mind that the limit applies to every instance of a service, so if Nova starts 'm' services replicated for 'n' cores with 'd' possible connections you'll be up to ‘m x n x d' connections. It gets big fast.
The default setting of '0' (that is, unlimited) does not make for a good first-run experience, IMO.
We don't default to 0. We default to 5:
This issue comes up every few years or so, and the consensus previously is that 200-2000 connections is recommended based on your needs. Your database has to be configured to handle the load and looking at the configuration value across all your services and setting them consistently and appropriately is important.
Thanks, I did not recall that discussion.
If I'm reading it correctly, Jay is suggesting that for MySQL we should just disable connection pooling. As I noted earlier, I don't think we expose the ability to do that in oslo.db (patches welcome!), but setting max_pool_size to 1 would get you pretty close. Maybe we should add that to the help text for the option in oslo.db?
On Sep 6, 2019, at 7:34 AM, Ben Nemec <openstack@nemebean.com> wrote:
Tagging with oslo as this sounds related to oslo.db.
On 9/5/19 7:37 PM, Albert Braden wrote:
After more googling it appears that max_pool_size is a maximum limit on the number of connections that can stay open, and max_overflow is a maximum limit on the number of connections that can be temporarily opened when the pool has been consumed. It looks like the defaults are 5 and 10 which would keep 5 connections open all the time and allow 10 temp. Do I need to set max_pool_size to 0 and max_overflow to the number of connections that I want to allow? Is that a reasonable and correct configuration? Intuitively that doesn't seem right, to have a pool size of 0, but if the "pool" is a group of connections that will remain open until they time out, then maybe 0 is correct?
I don't think so. According to [0] and [1], a pool_size of 0 means unlimited. You could probably set it to 1 to minimize the number of connections kept open, but then I expect you'll have overhead from having to re-open connections frequently.
It sounds like you could use a NullPool to eliminate connection pooling entirely, but I don't think we support that in oslo.db. Based on the error message you're seeing, I would take a look at connection_recycle_time[2]. I seem to recall seeing a comment that the recycle time needs to be shorter than any of the timeouts in the path between the service and the db (so anything like haproxy or mysql itself). Shortening that, or lengthening intervening timeouts, might get rid of these disconnection messages.
*From:* Albert Braden <Albert.Braden@synopsys.com> *Sent:* Wednesday, September 4, 2019 10:19 AM *To:* openstack-discuss@lists.openstack.org *Cc:* Gaëtan Trellu <gaetan.trellu@incloudus.com> *Subject:* RE: Nova causes MySQL timeouts We’re not setting max_pool_size nor max_overflow option presently. I googled around and found this document:
= <https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keys... 2Doptions.html&d=DwMGaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=3eF4Bv1HRQW6gl7 II12rTTSKj_A9_LDISS6hU0nP-R0&s=0EGWx9qW60G1cxoPFCIv_G1-iXX20jKcC5-AwlCWk8g&e=> Document says: [api_database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. max_overflow = None (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. [database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. min_pool_size = 1 (Integer) Minimum number of SQL connections to keep open in a pool. max_overflow = 50 (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. If min_pool_size is >0, would that cause at least 1 connection to remain open until it times out? What are the recommended values for these, to allow unused connections to close before they time out? Is “min_pool_size = 0” an acceptable setting? My settings are default: [api_database]: #connection_recycle_time = 3600 #max_overflow = <None> #max_pool_size = <None> [database]: #connection_recycle_time = 3600 #min_pool_size = 1 #max_overflow = 50 #max_pool_size = 5 It’s not obvious what max_overflow does. Where can I find a document that explains more about these settings? *From:* Gaëtan Trellu <gaetan.trellu@incloudus.com <mailto:gaetan.trellu@incloudus.com>> *Sent:* Tuesday, September 3, 2019 1:37 PM *To:* Albert Braden <albertb@synopsys.com <mailto:albertb@synopsys.com>> *Cc:* openstack-discuss@lists.openstack.org <mailto:openstack-discuss@lists.openstack.org> *Subject:* Re: Nova causes MySQL timeouts Hi Albert, It is a configuration issue, have a look to max_pool_size and max_overflow options under [database] section. Keep in mind than more workers you will have more connections will be opened on the database. Gaetan (goldyfruit) On Sep 3, 2019 4:31 PM, Albert Braden <Albert.Braden@synopsys.com <mailto:Albert.Braden@synopsys.com>> wrote: It looks like nova is keeping mysql connections open until they time out. How are others responding to this issue? Do you just ignore the mysql errors, or is it possible to change configuration so that nova closes and reopens connections before they time out? Or is there a way to stop mysql from logging these aborted connections without hiding real issues? Aborted connection 10726 to db: 'nova' user: 'nova' host: 'asdf' (Got timeout reading communication packets)
Coincidentally, I'm trying [1] via [2] based on advice from zzzeek. efried [1] https://dba.stackexchange.com/questions/19135/mysql-error-reading-communicat... [2] https://review.opendev.org/#/c/682661/
I was hopeful that this might be our issue, but we already have max_allowed_packet = 256M root@us01odc-dev2-ctrl1:~# mysqladmin variables|grep allowed | max_allowed_packet | 268435456 -----Original Message----- From: Eric Fried <openstack@fried.cc> Sent: Tuesday, September 17, 2019 10:21 AM To: openstack-discuss@lists.openstack.org Subject: Re: [oslo][nova] Nova causes MySQL timeouts Coincidentally, I'm trying [1] via [2] based on advice from zzzeek. efried [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__dba.stackexchange.com_questions_19135_mysql-2Derror-2Dreading-2Dcommunication-2Dpackets_19139-2319139&d=DwICaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=WhqrFWiq68cDG65DEDJIVr7qQiBSBt8bwY70zlZPx5Y&s=E_872jzry6thItjApAzjBLumvxG2WIAOqA2CnGEhiU4&e= [2] https://urldefense.proofpoint.com/v2/url?u=https-3A__review.opendev.org_-23_c_682661_&d=DwICaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=WhqrFWiq68cDG65DEDJIVr7qQiBSBt8bwY70zlZPx5Y&s=wYVKBtvmm1jv4bd0xtP7fsALj2SgyPnYrWSTkAtpG9U&e=
I don't want to boil the ocean but i had similar problem my nove was loosing mysql db connection and we should culprit was Load-balancer (BigIP F5) it has different tcp-timeout compare to whatever openstack provide. after adjusting timeout on F5 my issue got resolved. On Tue, Sep 17, 2019 at 12:56 PM Sean Mooney <smooney@redhat.com> wrote:
On Tue, 2019-09-17 at 16:36 +0000, Albert Braden wrote:
I thought I had figured out that the solution was to increase the MySQL wait_timeout so that it is longer than the nova (and glance, neutron, etc.) connection_recycle_time (3600). I increased my MySQL wait_timeout to 6000:
root@us01odc-qa-ctrl1:~# mysqladmin variables|grep wait_timeout|grep -v _wait
wait_timeout | 6000
But I still see the MySQL errors. There's no LB; we are pointing to a single MySQL host.
Sep 11 14:59:56 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:56 8016 [Warning] Aborted connection 8016 to db: 'nova' user: 'nova' host: 'us01odc-qa-ctrl2.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 14:59:57 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:57 8019 [Warning] Aborted connection 8019 to db: 'glance' user: 'glance' host: 'us01odc-qa-ctrl1.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 14:59:57 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:57 8018 [Warning] Aborted connection 8018 to db: 'nova_api' user: 'nova' host: 'us01odc-qa-ctrl2.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 15:00:50 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 15:00:50 8022 [Warning] Aborted connection 8022 to db: 'nova_api' user: 'nova' host: 'us01odc-qa-ctrl1.internal.synopsys.com' (Got timeout reading communication packets)
The errors come from nova, neutron, glance and keystone; it appears that all default to 3600. So it appears that, even with wait_timeout > connection_recycle_time we still see mysql timeout errors.
Just for fun I tried setting the MySQL wait_timeout to 86400 and restarting MySQL. I expected that this would pause the "Aborted connection" errors for 24 hours, but they started again after an hour. So it looks like my original assumption was incorrect. I thought nova was keeping connections open until the MySQL server timed them out, but now it appears that something else is happening.
Has anyone successfully stopped these MySQL error messages?
could this be related to the eventlet heartbeat issue we see for rabbitmq when running the api under mod_wsgi/uwsgi?
e.g. hav eyou confirmed that you wsgi serer is configure to use 1 thread and multiple processes for concurancy multiple thread in one process might have issues.
-----Original Message----- From: Ben Nemec <openstack@nemebean.com> Sent: Monday, September 9, 2019 9:50 AM To: Chris Hoge <chris@openstack.org>; openstack-discuss@lists.openstack.org Subject: Re: [oslo][nova] Nova causes MySQL timeouts
On 9/9/19 11:38 AM, Chris Hoge wrote:
In my personal experience, running Nova on a four core machine without limiting the number of database connections will easily exhaust the available connections to MySQL/MariaDB. Keep in mind that the limit applies to every instance of a service, so if Nova starts 'm' services replicated for 'n' cores with 'd' possible connections you'll be up to ‘m x n x d' connections. It gets big fast.
The default setting of '0' (that is, unlimited) does not make for a good first-run experience, IMO.
We don't default to 0. We default to 5:
This issue comes up every few years or so, and the consensus previously is that 200-2000 connections is recommended based on your needs. Your database has to be configured to handle the load and looking at the configuration value across all your services and setting them consistently and appropriately is important.
Thanks, I did not recall that discussion.
If I'm reading it correctly, Jay is suggesting that for MySQL we should just disable connection pooling. As I noted earlier, I don't think we expose the ability to do that in oslo.db (patches welcome!), but setting max_pool_size to 1 would get you pretty close. Maybe we should add that to the help text for the option in oslo.db?
On Sep 6, 2019, at 7:34 AM, Ben Nemec <openstack@nemebean.com> wrote:
Tagging with oslo as this sounds related to oslo.db.
On 9/5/19 7:37 PM, Albert Braden wrote:
After more googling it appears that max_pool_size is a maximum limit on the number of connections that can stay open, and max_overflow is a maximum limit on the number of connections that can be temporarily opened when the pool has been consumed. It looks like the defaults are 5 and 10 which would keep 5 connections open all the time and allow 10 temp. Do I need to set max_pool_size to 0 and max_overflow to the number of connections that I want to allow? Is that a reasonable and correct configuration? Intuitively that doesn't seem right, to have a pool size of 0, but if the "pool" is a group of connections that will remain open until they time out, then maybe 0 is correct?
I don't think so. According to [0] and [1], a pool_size of 0 means unlimited. You could probably set it to 1 to minimize the number of connections kept open, but then I expect you'll have overhead from having to re-open connections frequently.
It sounds like you could use a NullPool to eliminate connection pooling entirely, but I don't think we support that in oslo.db. Based on the error message you're seeing, I would take a look at connection_recycle_time[2]. I seem to recall seeing a comment that the recycle time needs to be shorter than any of the timeouts in the path between the service and the db (so anything like haproxy or mysql itself). Shortening that, or lengthening intervening timeouts, might get rid of these disconnection messages.
*From:* Albert Braden <Albert.Braden@synopsys.com> *Sent:* Wednesday, September 4, 2019 10:19 AM *To:* openstack-discuss@lists.openstack.org *Cc:* Gaëtan Trellu <gaetan.trellu@incloudus.com> *Subject:* RE: Nova causes MySQL timeouts We’re not setting max_pool_size nor max_overflow option presently. I googled around and found this document:
= <https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keys... 2Doptions.html&d=DwMGaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=3eF4Bv1HRQW6gl7 II12rTTSKj_A9_LDISS6hU0nP-R0&s=0EGWx9qW60G1cxoPFCIv_G1-iXX20jKcC5-AwlCWk8g&e=> Document says: [api_database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. max_overflow = None (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. [database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. min_pool_size = 1 (Integer) Minimum number of SQL connections to keep open in a pool. max_overflow = 50 (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. If min_pool_size is >0, would that cause at least 1 connection to remain open until it times out? What are the recommended values for these, to allow unused connections to close before they time out? Is “min_pool_size = 0” an acceptable setting? My settings are default: [api_database]: #connection_recycle_time = 3600 #max_overflow = <None> #max_pool_size = <None> [database]: #connection_recycle_time = 3600 #min_pool_size = 1 #max_overflow = 50 #max_pool_size = 5 It’s not obvious what max_overflow does. Where can I find a document that explains more about these settings? *From:* Gaëtan Trellu <gaetan.trellu@incloudus.com <mailto:gaetan.trellu@incloudus.com>> *Sent:* Tuesday, September 3, 2019 1:37 PM *To:* Albert Braden <albertb@synopsys.com <mailto:albertb@synopsys.com>> *Cc:* openstack-discuss@lists.openstack.org <mailto:openstack-discuss@lists.openstack.org> *Subject:* Re: Nova causes MySQL timeouts Hi Albert, It is a configuration issue, have a look to max_pool_size and max_overflow options under [database] section. Keep in mind than more workers you will have more connections will be opened on the database. Gaetan (goldyfruit) On Sep 3, 2019 4:31 PM, Albert Braden <Albert.Braden@synopsys.com <mailto:Albert.Braden@synopsys.com>> wrote: It looks like nova is keeping mysql connections open until they time out. How are others responding to this issue? Do you just ignore the mysql errors, or is it possible to change configuration so that nova closes and reopens connections before they time out? Or is there a way to stop mysql from logging these aborted connections without hiding real issues? Aborted connection 10726 to db: 'nova' user: 'nova' host: 'asdf' (Got timeout reading communication packets)
I had not heard about the eventlet heartbeat issue. Where can I read more about it? The [wsgi] section of my nova.conf is default; nothing is uncommented. -----Original Message----- From: Sean Mooney <smooney@redhat.com> Sent: Tuesday, September 17, 2019 9:50 AM To: Albert Braden <albertb@synopsys.com>; openstack-discuss@lists.openstack.org Cc: Ben Nemec <openstack@nemebean.com>; Chris Hoge <chris@openstack.org> Subject: Re: [oslo][nova] Nova causes MySQL timeouts On Tue, 2019-09-17 at 16:36 +0000, Albert Braden wrote:
I thought I had figured out that the solution was to increase the MySQL wait_timeout so that it is longer than the nova (and glance, neutron, etc.) connection_recycle_time (3600). I increased my MySQL wait_timeout to 6000:
root@us01odc-qa-ctrl1:~# mysqladmin variables|grep wait_timeout|grep -v _wait
wait_timeout | 6000
But I still see the MySQL errors. There's no LB; we are pointing to a single MySQL host.
Sep 11 14:59:56 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:56 8016 [Warning] Aborted connection 8016 to db: 'nova' user: 'nova' host: 'us01odc-qa-ctrl2.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 14:59:57 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:57 8019 [Warning] Aborted connection 8019 to db: 'glance' user: 'glance' host: 'us01odc-qa-ctrl1.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 14:59:57 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:57 8018 [Warning] Aborted connection 8018 to db: 'nova_api' user: 'nova' host: 'us01odc-qa-ctrl2.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 15:00:50 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 15:00:50 8022 [Warning] Aborted connection 8022 to db: 'nova_api' user: 'nova' host: 'us01odc-qa-ctrl1.internal.synopsys.com' (Got timeout reading communication packets)
The errors come from nova, neutron, glance and keystone; it appears that all default to 3600. So it appears that, even with wait_timeout > connection_recycle_time we still see mysql timeout errors.
Just for fun I tried setting the MySQL wait_timeout to 86400 and restarting MySQL. I expected that this would pause the "Aborted connection" errors for 24 hours, but they started again after an hour. So it looks like my original assumption was incorrect. I thought nova was keeping connections open until the MySQL server timed them out, but now it appears that something else is happening.
Has anyone successfully stopped these MySQL error messages?
could this be related to the eventlet heartbeat issue we see for rabbitmq when running the api under mod_wsgi/uwsgi? e.g. hav eyou confirmed that you wsgi serer is configure to use 1 thread and multiple processes for concurancy multiple thread in one process might have issues.
-----Original Message----- From: Ben Nemec <openstack@nemebean.com> Sent: Monday, September 9, 2019 9:50 AM To: Chris Hoge <chris@openstack.org>; openstack-discuss@lists.openstack.org Subject: Re: [oslo][nova] Nova causes MySQL timeouts
On 9/9/19 11:38 AM, Chris Hoge wrote:
In my personal experience, running Nova on a four core machine without limiting the number of database connections will easily exhaust the available connections to MySQL/MariaDB. Keep in mind that the limit applies to every instance of a service, so if Nova starts 'm' services replicated for 'n' cores with 'd' possible connections you'll be up to ‘m x n x d' connections. It gets big fast.
The default setting of '0' (that is, unlimited) does not make for a good first-run experience, IMO.
We don't default to 0. We default to 5:
This issue comes up every few years or so, and the consensus previously is that 200-2000 connections is recommended based on your needs. Your database has to be configured to handle the load and looking at the configuration value across all your services and setting them consistently and appropriately is important.
Thanks, I did not recall that discussion.
If I'm reading it correctly, Jay is suggesting that for MySQL we should just disable connection pooling. As I noted earlier, I don't think we expose the ability to do that in oslo.db (patches welcome!), but setting max_pool_size to 1 would get you pretty close. Maybe we should add that to the help text for the option in oslo.db?
On Sep 6, 2019, at 7:34 AM, Ben Nemec <openstack@nemebean.com> wrote:
Tagging with oslo as this sounds related to oslo.db.
On 9/5/19 7:37 PM, Albert Braden wrote:
After more googling it appears that max_pool_size is a maximum limit on the number of connections that can stay open, and max_overflow is a maximum limit on the number of connections that can be temporarily opened when the pool has been consumed. It looks like the defaults are 5 and 10 which would keep 5 connections open all the time and allow 10 temp. Do I need to set max_pool_size to 0 and max_overflow to the number of connections that I want to allow? Is that a reasonable and correct configuration? Intuitively that doesn't seem right, to have a pool size of 0, but if the "pool" is a group of connections that will remain open until they time out, then maybe 0 is correct?
I don't think so. According to [0] and [1], a pool_size of 0 means unlimited. You could probably set it to 1 to minimize the number of connections kept open, but then I expect you'll have overhead from having to re-open connections frequently.
It sounds like you could use a NullPool to eliminate connection pooling entirely, but I don't think we support that in oslo.db. Based on the error message you're seeing, I would take a look at connection_recycle_time[2]. I seem to recall seeing a comment that the recycle time needs to be shorter than any of the timeouts in the path between the service and the db (so anything like haproxy or mysql itself). Shortening that, or lengthening intervening timeouts, might get rid of these disconnection messages.
*From:* Albert Braden <Albert.Braden@synopsys.com> *Sent:* Wednesday, September 4, 2019 10:19 AM *To:* openstack-discuss@lists.openstack.org *Cc:* Gaëtan Trellu <gaetan.trellu@incloudus.com> *Subject:* RE: Nova causes MySQL timeouts We’re not setting max_pool_size nor max_overflow option presently. I googled around and found this document:
= <https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keys... 2Doptions.html&d=DwMGaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=3eF4Bv1HRQW6gl7 II12rTTSKj_A9_LDISS6hU0nP-R0&s=0EGWx9qW60G1cxoPFCIv_G1-iXX20jKcC5-AwlCWk8g&e=> Document says: [api_database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. max_overflow = None (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. [database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. min_pool_size = 1 (Integer) Minimum number of SQL connections to keep open in a pool. max_overflow = 50 (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. If min_pool_size is >0, would that cause at least 1 connection to remain open until it times out? What are the recommended values for these, to allow unused connections to close before they time out? Is “min_pool_size = 0” an acceptable setting? My settings are default: [api_database]: #connection_recycle_time = 3600 #max_overflow = <None> #max_pool_size = <None> [database]: #connection_recycle_time = 3600 #min_pool_size = 1 #max_overflow = 50 #max_pool_size = 5 It’s not obvious what max_overflow does. Where can I find a document that explains more about these settings? *From:* Gaëtan Trellu <gaetan.trellu@incloudus.com <mailto:gaetan.trellu@incloudus.com>> *Sent:* Tuesday, September 3, 2019 1:37 PM *To:* Albert Braden <albertb@synopsys.com <mailto:albertb@synopsys.com>> *Cc:* openstack-discuss@lists.openstack.org <mailto:openstack-discuss@lists.openstack.org> *Subject:* Re: Nova causes MySQL timeouts Hi Albert, It is a configuration issue, have a look to max_pool_size and max_overflow options under [database] section. Keep in mind than more workers you will have more connections will be opened on the database. Gaetan (goldyfruit) On Sep 3, 2019 4:31 PM, Albert Braden <Albert.Braden@synopsys.com <mailto:Albert.Braden@synopsys.com>> wrote: It looks like nova is keeping mysql connections open until they time out. How are others responding to this issue? Do you just ignore the mysql errors, or is it possible to change configuration so that nova closes and reopens connections before they time out? Or is there a way to stop mysql from logging these aborted connections without hiding real issues? Aborted connection 10726 to db: 'nova' user: 'nova' host: 'asdf' (Got timeout reading communication packets)
Le mar. 17 sept. 2019 à 19:55, Albert Braden <Albert.Braden@synopsys.com> a écrit :
I had not heard about the eventlet heartbeat issue. Where can I read more about it?
Under apache and mod_wsgi eventlet green thread doesn't work properly. Nova faced this issue few months ago through the use of oslo.messaging and especially through the heartbeat's rabbitmq driver. The heartbeat was runned by using a green thread under apache and mod_wsgi, so after few secondes/minutes the heartbeat thread became idle and so the connection with the rabbitmq server was closed and re-opened etc... Hence, that introduced a lot of connections opened and closed between the client and the server. You can find more discuss about there: - http://lists.openstack.org/pipermail/openstack-discuss/2019-May/005822.html And the oslo.messaging fix related to this issue : - https://github.com/openstack/oslo.messaging/commit/22f240b82fffbd62be8568a7d...
The [wsgi] section of my nova.conf is default; nothing is uncommented.
-----Original Message----- From: Sean Mooney <smooney@redhat.com> Sent: Tuesday, September 17, 2019 9:50 AM To: Albert Braden <albertb@synopsys.com>; openstack-discuss@lists.openstack.org Cc: Ben Nemec <openstack@nemebean.com>; Chris Hoge <chris@openstack.org> Subject: Re: [oslo][nova] Nova causes MySQL timeouts
On Tue, 2019-09-17 at 16:36 +0000, Albert Braden wrote:
I thought I had figured out that the solution was to increase the MySQL wait_timeout so that it is longer than the nova (and glance, neutron, etc.) connection_recycle_time (3600). I increased my MySQL wait_timeout to 6000:
root@us01odc-qa-ctrl1:~# mysqladmin variables|grep wait_timeout|grep -v _wait
wait_timeout | 6000
But I still see the MySQL errors. There's no LB; we are pointing to a single MySQL host.
Sep 11 14:59:56 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:56 8016 [Warning] Aborted connection 8016 to db: 'nova' user: 'nova' host: 'us01odc-qa-ctrl2.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 14:59:57 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:57 8019 [Warning] Aborted connection 8019 to db: 'glance' user: 'glance' host: 'us01odc-qa-ctrl1.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 14:59:57 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 14:59:57 8018 [Warning] Aborted connection 8018 to db: 'nova_api' user: 'nova' host: 'us01odc-qa-ctrl2.internal.synopsys.com' (Got timeout reading communication packets) Sep 11 15:00:50 us01odc-qa-ctrl1 mysqld[1052956]: 2019-09-11 15:00:50 8022 [Warning] Aborted connection 8022 to db: 'nova_api' user: 'nova' host: 'us01odc-qa-ctrl1.internal.synopsys.com' (Got timeout reading communication packets)
The errors come from nova, neutron, glance and keystone; it appears that all default to 3600. So it appears that, even with wait_timeout > connection_recycle_time we still see mysql timeout errors.
Just for fun I tried setting the MySQL wait_timeout to 86400 and restarting MySQL. I expected that this would pause the "Aborted connection" errors for 24 hours, but they started again after an hour. So it looks like my original assumption was incorrect. I thought nova was keeping connections open until the MySQL server timed them out, but now it appears that something else is happening.
Has anyone successfully stopped these MySQL error messages?
could this be related to the eventlet heartbeat issue we see for rabbitmq when running the api under mod_wsgi/uwsgi?
e.g. hav eyou confirmed that you wsgi serer is configure to use 1 thread and multiple processes for concurancy multiple thread in one process might have issues.
-----Original Message----- From: Ben Nemec <openstack@nemebean.com> Sent: Monday, September 9, 2019 9:50 AM To: Chris Hoge <chris@openstack.org>; openstack-discuss@lists.openstack.org Subject: Re: [oslo][nova] Nova causes MySQL timeouts
On 9/9/19 11:38 AM, Chris Hoge wrote:
In my personal experience, running Nova on a four core machine without limiting the number of database connections will easily exhaust the available connections to MySQL/MariaDB. Keep in mind that the limit applies to every instance of a service, so if Nova starts 'm' services replicated for 'n' cores with 'd' possible connections you'll be up to ‘m x n x d' connections. It gets big fast.
The default setting of '0' (that is, unlimited) does not make for a good first-run experience, IMO.
We don't default to 0. We default to 5:
This issue comes up every few years or so, and the consensus previously is that 200-2000 connections is recommended based on your needs. Your database has to be configured to handle the load and looking at the configuration value across all your services and setting them consistently and appropriately is important.
Thanks, I did not recall that discussion.
If I'm reading it correctly, Jay is suggesting that for MySQL we should just disable connection pooling. As I noted earlier, I don't think we expose the ability to do that in oslo.db (patches welcome!), but setting max_pool_size to 1 would get you pretty close. Maybe we should add that to the help text for the option in oslo.db?
On Sep 6, 2019, at 7:34 AM, Ben Nemec <openstack@nemebean.com>
wrote:
Tagging with oslo as this sounds related to oslo.db.
On 9/5/19 7:37 PM, Albert Braden wrote:
After more googling it appears that max_pool_size is a maximum
open, and max_overflow is a maximum limit on the number of connections that can be temporarily opened when the pool has been consumed. It looks like the defaults are 5 and 10 which would keep 5 connections open all the time and allow 10 temp. Do I need to set max_pool_size to 0 and max_overflow to the number of connections that I want to allow? Is that a reasonable and correct configuration? Intuitively that doesn't seem right, to have a pool size of 0, but if the "pool" is a group of connections that will remain open until
I don't think so. According to [0] and [1], a pool_size of 0 means
unlimited. You could probably set it to 1 to
minimize the number of connections kept open, but then I expect you'll have overhead from having to re-open connections frequently.
It sounds like you could use a NullPool to eliminate connection
https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.openstack.org_pipermail_openstack-2Ddev_2015-2DApril_061808.html&d=DwIDaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=W7apBhYbgfvGgB46HWLe-By9d_MYg6RB_eU3C2mARRY&s=FGLfZK5eHj7z_xL-5DJsPgHkOt_T131ugvicMvcMDbc&e= limit on the number of connections that can stay they time out, then maybe 0 is correct? pooling entirely, but I don't think we support
that in oslo.db. Based on the error message you're seeing, I would take a look at connection_recycle_time[2]. I seem to recall seeing a comment that the recycle time needs to be shorter than any of the timeouts in the path between the service and the db (so anything like haproxy or mysql itself). Shortening that, or lengthening intervening timeouts, might get rid of these disconnection messages.
0:
1:
2:
*From:* Albert Braden <Albert.Braden@synopsys.com> *Sent:* Wednesday, September 4, 2019 10:19 AM *To:* openstack-discuss@lists.openstack.org *Cc:* Gaëtan Trellu <gaetan.trellu@incloudus.com> *Subject:* RE: Nova causes MySQL timeouts We’re not setting max_pool_size nor max_overflow option presently.
I googled around and found this document:
= < https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.openstack.org_keys...
2Doptions.html&d=DwMGaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=XrJBXYlVPpvOXkMqGPz6KucRW_ils95ZMrEmlTflPm8&m=3eF4Bv1HRQW6gl7
II12rTTSKj_A9_LDISS6hU0nP-R0&s=0EGWx9qW60G1cxoPFCIv_G1-iXX20jKcC5-AwlCWk8g&e=>
Document says: [api_database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. max_overflow = None (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. [database] connection_recycle_time = 3600 (Integer) Timeout before idle SQL connections are reaped. min_pool_size = 1 (Integer) Minimum number of SQL connections to keep open in a pool. max_overflow = 50 (Integer) If set, use this value for max_overflow with SQLAlchemy. max_pool_size = None (Integer) Maximum number of SQL connections to keep open in a pool. If min_pool_size is >0, would that cause at least 1 connection to remain open until it times out? What are the recommended values for these, to allow unused connections to close before they time out? Is “min_pool_size = 0” an acceptable setting? My settings are default: [api_database]: #connection_recycle_time = 3600 #max_overflow = <None> #max_pool_size = <None> [database]: #connection_recycle_time = 3600 #min_pool_size = 1 #max_overflow = 50 #max_pool_size = 5 It’s not obvious what max_overflow does. Where can I find a document that explains more about these settings? *From:* Gaëtan Trellu <gaetan.trellu@incloudus.com <mailto: gaetan.trellu@incloudus.com>> *Sent:* Tuesday, September 3, 2019 1:37 PM *To:* Albert Braden <albertb@synopsys.com <mailto: albertb@synopsys.com>> *Cc:* openstack-discuss@lists.openstack.org <mailto: openstack-discuss@lists.openstack.org> *Subject:* Re: Nova causes MySQL timeouts Hi Albert, It is a configuration issue, have a look to max_pool_size and max_overflow options under [database] section. Keep in mind than more workers you will have more connections will be opened on the database. Gaetan (goldyfruit) On Sep 3, 2019 4:31 PM, Albert Braden <Albert.Braden@synopsys.com <mailto:Albert.Braden@synopsys.com>> wrote: It looks like nova is keeping mysql connections open until
out. How are others responding to this issue? Do you just
ignore the
mysql errors, or is it possible to change configuration so
closes and reopens connections before they time out? Or is
way to stop mysql from logging these aborted connections
without
hiding real issues? Aborted connection 10726 to db: 'nova' user: 'nova' host:
'asdf'
(Got timeout reading communication packets)
-- Hervé Beraud Senior Software Engineer Red Hat - Openstack Oslo irc: hberaud -----BEGIN PGP SIGNATURE----- wsFcBAABCAAQBQJb4AwCCRAHwXRBNkGNegAALSkQAHrotwCiL3VMwDR0vcja10Q+ Kf31yCutl5bAlS7tOKpPQ9XN4oC0ZSThyNNFVrg8ail0SczHXsC4rOrsPblgGRN+ RQLoCm2eO1AkB0ubCYLaq0XqSaO+Uk81QxAPkyPCEGT6SRxXr2lhADK0T86kBnMP F8RvGolu3EFjlqCVgeOZaR51PqwUlEhZXZuuNKrWZXg/oRiY4811GmnvzmUhgK5G 5+f8mUg74hfjDbR2VhjTeaLKp0PhskjOIKY3vqHXofLuaqFDD+WrAy/NgDGvN22g glGfj472T3xyHnUzM8ILgAGSghfzZF5Skj2qEeci9cB6K3Hm3osj+PbvfsXE/7Kw m/xtm+FjnaywZEv54uCmVIzQsRIm1qJscu20Qw6Q0UiPpDFqD7O6tWSRKdX11UTZ hwVQTMh9AKQDBEh2W9nnFi9kzSSNu4OQ1dRMcYHWfd9BEkccezxHwUM4Xyov5Fe0 qnbfzTB1tYkjU78loMWFaLa00ftSxP/DtQ//iYVyfVNfcCwfDszXLOqlkvGmY1/Y F1ON0ONekDZkGJsDoS6QdiUSn8RZ2mHArGEWMV00EV5DCIbCXRvywXV43ckx8Z+3 B8qUJhBqJ8RS2F+vTs3DTaXqcktgJ4UkhYC2c1gImcPRyGrK9VY0sCT+1iA+wp/O v6rDpkeNksZ9fFSyoY2o =ECSj -----END PGP SIGNATURE-----
participants (9)
-
Albert Braden
-
Ben Nemec
-
Chris Hoge
-
Eric Fried
-
Gaëtan Trellu
-
Herve Beraud
-
Matt Riedemann
-
Satish Patel
-
Sean Mooney