memcached connections
Hi,

Is there any guidance or experience for estimating the number of memcached connections?

Here are the memcached connection counts on each of the 3 controllers. Each count is the total of established connections to all 3 memcached nodes.

Node 1:
10 Keystone workers have 62 connections.
11 Nova API workers have 37 connections.
6 Neutron server workers have 4304 connections.
1 memcached has 4973 connections.

Node 2:
10 Keystone workers have 62 connections.
11 Nova API workers have 30 connections.
6 Neutron server workers have 3703 connections.
1 memcached has 4973 connections.

Node 3:
10 Keystone workers have 54 connections.
11 Nova API workers have 15 connections.
6 Neutron server workers have 6541 connections.
1 memcached has 4973 connections.

Before I increase the connection limit for memcached, I'd like to understand whether all of the above is expected. How do the Neutron server workers and memcached end up with so many connections? Any elaboration is appreciated.

BTW, the problem that led me here is memcached connection timeouts, which cause all services depending on memcached to stop working properly.

Thanks!
Tony
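For reference, here is a rough sketch of one way to collect per-service counts like the ones above. It assumes psutil is installed, memcached listening on its default port 11211, and running as root so all processes are visible; adjust for your deployment.

import collections
import psutil

MEMCACHED_PORT = 11211  # assumed default; change to match your deployment

counts = collections.Counter()
for proc in psutil.process_iter(['name']):
    try:
        conns = proc.connections(kind='tcp')
    except (psutil.AccessDenied, psutil.NoSuchProcess):
        continue
    for conn in conns:
        # Count only established connections whose remote end is memcached.
        if (conn.status == psutil.CONN_ESTABLISHED
                and conn.raddr and conn.raddr.port == MEMCACHED_PORT):
            counts[proc.info['name']] += 1

for name, total in counts.most_common():
    print(f"{name}: {total} established connections to memcached")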
Hello,

python-memcached handles connections badly during a flush on reconnect, so the number of connections can grow exponentially [1]. I don't know whether it is the same issue you are facing, but it could be a lead to follow. A fix has been submitted to oslo.cache but is not yet merged [2].

[1] https://bugs.launchpad.net/oslo.cache/+bug/1888394
[2] https://review.opendev.org/#/c/742193/
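To illustrate where that behavior comes from, a minimal sketch of a python-memcached client created with flush on reconnect enabled; the server address is an assumption, and the connection growth itself only shows up under the reconnect scenario described in the bug.

import memcache

# flush_on_reconnect asks the client to flush the cache whenever it has to
# reconnect to a server that went away, so stale data is not served after a
# memcached restart. Bug 1888394 reports that this code path mishandles
# connections, which can make their number grow.
client = memcache.Client(['127.0.0.1:11211'], flush_on_reconnect=1)

client.set('demo-key', 'demo-value')
print(client.get('demo-key'))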
Radosław pointed out another bug, https://bugs.launchpad.net/keystonemiddleware/+bug/1883659, referring to the same fix, https://review.opendev.org/#/c/742193/

Regarding the fix: the comment says "This flag is off by default for backwards compatibility.", but I see the flag is on by default in the current code, which is how it causes the issues. This fix changes the default value from on to off, so it does break backwards compatibility. To keep Keystone working the old way, along with this fix, the flag has to be explicitly set to true in keystone.conf. For neutron-server and nova-api, it's fine to leave the flag off by default. Am I correct?

Thanks!
Tony
Long story short, as far as I remember this topic correctly:

Currently, flush on reconnect is not an option; it is always triggered (in the corresponding scenario). If we introduce the new option `memcache_pool_flush_on_reconnect`, we would need to set it to `True` by default to keep backward compatibility: when the option is `true`, flush on reconnect is triggered every time the scenario occurs.

Using `True` as the default value was my first choice for these changes; I thought we should give priority to backward compatibility at first and then, in a second step, deprecate the behavior and switch the default to `False` if that helps fix things. After some discussion, however, `False` was retained as the default value (c.f. the comments on https://review.opendev.org/#/c/742193/), which means flush on reconnect will not be executed. In that case I think we can say backward compatibility is broken, since this is not the current behavior.

AFAIK `flush_on_reconnect` was added for Keystone, and I think only Keystone really needs it, but other people could confirm that. If we continue with `False` as the default value, then neutron-server and nova-api can simply leave the default alone, as I don't think they need the behavior (c.f. my previous point).

Finally, it could be worth taking a deep dive into the python-memcached side, which is where the root cause (the exponential connection growth) lives, and seeing how to address it there.

Hope that helps.
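For concreteness, a sketch of how such an option is typically declared with oslo.config. The option name comes from this thread; the help text and registration details here are illustrative and may differ from the actual oslo.cache patch.

from oslo_config import cfg

# Illustrative declaration of the option discussed above; the real change
# lives in oslo.cache (https://review.opendev.org/#/c/742193/).
opts = [
    cfg.BoolOpt('memcache_pool_flush_on_reconnect',
                default=False,  # the default retained in review, per this thread
                help='Whether the memcache pool client flushes the cache '
                     'when it reconnects to a memcached server.'),
]

conf = cfg.ConfigOpts()
conf.register_opts(opts, group='cache')
conf([])  # parse an empty command line, for demonstration
print(conf.cache.memcache_pool_flush_on_reconnect)  # -> False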
Thanks for the clarifications! I am fine with the fix. My point is that, to keep Keystone working the way it used to, with this fix flush_on_reconnect has to be explicitly set to true in keystone.conf. That needs to be taken care of by TripleO, Kolla Ansible, Juju, etc.

Tony
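Assuming the option merges under the [cache] section with the name proposed in the review (both assumptions), the deployment-tool change would amount to a keystone.conf fragment along these lines:

[cache]
# Restore the pre-change Keystone behavior once the oslo.cache fix lands;
# the exact section and option name may differ in the merged change.
memcache_pool_flush_on_reconnect = true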
Feel free to leave comments on the review.
On 9/14/20 12:17 PM, Tony Liu wrote:
Thanks for the clarifications! I am fine with the fix. My point is that, to keep Keystone working the way it used to, with this fix flush_on_reconnect has to be explicitly set to true in keystone.conf. That needs to be taken care of by TripleO, Kolla Ansible, Juju, etc.
This issue is why I've -1'd the patch. We need to be able to enable the behavior by default for Keystone, even if we don't for other projects. On the review I linked to an example of how we could do that.
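One common oslo.config pattern for that kind of per-project default (a sketch, not necessarily the exact approach linked on the review) is for the consuming service to override the library default at startup:

from oslo_config import cfg

conf = cfg.ConfigOpts()
# Stand-in for the option oslo.cache would register (see the earlier sketch).
conf.register_opts(
    [cfg.BoolOpt('memcache_pool_flush_on_reconnect', default=False)],
    group='cache')

# A service such as Keystone can flip the library default for itself, while
# operators remain free to override it again in keystone.conf.
conf.set_default('memcache_pool_flush_on_reconnect', True, group='cache')

conf([])
print(conf.cache.memcache_pool_flush_on_reconnect)  # -> True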
participants (3)
- Ben Nemec
- Herve Beraud
- Tony Liu