[kolla] [Wallaby] Openstack operations 2x slower after rebuilding from RHEL8/Train to RHEL9/Wallaby
After rebuilding a couple of our clusters, moving from Train on RHEL8 to Wallaby on RHEL9, we see all openstack operations taking 2x longer. Even simple database reads such as "openstack user show" average 1.5 seconds on old regions and 3 seconds on rebuilt regions. I enabled debug and poked around in the logs but nothing is jumping out at me. I was thinking maybe mariadb needed to be tuned differently from RHEL8->9 but if I log into the mariadb container and query the database it seems equally fast; I can do 100 simple queries in a second. Where should I be looking for the source of this delay?
Hi, could you please share the details of (or any other command that takes longer than usual): openstack user show --timing Thanks, Eugen Zitat von Albert Braden <ozzzo@yahoo.com>:
After rebuilding a couple of our clusters, moving from Train on RHEL8 to Wallaby on RHEL9, we see all openstack operations taking 2x longer. Even simple database reads such as "openstack user show" average 1.5 seconds on old regions and 3 seconds on rebuilt regions. I enabled debug and poked around in the logs but nothing is jumping out at me. I was thinking maybe mariadb needed to be tuned differently from RHEL8->9 but if I log into the mariadb container and query the database it seems equally fast; I can do 100 simple queries in a second. Where should I be looking for the source of this delay?
The timing result is 1.6 seconds (vs 0.2 on a non-rebuilt region), but the elapsed time is 2.7 seconds (vs 1.2): Rebuilt region: $ time openstack user show --timing user +---------------------+------------------------------------------------------------------+ | Field | Value | +---------------------+------------------------------------------------------------------+ | domain_id | <DID> | | email | user@ourdomain.com | | enabled | True | | id | <ID> | | name | user | | options | {} | | password_expires_at | None | +---------------------+------------------------------------------------------------------+ +------------------------------------------------------------------------------------------------------------------+--------------------+ | URL | Seconds | +------------------------------------------------------------------------------------------------------------------+--------------------+ | GET https://api-ext.region.ourdomain.com:5000/v3 | 0.030608 | | POST https://api-ext.region.ourdomain.com:5000/v3/auth/tokens | 0.550521 | | GET https://api-ext.region.ourdomain.com:5000/ | 0.006306 | | GET https://api-ext.region.ourdomain.com:5000/v3/auth/tokens | 0.608377 | | GET https://api-ext.region.ourdomain.com:5000/v3/users/<ID> | 0.458892 | | Total | 1.6547040000000002 | +------------------------------------------------------------------------------------------------------------------+--------------------+ real 0m2.733s user 0m0.982s sys 0m0.105s Not rebuilt: $ time openstack user show --timing user +---------------------+------------------------------------------------------------------+ | Field | Value | +---------------------+------------------------------------------------------------------+ | domain_id | <DID> | | email | user@ourdomain.com | | enabled | True | | id | <ID> | | name | user | | options | {} | | password_expires_at | None | +---------------------+------------------------------------------------------------------+ +------------------------------------------------------------------------------------------------------------------+----------+ | URL | Seconds | +------------------------------------------------------------------------------------------------------------------+----------+ | GET https://api-ext.oldregion.ourdomain.com:5000/v3 | 0.026787 | | POST https://api-ext.oldregion.ourdomain.com:5000/v3/auth/tokens | 0.042511 | | GET https://api-ext.oldregion.ourdomain.com:5000/ | 0.00767 | | GET https://api-ext.oldregion.ourdomain.com:5000/v3/auth/tokens | 0.049546 | | GET https://api-ext.oldregion.ourdomain.com:5000/v3/users/<ID> | 0.044296 | | Total | 0.17081 | +------------------------------------------------------------------------------------------------------------------+----------+ real 0m1.176s user 0m0.892s sys 0m0.118s -----Original Message----- From: Eugen Block <eblock@nde.ag> Sent: Tuesday, April 2, 2024 12:52 PM To: openstack-discuss@lists.openstack.org Subject: [EXTERNAL] Re: [kolla] [Wallaby] Openstack operations 2x slower after rebuilding from RHEL8/Train to RHEL9/Wallaby Caution: This email originated from outside the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe. Hi, could you please share the details of (or any other command that takes longer than usual): openstack user show --timing Thanks, Eugen Zitat von Albert Braden <ozzzo@yahoo.com>:
After rebuilding a couple of our clusters, moving from Train on RHEL8 to Wallaby on RHEL9, we see all openstack operations taking 2x longer. Even simple database reads such as "openstack user show" average 1.5 seconds on old regions and 3 seconds on rebuilt regions. I enabled debug and poked around in the logs but nothing is jumping out at me. I was thinking maybe mariadb needed to be tuned differently from RHEL8->9 but if I log into the mariadb container and query the database it seems equally fast; I can do 100 simple queries in a second. Where should I be looking for the source of this delay?
Hm, we’ve seen something similar in the past when our memcached config was not correct, although I’m not sure if keystone was the culprit or if it was a different service. Anyway, I would probably start to compare memcached configs. Zitat von "Braden, Albert" <abraden@verisign.com>:
The timing result is 1.6 seconds (vs 0.2 on a non-rebuilt region), but the elapsed time is 2.7 seconds (vs 1.2):
Rebuilt region:
$ time openstack user show --timing user +---------------------+------------------------------------------------------------------+ | Field | Value | +---------------------+------------------------------------------------------------------+ | domain_id | <DID> | | email | user@ourdomain.com | | enabled | True | | id | <ID> | | name | user | | options | {} | | password_expires_at | None | +---------------------+------------------------------------------------------------------+
+------------------------------------------------------------------------------------------------------------------+--------------------+ | URL | Seconds | +------------------------------------------------------------------------------------------------------------------+--------------------+ | GET https://api-ext.region.ourdomain.com:5000/v3 | 0.030608 | | POST https://api-ext.region.ourdomain.com:5000/v3/auth/tokens | 0.550521 | | GET https://api-ext.region.ourdomain.com:5000/ | 0.006306 | | GET https://api-ext.region.ourdomain.com:5000/v3/auth/tokens | 0.608377 | | GET https://api-ext.region.ourdomain.com:5000/v3/users/<ID> | 0.458892 | | Total | 1.6547040000000002 | +------------------------------------------------------------------------------------------------------------------+--------------------+
real 0m2.733s user 0m0.982s sys 0m0.105s
Not rebuilt:
$ time openstack user show --timing user +---------------------+------------------------------------------------------------------+ | Field | Value | +---------------------+------------------------------------------------------------------+ | domain_id | <DID> | | email | user@ourdomain.com | | enabled | True | | id | <ID> | | name | user | | options | {} | | password_expires_at | None | +---------------------+------------------------------------------------------------------+
+------------------------------------------------------------------------------------------------------------------+----------+ | URL | Seconds | +------------------------------------------------------------------------------------------------------------------+----------+ | GET https://api-ext.oldregion.ourdomain.com:5000/v3 | 0.026787 | | POST https://api-ext.oldregion.ourdomain.com:5000/v3/auth/tokens | 0.042511 | | GET https://api-ext.oldregion.ourdomain.com:5000/ | 0.00767 | | GET https://api-ext.oldregion.ourdomain.com:5000/v3/auth/tokens | 0.049546 | | GET https://api-ext.oldregion.ourdomain.com:5000/v3/users/<ID> | 0.044296 | | Total | 0.17081 | +------------------------------------------------------------------------------------------------------------------+----------+
real 0m1.176s user 0m0.892s sys 0m0.118s
-----Original Message----- From: Eugen Block <eblock@nde.ag> Sent: Tuesday, April 2, 2024 12:52 PM To: openstack-discuss@lists.openstack.org Subject: [EXTERNAL] Re: [kolla] [Wallaby] Openstack operations 2x slower after rebuilding from RHEL8/Train to RHEL9/Wallaby
Caution: This email originated from outside the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.
Hi, could you please share the details of (or any other command that takes longer than usual):
openstack user show --timing
Thanks, Eugen
Zitat von Albert Braden <ozzzo@yahoo.com>:
After rebuilding a couple of our clusters, moving from Train on RHEL8 to Wallaby on RHEL9, we see all openstack operations taking 2x longer. Even simple database reads such as "openstack user show" average 1.5 seconds on old regions and 3 seconds on rebuilt regions. I enabled debug and poked around in the logs but nothing is jumping out at me. I was thinking maybe mariadb needed to be tuned differently from RHEL8->9 but if I log into the mariadb container and query the database it seems equally fast; I can do 100 simple queries in a second. Where should I be looking for the source of this delay?
That's plausible; we did have some trouble with memcached/mcrouter after the rebuild and we had to tweak our mcrouter config, but I'm having trouble getting good logs. I set designate_logging_debug: "True" and did a full deploy but I don't see any logs for memcached and mcrouter doesn't log anything interesting. On Tuesday, April 2, 2024 at 03:50:53 PM EDT, Eugen Block <eblock@nde.ag> wrote: Hm, we’ve seen something similar in the past when our memcached config was not correct, although I’m not sure if keystone was the culprit or if it was a different service. Anyway, I would probably start to compare memcached configs. Zitat von "Braden, Albert" <abraden@verisign.com>:
The timing result is 1.6 seconds (vs 0.2 on a non-rebuilt region), but the elapsed time is 2.7 seconds (vs 1.2):
Rebuilt region:
$ time openstack user show --timing user +---------------------+------------------------------------------------------------------+ | Field | Value | +---------------------+------------------------------------------------------------------+ | domain_id | <DID> | | email | user@ourdomain.com | | enabled | True | | id | <ID> | | name | user | | options | {} | | password_expires_at | None | +---------------------+------------------------------------------------------------------+
+------------------------------------------------------------------------------------------------------------------+--------------------+ | URL | Seconds | +------------------------------------------------------------------------------------------------------------------+--------------------+ | GET https://api-ext.region.ourdomain.com:5000/v3 | 0.030608 | | POST https://api-ext.region.ourdomain.com:5000/v3/auth/tokens | 0.550521 | | GET https://api-ext.region.ourdomain.com:5000/ | 0.006306 | | GET https://api-ext.region.ourdomain.com:5000/v3/auth/tokens | 0.608377 | | GET https://api-ext.region.ourdomain.com:5000/v3/users/<ID> | 0.458892 | | Total | 1.6547040000000002 | +------------------------------------------------------------------------------------------------------------------+--------------------+
real 0m2.733s user 0m0.982s sys 0m0.105s
Not rebuilt:
$ time openstack user show --timing user +---------------------+------------------------------------------------------------------+ | Field | Value | +---------------------+------------------------------------------------------------------+ | domain_id | <DID> | | email | user@ourdomain.com | | enabled | True | | id | <ID> | | name | user | | options | {} | | password_expires_at | None | +---------------------+------------------------------------------------------------------+
+------------------------------------------------------------------------------------------------------------------+----------+ | URL | Seconds | +------------------------------------------------------------------------------------------------------------------+----------+ | GET https://api-ext.oldregion.ourdomain.com:5000/v3 | 0.026787 | | POST https://api-ext.oldregion.ourdomain.com:5000/v3/auth/tokens | 0.042511 | | GET https://api-ext.oldregion.ourdomain.com:5000/ | 0.00767 | | GET https://api-ext.oldregion.ourdomain.com:5000/v3/auth/tokens | 0.049546 | | GET https://api-ext.oldregion.ourdomain.com:5000/v3/users/<ID> | 0.044296 | | Total | 0.17081 | +------------------------------------------------------------------------------------------------------------------+----------+
real 0m1.176s user 0m0.892s sys 0m0.118s
-----Original Message----- From: Eugen Block <eblock@nde.ag> Sent: Tuesday, April 2, 2024 12:52 PM To: openstack-discuss@lists.openstack.org Subject: [EXTERNAL] Re: [kolla] [Wallaby] Openstack operations 2x slower after rebuilding from RHEL8/Train to RHEL9/Wallaby
Caution: This email originated from outside the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.
Hi, could you please share the details of (or any other command that takes longer than usual):
openstack user show --timing
Thanks, Eugen
Zitat von Albert Braden <ozzzo@yahoo.com>:
After rebuilding a couple of our clusters, moving from Train on RHEL8 to Wallaby on RHEL9, we see all openstack operations taking 2x longer. Even simple database reads such as "openstack user show" average 1.5 seconds on old regions and 3 seconds on rebuilt regions. I enabled debug and poked around in the logs but nothing is jumping out at me. I was thinking maybe mariadb needed to be tuned differently from RHEL8->9 but if I log into the mariadb container and query the database it seems equally fast; I can do 100 simple queries in a second. Where should I be looking for the source of this delay?
I know absolutely nothing about mcrouter, unfortunately. I also remember that we didn't really have much to go on with memcached, it's been a couple of years since we last tweaked it. I'll need to ask a colleague if he remembers the details how we pinpointed it to memcached. I'll get back to you if I get more details. Zitat von Albert Braden <ozzzo@yahoo.com>:
That's plausible; we did have some trouble with memcached/mcrouter after the rebuild and we had to tweak our mcrouter config, but I'm having trouble getting good logs. I set designate_logging_debug: "True" and did a full deploy but I don't see any logs for memcached and mcrouter doesn't log anything interesting. On Tuesday, April 2, 2024 at 03:50:53 PM EDT, Eugen Block <eblock@nde.ag> wrote:
Hm, we’ve seen something similar in the past when our memcached config was not correct, although I’m not sure if keystone was the culprit or if it was a different service. Anyway, I would probably start to compare memcached configs.
Zitat von "Braden, Albert" <abraden@verisign.com>:
The timing result is 1.6 seconds (vs 0.2 on a non-rebuilt region), but the elapsed time is 2.7 seconds (vs 1.2):
Rebuilt region:
$ time openstack user show --timing user +---------------------+------------------------------------------------------------------+ | Field | Value | +---------------------+------------------------------------------------------------------+ | domain_id | <DID> | | email | user@ourdomain.com | | enabled | True | | id | <ID> | | name | user | | options | {} | | password_expires_at | None | +---------------------+------------------------------------------------------------------+
+------------------------------------------------------------------------------------------------------------------+--------------------+ | URL | Seconds | +------------------------------------------------------------------------------------------------------------------+--------------------+ | GET https://api-ext.region.ourdomain.com:5000/v3 | 0.030608 | | POST https://api-ext.region.ourdomain.com:5000/v3/auth/tokens | 0.550521 | | GET https://api-ext.region.ourdomain.com:5000/ | 0.006306 | | GET https://api-ext.region.ourdomain.com:5000/v3/auth/tokens | 0.608377 | | GET https://api-ext.region.ourdomain.com:5000/v3/users/<ID> | 0.458892 | | Total | 1.6547040000000002 | +------------------------------------------------------------------------------------------------------------------+--------------------+
real 0m2.733s user 0m0.982s sys 0m0.105s
Not rebuilt:
$ time openstack user show --timing user +---------------------+------------------------------------------------------------------+ | Field | Value | +---------------------+------------------------------------------------------------------+ | domain_id | <DID> | | email | user@ourdomain.com | | enabled | True | | id | <ID> | | name | user | | options | {} | | password_expires_at | None | +---------------------+------------------------------------------------------------------+
+------------------------------------------------------------------------------------------------------------------+----------+ | URL | Seconds | +------------------------------------------------------------------------------------------------------------------+----------+ | GET https://api-ext.oldregion.ourdomain.com:5000/v3 | 0.026787 | | POST https://api-ext.oldregion.ourdomain.com:5000/v3/auth/tokens | 0.042511 | | GET https://api-ext.oldregion.ourdomain.com:5000/ | 0.00767 | | GET https://api-ext.oldregion.ourdomain.com:5000/v3/auth/tokens | 0.049546 | | GET https://api-ext.oldregion.ourdomain.com:5000/v3/users/<ID> | 0.044296 | | Total | 0.17081 | +------------------------------------------------------------------------------------------------------------------+----------+
real 0m1.176s user 0m0.892s sys 0m0.118s
-----Original Message----- From: Eugen Block <eblock@nde.ag> Sent: Tuesday, April 2, 2024 12:52 PM To: openstack-discuss@lists.openstack.org Subject: [EXTERNAL] Re: [kolla] [Wallaby] Openstack operations 2x slower after rebuilding from RHEL8/Train to RHEL9/Wallaby
Caution: This email originated from outside the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.
Hi, could you please share the details of (or any other command that takes longer than usual):
openstack user show --timing
Thanks, Eugen
Zitat von Albert Braden <ozzzo@yahoo.com>:
After rebuilding a couple of our clusters, moving from Train on RHEL8 to Wallaby on RHEL9, we see all openstack operations taking 2x longer. Even simple database reads such as "openstack user show" average 1.5 seconds on old regions and 3 seconds on rebuilt regions. I enabled debug and poked around in the logs but nothing is jumping out at me. I was thinking maybe mariadb needed to be tuned differently from RHEL8->9 but if I log into the mariadb container and query the database it seems equally fast; I can do 100 simple queries in a second. Where should I be looking for the source of this delay?
Turns out we had configured our services to connect to mcrouter via haproxy, but we didn't have a stanza for mcrouter in our haproxy config, and that was breaking caching. On Thursday, April 4, 2024 at 03:45:27 AM EDT, Eugen Block <eblock@nde.ag> wrote: I know absolutely nothing about mcrouter, unfortunately. I also remember that we didn't really have much to go on with memcached, it's been a couple of years since we last tweaked it. I'll need to ask a colleague if he remembers the details how we pinpointed it to memcached. I'll get back to you if I get more details. Zitat von Albert Braden <ozzzo@yahoo.com>:
That's plausible; we did have some trouble with memcached/mcrouter after the rebuild and we had to tweak our mcrouter config, but I'm having trouble getting good logs. I set designate_logging_debug: "True" and did a full deploy but I don't see any logs for memcached and mcrouter doesn't log anything interesting. On Tuesday, April 2, 2024 at 03:50:53 PM EDT, Eugen Block <eblock@nde.ag> wrote:
Hm, we’ve seen something similar in the past when our memcached config was not correct, although I’m not sure if keystone was the culprit or if it was a different service. Anyway, I would probably start to compare memcached configs.
Zitat von "Braden, Albert" <abraden@verisign.com>:
The timing result is 1.6 seconds (vs 0.2 on a non-rebuilt region), but the elapsed time is 2.7 seconds (vs 1.2):
Rebuilt region:
$ time openstack user show --timing user +---------------------+------------------------------------------------------------------+ | Field | Value | +---------------------+------------------------------------------------------------------+ | domain_id | <DID> | | email | user@ourdomain.com | | enabled | True | | id | <ID> | | name | user | | options | {} | | password_expires_at | None | +---------------------+------------------------------------------------------------------+
+------------------------------------------------------------------------------------------------------------------+--------------------+ | URL | Seconds | +------------------------------------------------------------------------------------------------------------------+--------------------+ | GET https://api-ext.region.ourdomain.com:5000/v3 | 0.030608 | | POST https://api-ext.region.ourdomain.com:5000/v3/auth/tokens | 0.550521 | | GET https://api-ext.region.ourdomain.com:5000/ | 0.006306 | | GET https://api-ext.region.ourdomain.com:5000/v3/auth/tokens | 0.608377 | | GET https://api-ext.region.ourdomain.com:5000/v3/users/<ID> | 0.458892 | | Total | 1.6547040000000002 | +------------------------------------------------------------------------------------------------------------------+--------------------+
real 0m2.733s user 0m0.982s sys 0m0.105s
Not rebuilt:
$ time openstack user show --timing user +---------------------+------------------------------------------------------------------+ | Field | Value | +---------------------+------------------------------------------------------------------+ | domain_id | <DID> | | email | user@ourdomain.com | | enabled | True | | id | <ID> | | name | user | | options | {} | | password_expires_at | None | +---------------------+------------------------------------------------------------------+
+------------------------------------------------------------------------------------------------------------------+----------+ | URL | Seconds | +------------------------------------------------------------------------------------------------------------------+----------+ | GET https://api-ext.oldregion.ourdomain.com:5000/v3 | 0.026787 | | POST https://api-ext.oldregion.ourdomain.com:5000/v3/auth/tokens | 0.042511 | | GET https://api-ext.oldregion.ourdomain.com:5000/ | 0.00767 | | GET https://api-ext.oldregion.ourdomain.com:5000/v3/auth/tokens | 0.049546 | | GET https://api-ext.oldregion.ourdomain.com:5000/v3/users/<ID> | 0.044296 | | Total | 0.17081 | +------------------------------------------------------------------------------------------------------------------+----------+
real 0m1.176s user 0m0.892s sys 0m0.118s
-----Original Message----- From: Eugen Block <eblock@nde.ag> Sent: Tuesday, April 2, 2024 12:52 PM To: openstack-discuss@lists.openstack.org Subject: [EXTERNAL] Re: [kolla] [Wallaby] Openstack operations 2x slower after rebuilding from RHEL8/Train to RHEL9/Wallaby
Caution: This email originated from outside the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.
Hi, could you please share the details of (or any other command that takes longer than usual):
openstack user show --timing
Thanks, Eugen
Zitat von Albert Braden <ozzzo@yahoo.com>:
After rebuilding a couple of our clusters, moving from Train on RHEL8 to Wallaby on RHEL9, we see all openstack operations taking 2x longer. Even simple database reads such as "openstack user show" average 1.5 seconds on old regions and 3 seconds on rebuilt regions. I enabled debug and poked around in the logs but nothing is jumping out at me. I was thinking maybe mariadb needed to be tuned differently from RHEL8->9 but if I log into the mariadb container and query the database it seems equally fast; I can do 100 simple queries in a second. Where should I be looking for the source of this delay?
I see, thanks for the update. Zitat von Albert Braden <ozzzo@yahoo.com>:
Turns out we had configured our services to connect to mcrouter via haproxy, but we didn't have a stanza for mcrouter in our haproxy config, and that was breaking caching. On Thursday, April 4, 2024 at 03:45:27 AM EDT, Eugen Block <eblock@nde.ag> wrote:
I know absolutely nothing about mcrouter, unfortunately. I also remember that we didn't really have much to go on with memcached, it's been a couple of years since we last tweaked it. I'll need to ask a colleague if he remembers the details how we pinpointed it to memcached. I'll get back to you if I get more details.
Zitat von Albert Braden <ozzzo@yahoo.com>:
That's plausible; we did have some trouble with memcached/mcrouter after the rebuild and we had to tweak our mcrouter config, but I'm having trouble getting good logs. I set designate_logging_debug: "True" and did a full deploy but I don't see any logs for memcached and mcrouter doesn't log anything interesting. On Tuesday, April 2, 2024 at 03:50:53 PM EDT, Eugen Block <eblock@nde.ag> wrote:
Hm, we’ve seen something similar in the past when our memcached config was not correct, although I’m not sure if keystone was the culprit or if it was a different service. Anyway, I would probably start to compare memcached configs.
Zitat von "Braden, Albert" <abraden@verisign.com>:
The timing result is 1.6 seconds (vs 0.2 on a non-rebuilt region), but the elapsed time is 2.7 seconds (vs 1.2):
Rebuilt region:
$ time openstack user show --timing user +---------------------+------------------------------------------------------------------+ | Field | Value | +---------------------+------------------------------------------------------------------+ | domain_id | <DID> | | email | user@ourdomain.com | | enabled | True | | id | <ID> | | name | user | | options | {} | | password_expires_at | None | +---------------------+------------------------------------------------------------------+
+------------------------------------------------------------------------------------------------------------------+--------------------+ | URL | Seconds | +------------------------------------------------------------------------------------------------------------------+--------------------+ | GET https://api-ext.region.ourdomain.com:5000/v3 | 0.030608 | | POST https://api-ext.region.ourdomain.com:5000/v3/auth/tokens | 0.550521 | | GET https://api-ext.region.ourdomain.com:5000/ | 0.006306 | | GET https://api-ext.region.ourdomain.com:5000/v3/auth/tokens | 0.608377 | | GET https://api-ext.region.ourdomain.com:5000/v3/users/<ID> | 0.458892 | | Total | 1.6547040000000002 | +------------------------------------------------------------------------------------------------------------------+--------------------+
real 0m2.733s user 0m0.982s sys 0m0.105s
Not rebuilt:
$ time openstack user show --timing user +---------------------+------------------------------------------------------------------+ | Field | Value | +---------------------+------------------------------------------------------------------+ | domain_id | <DID> | | email | user@ourdomain.com | | enabled | True | | id | <ID> | | name | user | | options | {} | | password_expires_at | None | +---------------------+------------------------------------------------------------------+
+------------------------------------------------------------------------------------------------------------------+----------+ | URL | Seconds | +------------------------------------------------------------------------------------------------------------------+----------+ | GET https://api-ext.oldregion.ourdomain.com:5000/v3 | 0.026787 | | POST https://api-ext.oldregion.ourdomain.com:5000/v3/auth/tokens | 0.042511 | | GET https://api-ext.oldregion.ourdomain.com:5000/ | 0.00767 | | GET https://api-ext.oldregion.ourdomain.com:5000/v3/auth/tokens | 0.049546 | | GET https://api-ext.oldregion.ourdomain.com:5000/v3/users/<ID> | 0.044296 | | Total | 0.17081 | +------------------------------------------------------------------------------------------------------------------+----------+
real 0m1.176s user 0m0.892s sys 0m0.118s
-----Original Message----- From: Eugen Block <eblock@nde.ag> Sent: Tuesday, April 2, 2024 12:52 PM To: openstack-discuss@lists.openstack.org Subject: [EXTERNAL] Re: [kolla] [Wallaby] Openstack operations 2x slower after rebuilding from RHEL8/Train to RHEL9/Wallaby
Caution: This email originated from outside the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.
Hi, could you please share the details of (or any other command that takes longer than usual):
openstack user show --timing
Thanks, Eugen
Zitat von Albert Braden <ozzzo@yahoo.com>:
After rebuilding a couple of our clusters, moving from Train on RHEL8 to Wallaby on RHEL9, we see all openstack operations taking 2x longer. Even simple database reads such as "openstack user show" average 1.5 seconds on old regions and 3 seconds on rebuilt regions. I enabled debug and poked around in the logs but nothing is jumping out at me. I was thinking maybe mariadb needed to be tuned differently from RHEL8->9 but if I log into the mariadb container and query the database it seems equally fast; I can do 100 simple queries in a second. Where should I be looking for the source of this delay?
Hello Albert, Out of curiosity - do you have some comparison between the standard memcached setup in Kolla-Ansible and using mcrouter? Best regards, Michal
On 5 Apr 2024, at 07:47, Albert Braden <ozzzo@yahoo.com> wrote:
Turns out we had configured our services to connect to mcrouter via haproxy, but we didn't have a stanza for mcrouter in our haproxy config, and that was breaking caching. On Thursday, April 4, 2024 at 03:45:27 AM EDT, Eugen Block <eblock@nde.ag> wrote:
I know absolutely nothing about mcrouter, unfortunately. I also remember that we didn't really have much to go on with memcached, it's been a couple of years since we last tweaked it. I'll need to ask a colleague if he remembers the details how we pinpointed it to memcached. I'll get back to you if I get more details.
Zitat von Albert Braden <ozzzo@yahoo.com <mailto:ozzzo@yahoo.com>>:
That's plausible; we did have some trouble with memcached/mcrouter after the rebuild and we had to tweak our mcrouter config, but I'm having trouble getting good logs. I set designate_logging_debug: "True" and did a full deploy but I don't see any logs for memcached and mcrouter doesn't log anything interesting. On Tuesday, April 2, 2024 at 03:50:53 PM EDT, Eugen Block <eblock@nde.ag <mailto:eblock@nde.ag>> wrote:
Hm, we’ve seen something similar in the past when our memcached config was not correct, although I’m not sure if keystone was the culprit or if it was a different service. Anyway, I would probably start to compare memcached configs.
Zitat von "Braden, Albert" <abraden@verisign.com <mailto:abraden@verisign.com>>:
The timing result is 1.6 seconds (vs 0.2 on a non-rebuilt region), but the elapsed time is 2.7 seconds (vs 1.2):
Rebuilt region:
$ time openstack user show --timing user +---------------------+------------------------------------------------------------------+ | Field | Value | +---------------------+------------------------------------------------------------------+ | domain_id | <DID> | | email | user@ourdomain.com | | enabled | True | | id | <ID> | | name | user | | options | {} | | password_expires_at | None | +---------------------+------------------------------------------------------------------+
+------------------------------------------------------------------------------------------------------------------+--------------------+ | URL | Seconds | +------------------------------------------------------------------------------------------------------------------+--------------------+ | GET https://api-ext.region.ourdomain.com:5000/v3 <https://api-ext.region.ourdomain.com:5000/v3%C2%A0> | 0.030608 | | POST https://api-ext.region.ourdomain.com:5000/v3/auth/tokens <https://api-ext.region.ourdomain.com:5000/v3/auth/tokens%C2%A0> | 0.550521 | | GET https://api-ext.region.ourdomain.com:5000/ <https://api-ext.region.ourdomain.com:5000/%C2%A0> | 0.006306 | | GET https://api-ext.region.ourdomain.com:5000/v3/auth/tokens <https://api-ext.region.ourdomain.com:5000/v3/auth/tokens%C2%A0> | 0.608377 | | GET https://api-ext.region.ourdomain.com:5000/v3/users/<ID> | 0.458892 | | Total | 1.6547040000000002 | +------------------------------------------------------------------------------------------------------------------+--------------------+
real 0m2.733s user 0m0.982s sys 0m0.105s
Not rebuilt:
$ time openstack user show --timing user +---------------------+------------------------------------------------------------------+ | Field | Value | +---------------------+------------------------------------------------------------------+ | domain_id | <DID> | | email | user@ourdomain.com | | enabled | True | | id | <ID> | | name | user | | options | {} | | password_expires_at | None | +---------------------+------------------------------------------------------------------+
+------------------------------------------------------------------------------------------------------------------+----------+ | URL | Seconds | +------------------------------------------------------------------------------------------------------------------+----------+ | GET https://api-ext.oldregion.ourdomain.com:5000/v3 <https://api-ext.oldregion.ourdomain.com:5000/v3%C2%A0> | 0.026787 | | POST https://api-ext.oldregion.ourdomain.com:5000/v3/auth/tokens <https://api-ext.oldregion.ourdomain.com:5000/v3/auth/tokens%C2%A0> | 0.042511 | | GET https://api-ext.oldregion.ourdomain.com:5000/ <https://api-ext.oldregion.ourdomain.com:5000/%C2%A0> | 0.00767 | | GET https://api-ext.oldregion.ourdomain.com:5000/v3/auth/tokens <https://api-ext.oldregion.ourdomain.com:5000/v3/auth/tokens%C2%A0> | 0.049546 | | GET https://api-ext.oldregion.ourdomain.com:5000/v3/users/<ID> | 0.044296 | | Total | 0.17081 | +------------------------------------------------------------------------------------------------------------------+----------+
real 0m1.176s user 0m0.892s sys 0m0.118s
-----Original Message----- From: Eugen Block <eblock@nde.ag <mailto:eblock@nde.ag>> Sent: Tuesday, April 2, 2024 12:52 PM To: openstack-discuss@lists.openstack.org <mailto:openstack-discuss@lists.openstack.org> Subject: [EXTERNAL] Re: [kolla] [Wallaby] Openstack operations 2x slower after rebuilding from RHEL8/Train to RHEL9/Wallaby
Caution: This email originated from outside the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.
Hi, could you please share the details of (or any other command that takes longer than usual):
openstack user show --timing
Thanks, Eugen
Zitat von Albert Braden <ozzzo@yahoo.com <mailto:ozzzo@yahoo.com>>:
After rebuilding a couple of our clusters, moving from Train on RHEL8 to Wallaby on RHEL9, we see all openstack operations taking 2x longer. Even simple database reads such as "openstack user show" average 1.5 seconds on old regions and 3 seconds on rebuilt regions. I enabled debug and poked around in the logs but nothing is jumping out at me. I was thinking maybe mariadb needed to be tuned differently from RHEL8->9 but if I log into the mariadb container and query the database it seems equally fast; I can do 100 simple queries in a second. Where should I be looking for the source of this delay?
On 02.04.24 6:08 PM, Albert Braden wrote:
I was thinking maybe mariadb needed to be tuned differently from RHEL8->9 but if I log into the mariadb container and query the database it seems equally fast; I can do 100 simple queries in a second. Where should I be looking for the source of this delay?
1) It seems you are "only" testing with keystone (users) or, better put, you already observe the issue at the keystone API? Are any of the other APIs slow as well? 2) Did you init a new database (fresh schema) or did a "db sync" to the existing DB and applied all the schema updates? 3) Did you check the upgrade logs? Are you certain you applied all the upgrade steps including online updates? See https://docs.openstack.org/keystone/latest/admin/upgrading.html 4) Can you maybe init a new DB and compare the table schemas to see if there is any index or similar missing on your upgraded database? 5) Turn on slow query logging, preferrably using the extended capabilities https://mariadb.com/kb/en/slow-query-log-extended-statistics/ to find any query that is slow or not using an index. Regards Christian
We wiped the OS and did a fresh install of RHEL9, and then installed Wallaby, and then restored users, projects, networks, etc. We see it on every operation; for example VM create and delete take substantially longer in elapsed time, and >10x longer in measured time (via --timing). On Wednesday, April 3, 2024 at 06:15:31 AM EDT, Christian Rohmann <christian.rohmann@inovex.de> wrote: On 02.04.24 6:08 PM, Albert Braden wrote: I was thinking maybe mariadb needed to be tuned differently from RHEL8->9 but if I log into the mariadb container and query the database it seems equally fast; I can do 100 simple queries in a second. Where should I be looking for the source of this delay? 1) It seems you are "only" testing with keystone (users) or, better put, you already observe the issue at the keystone API? Are any of the other APIs slow as well? 2) Did you init a new database (fresh schema) or did a "db sync" to the existing DB and applied all the schema updates? 3) Did you check the upgrade logs? Are you certain you applied all the upgrade steps including online updates? See https://docs.openstack.org/keystone/latest/admin/upgrading.html 4) Can you maybe init a new DB and compare the table schemas to see if there is any index or similar missing on your upgraded database? 5) Turn on slow query logging, preferrably using the extended capabilities https://mariadb.com/kb/en/slow-query-log-extended-statistics/ to find any query that is slow or not using an index. Regards Christian
participants (5)
-
Albert Braden
-
Braden, Albert
-
Christian Rohmann
-
Eugen Block
-
Michał Nasiadka