Thanks for the explanation, Sean, I understand. I will try to capture some metrics from my test cloud (there's no real load though) on Epoxy (or Caracal as well) and see if I can compare them reliably to the Antelope version, because our users were quite delighted about the improved dashboard performance in Antelope... Have a great weekend! Thanks!
Eugen

Quoting Sean Mooney <smooney@redhat.com>:
On 15/08/2025 12:58, Eugen Block wrote:
Hi,
I did not have the nova cache enabled before, and I found a config mistake in the keystone cache section, which I corrected. Then I enabled the nova cache and restarted apache and nova-api, but there's no difference at all in the response times in the dashboard. It feels like caching doesn't do anything here (for us). Maybe I should start a new thread wrt horizon performance to not hijack this thread...
The caching in the nova API is really more useful for the metadata API.
The only thing we really use caching for in the main API is keystone auth token validation.
We do not cache API responses in the main API, but we do in the metadata API.
Building the metadata for a VM can be slow, and we use memcache to make sure that if subsequent requests from a VM are received by a different API worker process, it can share the metadata object built by the first worker.
For the main API we are mainly using caching so we don't have to keep validating the same token over and over again with keystone if you use it to make multiple requests. That does help performance at the auth step, but it's not going to speed up server list or flavor show.
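As a rough sketch of the configuration behind both of those (the option names are the usual oslo.cache and nova ones, but double-check the exact names and defaults against your release):

[cache]
enabled = True
backend = dogpile.cache.memcached
memcache_servers = 192.0.2.10:11211

[api]
# seconds a built metadata object stays cached; 0 disables metadata caching (15 is the usual default)
metadata_cache_expiration = 15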
Thanks, Eugen
Quoting Konstantin Larin <klarin@sardinasystems.com>:
Hello,
We observe the mentioned behavior too; moreover, the CPU and RAM usage of the Nova API slowly increases over time, and the API process eventually eats 100% CPU.
@Melanie thank you for the suggestion! I have disabled the cache, and will monitor how Nova performs.
On Wed, 2025-08-13 at 12:53 -0700, melanie witt wrote:
On 8/13/25 07:41, Chang Xue wrote:
Thanks. The os-query-sets API requests to nova have been successful, with no error reported; it's just that their duration becomes longer and longer unless we restart nova-api. It seems to be different from the bug fixed in the post... But I will keep checking the nova-api logs to see if I can find something more.
I think you mean the os-quota-sets API, right? In Caracal we had intended to change the default [quota]driver to the UnifiedLimitsDriver, but this did not happen due to some upgrade concerns. So the default should still be the DbQuotaDriver.
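If you want to confirm which driver is actually in effect, the check in nova.conf would look something like this (only a sketch; the value shown is the expected default, so the option may simply be unset):

[quota]
# expected default; nova.quota.UnifiedLimitsDriver would appear here if it had been switched
driver = nova.quota.DbQuotaDriver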
I skimmed through the commit differences in Nova between Bobcat and Caracal and did not find anything that looked suspect so far.
The challenge is that the problem could be in a number of places:
Nova or oslo.cache or dogpile.cache.<backend> or keystonemiddleware
and maybe more that I haven't thought of.
Similar to what you mentioned earlier, I would think this is most likely related to some sort of caching, and in the past we have seen such issues from missing or incorrect configuration, as Eugen mentioned. It is known that if the cache is not configured right, you will see a progressive slowdown in the API over time.
This would be the first thing to check in your nova-api nova.conf; you should have the following configuration to enable keystone auth token caching:
[cache]
memcache_servers = localhost:11211
backend = dogpile.cache.memcached
enabled = True
and if you have multiple memcache_servers they should be comma separated, for example: host1:port1,host2:port2
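For example, a two-server setup (hostnames and ports here are placeholders) would look like:

[cache]
enabled = True
backend = dogpile.cache.memcached
memcache_servers = host1:11211,host2:11211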
If your configuration is correct, then it might be worth trying a different [cache]backend to isolate whether the problem is related to this cache or something else.
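As one possible variation (just a sketch, not a specific recommendation), oslo.cache also ships a pooled memcached backend that could be swapped in to rule out the dogpile memcached client itself:

[cache]
enabled = True
# alternative memcached-based backend provided by oslo.cache
backend = oslo_cache.memcache_pool
memcache_servers = localhost:11211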
And then go from there.
-melwitt