[Openstack-operators] nova-api response time

Lance Bragstad lbragstad at gmail.com
Mon May 8 03:35:02 UTC 2017


Hi Vladimir,

I can try and help with some of the keystone troubleshooting, but we might
need some more information.

On Fri, May 5, 2017 at 9:00 AM, Vladimir Prokofev <v at prokofev.me> wrote:

> Writing things down always helps.
> So I dug a little further, and seems that keystone is the weakest link.
> It appears that it takes 500-600ms to generate a token, and a similar time
> to validate it. This means that the first time an application accesses the
> API, the request will take at least 1000ms, or even more. The second time
> will be significantly faster because you no longer need to request a token,
> and validation is done via memcached.
>

Yep - this sounds accurate (even though a second to get a token seems like
a long time). It's also important to note that keystone has a global toggle
for caching (provided by oslo.cache [0]), in addition to subsystem-level
caching implementations [1] [2]. To turn on caching you have to flip the
global caching option from oslo.cache and then enable caching for the
keystone subsystems whose responses you want cached (most of those default
to true). The master caching option provided by oslo.cache defaults to False.
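
Wiring that up end-to-end in keystone.conf looks roughly like this (the
hostnames are placeholders for your three controllers, and the [token]
section is just one example of a subsystem toggle):

[cache]
enabled = true
backend = oslo_cache.memcache_pool
memcache_servers = ctrl1:11211,ctrl2:11211,ctrl3:11211

[token]
caching = true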


> But for some reason the cached token expires very quickly, in a matter of
> minutes (I didn't actually measure the exact time), so at some point
> validation will once again take half a second.
> If your app is written in a bad way, i.e. it creates a new auth session for
> every request, this means it will run very, very slowly, as every request
> will take half a second to get a token.
>
> So now it's a question of determining why keystone is so slow. Btw, I use
> keystone 10.0.0 with fernet tokens and Apache WSGI (5 processes, 1 thread)
> on three controller nodes behind haproxy.
> I found this article [1] about keystone benchmarking, and it seems that
> keystone can run significantly faster.
>
> So once again:
>  - what are your average times for token creation/validation?
>

This is totally dependent on hardware, but do you have any subsequent runs
that would prove this behavior is a regression? If so, that'd be super
useful to have in a bug report.
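
If you want numbers that are easy to compare across runs, you can also time
the two operations directly against keystone's v3 API. A quick sketch in
Python (the endpoint and credentials are placeholders):

import time
import requests

KEYSTONE = "http://controller:5000/v3"  # placeholder endpoint

# Password-auth request body for POST /v3/auth/tokens
body = {"auth": {
    "identity": {"methods": ["password"],
                 "password": {"user": {"name": "admin",  # placeholder creds
                                       "domain": {"id": "default"},
                                       "password": "secret"}}},
    "scope": {"project": {"name": "admin",
                          "domain": {"id": "default"}}}}}

# Time token creation.
start = time.time()
resp = requests.post(KEYSTONE + "/auth/tokens", json=body)
print("issue:    %.0f ms" % ((time.time() - start) * 1000))
token = resp.headers["X-Subject-Token"]

# Time token validation twice; the second run shows the effect of caching.
for _ in range(2):
    start = time.time()
    requests.get(KEYSTONE + "/auth/tokens",
                 headers={"X-Auth-Token": token, "X-Subject-Token": token})
    print("validate: %.0f ms" % ((time.time() - start) * 1000))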


>  - is this procedure computationally expensive, requiring a faster
> CPU (fernet implies encryption/decryption)?
>

In my experience with Fernet, the encryption and decryption aren't as much
of a problem as rebuilding all the authorization/authentication context
(i.e. running without caching). But this is certainly something to look
into, too.
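
If you want to rule the crypto out, you can time Fernet in isolation with
the same cryptography library keystone uses. A rough sketch:

import time
from cryptography.fernet import Fernet

f = Fernet(Fernet.generate_key())
payload = b"x" * 200  # roughly the size of a token payload

N = 1000
start = time.time()
for _ in range(N):
    f.decrypt(f.encrypt(payload))
elapsed = time.time() - start
print("%.3f ms per encrypt+decrypt" % (elapsed * 1000.0 / N))

On most hardware that comes out well under a millisecond per operation,
which is why I'd look at the context rebuilding and caching first.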


>  - I'd appreciate a link to a good doc on how to check and increase
> memcached caching timers.
>

Are you able to share your memcache/keystone caching configs at all? That
might help give us an idea of where to start and how to recreate locally.
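
In the meantime, the "expires in a matter of minutes" behavior you described
sounds less like token expiry and more like the cache entry aging out - if I
remember correctly, oslo.cache's global expiration_time defaults to 600
seconds, and most keystone subsystems have a per-subsystem override. The
values here are just examples:

[cache]
expiration_time = 600   # global cache TTL in seconds (the default)

[token]
cache_time = 3600       # per-subsystem override, in seconds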


>
> By the way, I think this topic should now be named "openstack API response
> times in general", rather than strictly tied to nova-api.
>
> [1] https://docs.openstack.org/developer/rally/overview/stories/keystone/authenticate.html
>

[0]
https://github.com/openstack/oslo.cache/blob/master/oslo_cache/_opts.py#L56
[1]
https://github.com/openstack/keystone/blob/master/keystone/conf/identity.py#L80-L86
[2]
https://docs.openstack.org/developer/keystone/configuration.html#cache-configuration-section
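
Also, for the client-side timing breakdown described in the quoted message
below, curl's -w timing variables give the same connect/wait/total split
without a GUI REST client. A rough example (endpoint and token are
placeholders):

curl -s -o /dev/null \
  -H "X-Auth-Token: $TOKEN" \
  -w "connect: %{time_connect}s  wait: %{time_starttransfer}s  total: %{time_total}s\n" \
  http://controller:8774/v2.1/servers

%{time_starttransfer} roughly corresponds to the "wait time" that ARC
reports.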



>
> 2017-05-05 13:38 GMT+03:00 Vladimir Prokofev <v at prokofev.me>:
>
>> Hello Ops.
>>
>> So I had a feeling that my OpenStack API is running a bit slow - it
>> takes a couple of seconds to get a full list of all instances in Horizon,
>> for example, which is kind of a lot for a basic MySQL select-and-present
>> job. This got me wondering how I can measure execution times and find
>> out how long any part of the API request takes to execute.
>>
>> So I grabbed the first REST client I could find (it happens to be a Google
>> Chrome application called ARC (Advanced REST Client)), decided how and what
>> I wanted to measure, and got to it.
>>
>> I'm doing a basic /servers request to the public nova-api endpoint using
>> an X-Auth-Token I got earlier from keystone. The project that I pull has
>> only a single instance in it.
>> From the client's perspective there are 4 major parts of a request:
>> connection time, send time, wait time, and receive time.
>> Connection time, send time, and receive time are all negligible, as they
>> take about 5-10ms combined. The major part is wait time, which can take
>> from 130ms to 2500ms.
>> Wait time correlates directly with the execution time I can see in the
>> nova-api log, with about 5-10ms of added lag, which is expected and
>> perfectly normal as I understand it.
>>
>> The first run after some idle time is always the slowest one, and takes
>> from 900ms to 2500ms. If I do the same request within the next few minutes
>> it will be faster - 130 to 200ms. My guess is that there's some caching
>> involved (memcached?).
>>
>> So my questions are:
>>  - what execution times do you usually see in your environment for
>> nova-api, or other APIs for that matter? I have a feeling that even 130ms
>> is very slow for such a simple task;
>>  - I assumed that the first request is so slow because it pulls data from
>> MySQL, which resides on slow storage (3-node Galera cluster, each node a
>> KVM domain based on 2 SATA 7200 HDDs in software RAID1), so I enabled the
>> slow query log for MySQL with a 500ms threshold, but I see nothing, even
>> when wait time > 2000ms. It seems MySQL has nothing to do with it;
>>  - one more idea is that it takes a long time for keystone to validate
>> a token. With an expired token I get a 401 response in 500-600ms the first
>> time, and 30-40ms the next time I make a request;
>>  - another idea about slow storage is that nova-api has to write
>> something (a log, for example) before sending a response, but I don't know
>> how to test this exactly, other than just moving my controller nodes to an
>> SSD;
>>  - is there a way I can profile nova-api and see exactly which method
>> takes the most time?
>>
>> I understand that I didn't exactly describe my setup. This is because I'm
>> more interested in learning how to profile API components than in just
>> plainly fixing my setup (i.e. making it faster). In the end, as I see it,
>> it's either a storage-related issue (fixed by moving the controllers and
>> database to fast storage like SSDs) or a computing power issue (fixed by
>> placing the controllers on dedicated bare metal with fast CPU/memory). I
>> have a desire not to guess, but to know exactly which it is :)
>> I will describe the setup in detail if such a request is made.
>>
>
>
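
On the profiling question in Vladimir's original message: OSProfiler is the
OpenStack-wide tracing library and worth a look, but for a quick answer to
"which method takes the most time?" a generic WSGI middleware that wraps the
app in cProfile also works. A minimal sketch (a standard Python approach,
not nova's built-in mechanism):

import cProfile

class ProfilerMiddleware(object):
    """Run each request under cProfile and print the hottest calls."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        prof = cProfile.Profile()
        try:
            return prof.runcall(self.app, environ, start_response)
        finally:
            # Sort by cumulative time so the slowest code paths float up.
            prof.print_stats(sort="cumulative")

Wrapping the nova-api WSGI application with something like this in a test
environment will show the per-request breakdown on stdout.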