[Openstack-operators] nova-api response time

Vladimir Prokofev v at prokofev.me
Wed May 10 11:32:22 UTC 2017


> George Mihaiescu
> The data populating the instances tab in Dashboard comes from at least
three API services (Nova, Glance and Neutron), so you have to look at all
of them if you want to find the cause of the slowness.
> There is a possible related bug report:
> https://bugs.launchpad.net/keystone/+bug/1587777
I vaguely remember hitting this about a year ago, when we first deployed
Mitaka. AFAIR, it was some kind of issue with WSGI and privilege groups.
Anyway, that was fixed, and the response times I get are not >30 sec, but
rather 1-2 sec, 5 at most for heavy operations like listing all instances.
In any case, I'm well beyond Horizon in my diagnostics, as currently I'm
concerned with API slowness in general. Horizon itself works fine.

> Matt Riedemann
> I can't answer your questions about Keystone and tokens but you asked a
question about ways (tools) to profile the time spent in a nova-api request
to list servers, and Nova has supported the osprofiler tool [1] since the
Newton release. That might help you out with profiling.
Thank you, I'll look into this.
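For the archive, here's roughly what that looks like - a minimal sketch,
assuming a Newton-or-later nova; the hmac key value and the messaging trace
backend are only examples, and client flags may vary between versions:

# nova.conf
[profiler]
enabled = true
hmac_keys = SECRET_KEY
connection_string = messaging://

# request a traced run with the matching key; the client prints a trace id,
# which can then be rendered (point --connection-string at your trace store)
openstack --os-profile SECRET_KEY server list
osprofiler trace show --html <trace id>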

> Lance Bragstad
> Are you able to share your memcache/keystone caching configs at all? That
might help give us an idea of where to start and how to recreate locally.
Sure thing. They are (were) empty :) The whole [cache] section in
keystone.conf is empty, meaning all defaults are in effect.
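(For reference, as I understand the oslo.cache defaults, an empty section
is effectively:

[cache]
enabled = false
backend = dogpile.cache.null

i.e. no caching at all.)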

> This is totally dependent on hardware, but do you have any subsequent
runs that would prove this behavior is a regression? If so, that'd be super
useful to have in a bug report.
I'm sorry, I don't understand what "a regression" means in the current
context. I understand what "regression testing" is, for example.
Also, you're assuming this is a bug? I don't; I just think that my
environment is slow in general, as my controller nodes are VMs on somewhat
old and not-that-fast hardware.

> To turn on caching you'd have to flip the global caching option from
oslo.cache and then enable the keystone systems you'd want to have cache
responses; most of them default to true though. The master caching option
provided by oslo.cache defaults to False.
Now we're talking. I looked it up and configured some [cache] options:
backend = oslo_cache.memcache_pool
enabled = true
memcache_servers = controller1:11211

and restarted Apache. This instantly led to a drop in token
creation/validation times. Validation dropped to 30-50ms, creation dropped
to 100-150ms.
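As a quick sanity check that requests actually hit memcached, the hit
counters should grow under load (hostname from the config above; netcat
flags vary between implementations):

echo stats | nc -q 1 controller1 11211 | grep -E 'get_hits|get_misses'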
Here's a validation log example:
before:
10.110.0.31 publicipwashere - - [10/May/2017:06:33:47 +0300] "GET
/v3/auth/tokens HTTP/1.1" 200 5320 478ms
10.110.0.31 publicipwashere - - [10/May/2017:06:33:49 +0300] "GET
/v3/auth/tokens HTTP/1.1" 200 5320 456ms
10.110.0.31 publicipwashere - - [10/May/2017:06:33:50 +0300] "GET
/v3/auth/tokens HTTP/1.1" 200 5320 460ms
10.110.0.31 publicipwashere - - [10/May/2017:06:33:51 +0300] "GET
/v3/auth/tokens HTTP/1.1" 200 5320 420ms
10.110.0.31 publicipwashere - - [10/May/2017:06:33:51 +0300] "GET
/v3/auth/tokens HTTP/1.1" 200 5320 451ms
10.110.0.31 publicipwashere - - [10/May/2017:06:33:52 +0300] "GET
/v3/auth/tokens HTTP/1.1" 200 5320 527ms

after:
10.110.0.31 publicipwashere - - [10/May/2017:14:23:41 +0300] "GET
/v3/auth/tokens HTTP/1.1" 200 5320 25ms
10.110.0.31 publicipwashere - - [10/May/2017:14:23:41 +0300] "GET
/v3/auth/tokens HTTP/1.1" 200 5320 39ms
10.110.0.31 publicipwashere - - [10/May/2017:14:23:41 +0300] "GET
/v3/auth/tokens HTTP/1.1" 200 5320 20ms
10.110.0.31 publicipwashere - - [10/May/2017:14:23:42 +0300] "GET
/v3/auth/tokens HTTP/1.1" 200 5320 40ms
10.110.0.31 publicipwashere - - [10/May/2017:14:23:42 +0300] "GET
/v3/auth/tokens HTTP/1.1" 200 5320 39ms

Of course, the first requests are still slow, but we're getting somewhere.
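In case anyone wants to reproduce these timings without a GUI REST client,
curl prints the same breakdown - a sketch, with the endpoint and port from
my setup, and $TOKEN being a token obtained beforehand:

curl -s -o /dev/null \
  -H "X-Auth-Token: $TOKEN" -H "X-Subject-Token: $TOKEN" \
  -w 'connect=%{time_connect}s wait=%{time_starttransfer}s total=%{time_total}s\n' \
  http://controller1:5000/v3/auth/tokens

GET /v3/auth/tokens with an X-Subject-Token header is the Keystone token
validation call; time_starttransfer corresponds to the "wait time" from my
original message below.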


2017-05-08 7:22 GMT+03:00 Warad, Manjunath (Nokia - SG/Singapore) <
manjunath.warad at nokia.com>:

> Hi Vladimir,
>
>
>
> Thanks for providing the info. Sorry, I couldn’t see anything about TCP
> connections, so I thought to check…
>
>
>
> Have you taken the TCP connection setup into consideration in your
> profiling?
>
> TCP’s slow-start process, window size learning, etc.?
>
> I feel this could have a considerable impact unless you have run a few
> thousand cycles to average out the time taken, so that the TCP setup time
> is nullified.
>
>
>
> Regards,
>
> Manjunath
>
>
>
> *From:* Vladimir Prokofev [mailto:v at prokofev.me]
> *Sent:* Friday, 5 May, 2017 6:39 PM
> *To:* OpenStack Operators <openstack-operators at lists.openstack.org>
> *Subject:* [Openstack-operators] nova-api response time
>
>
>
> Hello Ops.
>
>
>
> So I had a feeling that my OpenStack API is running a bit slow - it takes
> a couple of seconds to get a full list of all instances in Horizon, for
> example, which is kind of a lot for a basic MySQL select-and-present job.
> This got me wondering how I can measure execution times and find out how
> long any part of an API request takes to execute.
>
>
>
> So I grabbed the first REST client I could find (it happens to be a Google
> Chrome application called ARC (Advanced REST Client)), decided how and what
> I want to measure, and got to it.
>
>
>
> I'm doing a basic /servers request to the public nova-api endpoint using
> an X-Auth-Token I got earlier from Keystone. The project that I pull has
> only a single instance in it.
>
> From the client's perspective there are 4 major parts of a request:
> connection time, send time, wait time, and receive time.
>
> Connection time, send time, and receive time are all negligible, as they
> take about 5-10ms combined. The major part is wait time, which can take
> from 130ms to 2500ms.
>
> Wait time is directly correlated to the execution time that I can see in
> the nova-api log, with about 5-10ms of added lag, which is expected and
> perfectly normal as I understand it.
>
>
>
> The first run after some idle time is always the slowest one, taking from
> 900ms to 2500ms. If I do the same request within the next few minutes it
> will be faster - 130 to 200ms. My guess is that there's some caching
> involved (memcached?).
>
>
>
> So my questions are:
>
>  - what execution times do you usually see in your environment for
> nova-api, or other APIs for that matter? I have a feeling that even 130ms
> is very slow for such a simple task;
>
>  - I assumed that the first request is so slow because it pulls data from
> MySQL, which resides on slow storage (a 3-node Galera cluster, each node a
> KVM domain backed by 2 SATA 7200 HDDs in software RAID1), so I enabled the
> MySQL slow query log with a 500ms threshold, but I see nothing, even when
> the wait time is > 2000ms. It seems MySQL has nothing to do with it;
>
>  - one more idea is that it takes a lot of time for Keystone to validate a
> token. With an expired token I get a 401 response in 500-600ms the first
> time, and in 30-40ms the next time I make a request;
>
>  - another idea about slow storage is that nova-api has to write
> something (a log, for example) before sending a response, but I don't know
> how to test this exactly, other than just moving my controller nodes to an
> SSD;
>
>  - is there a way I can profile nova-api and see exactly which method
> takes the most time?
>
>
>
> I understand that I didn't exactly describe my setup. This is due to the
> fact that I'm more interested in learning how to profile API components
> than in just plainly fixing my setup (e.g. making it faster). In the end,
> as I see it, it's either a storage-related issue (fixed by moving the
> controllers and database to fast storage like SSDs) or a computing-power
> issue (fixed by placing the controllers on dedicated bare metal with fast
> CPU/memory). I have a desire not to guess, but to know exactly which it is
> :)
>
> I will describe the setup in detail if such a request is made.
>