<div dir="ltr">Hi Vladimir,<div><br></div><div>I can try and help with some of the keystone troubleshooting, but we might need some more information. </div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, May 5, 2017 at 9:00 AM, Vladimir Prokofev <span dir="ltr"><<a href="mailto:v@prokofev.me" target="_blank">v@prokofev.me</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Writing things down always helps.<div>So I dug a little further, and seems that keystone is the weakest link.</div><div>It appears that it takes 500-600ms to generate a token, similar time to validate it. This means that first time application accesses the API request will run at least 1000ms, or even more. Second time will be significantly faster because you no longer need to request a token, and validation is done via memcached.</div></div></blockquote><div><br></div><div>Yep - this sounds accurate (even though a second to get a token seems like a long time). It's also important to note that keystone has a global toggle for caching (provided by oslo.cache [0]), in addition to sub-system level caching implementations [1] [2]. To turn on caching you'd have to flip the global caching option from oslo.cache and then enable the systems of keystone you'd want to have cache responses, most of them default to true though. The master caching option provided by oslo.cache defaults to False. </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>But for some reason cached token expires very fast, in a matter of minutes(I didn't actually measured exact time), so at some point validation will once again take half a second.</div><div>If your app written in a bad way, i.e. for every request it creates a new auth session, this means that it will run very very slowly, as every request will take half a second to get a token.</div><div><br></div><div>So now it's a question of determining why keystone is so slow? Btw, I use keystone 10.0.0 with fernet tokens and apache WSGI(5 processes, 1 thread) on three controller nodes behind haproxy.</div><div>I found this article[1] about keystone benchmarking, and it seems that it can run significantly faster.</div><div><br></div><div>So once again:</div><div> - what are your average times for token creation/validation?</div></div></blockquote><div><br></div><div>This is totally dependent on hardware, but do you have any subsequent runs that would prove this behavior is a regression? If so, that'd be super useful to have in a bug report.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div> - is this procedure computationally extensive and requires faster CPU(fernet implies encryption/decription)?</div></div></blockquote><div><br></div><div>In my experience with Fernet, the encryption and decryption isn't as much of a problem as rebuilding all the authorization/authentication context (i.e. running without caching). But this is certainly something to look into, too.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div> - appreciate a link on a good doc on how to check and increases memcached caching timers.</div></div></blockquote><div><br></div><div>Are you able to share your memcache/keystone caching configs at all? That might help give us an idea of where to start and how to recreate locally. </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><br></div><div>By the way, I think this topic now should be named "openstack API response times in general", rather than strictly tie to nova-api.</div><div><br></div><div>[1] <a href="https://docs.openstack.org/developer/rally/overview/stories/keystone/authenticate.html" target="_blank">https://docs.openstack.<wbr>org/developer/rally/overview/<wbr>stories/keystone/authenticate.<wbr>html</a></div></div></blockquote><div><br></div><div>[0] <a href="https://github.com/openstack/oslo.cache/blob/master/oslo_cache/_opts.py#L56">https://github.com/openstack/oslo.cache/blob/master/oslo_cache/_opts.py#L56</a></div><div>[1] <a href="https://github.com/openstack/keystone/blob/master/keystone/conf/identity.py#L80-L86">https://github.com/openstack/keystone/blob/master/keystone/conf/identity.py#L80-L86</a></div><div>[2] <a href="https://docs.openstack.org/developer/keystone/configuration.html#cache-configuration-section">https://docs.openstack.org/developer/keystone/configuration.html#cache-configuration-section</a></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-HOEnZb"><div class="gmail-h5"><div class="gmail_extra"><br><div class="gmail_quote">2017-05-05 13:38 GMT+03:00 Vladimir Prokofev <span dir="ltr"><<a href="mailto:v@prokofev.me" target="_blank">v@prokofev.me</a>></span>:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hello Ops.<div><br></div><div>So I had a feeling that my Openstack API is running a bit slow, like it takes a couple of seconds to get a full list of all instances in Horizon for example, which is kind of a lot for a basic MySQL select-and-present job. This got me wondering to how can I measure execution times and find out how long it takes for any part of the API request to execute.</div><div><br></div><div>So I got a first REST client I could find(it happens to be a Google Chrome application called ARC(Advanced REST Client)), decided how and what I want to measure and got to it.</div><div><br></div><div>I'm doing a basic /servers request to public nova-api endpoint using X-Auth-Token I got earlier from keystone. Project that I pull has only single instance in it.</div><div>From clients perspective there're 4 major parts of request: connection time, send time, wait time, and receive time.</div><div>Connection time, send time, and receive time are all negligible, as they take about 5-10ms combined. The major part is wait time, which can take from 130ms to 2500ms.</div><div>Wait time is directly correlated to the execution time that I can see in nova-api log, with about 5-10ms added lag, which is expected and perfectly normal as I understand it.</div><div><br></div><div>First run after some time is always the slowest one. and takes from 900ms to 2500ms. If I do the same request in the next few minutes it will be faster - 130 to 200ms. My guess is that there's some caching involved(memcached?).</div><div><br></div><div>So my questions are:</div><div> - what execution times do you usually see in your environment for nova-api, or other APIs for that matter? I have a feeling that even 130ms is very slow for such a simple task;</div><div> - I assumed that first request is so slow because it pulls data from MySQL which resides on slow storage(3 node Galera cluster, each node is a KVM domain based on 2 SATA 7200 HDD in software RAID1), so I enabled slow query logs for MySQL with 500ms threshold, but I see nothing, even when wait time > 2000ms. Seems MySQL has nothing to do with it;</div><div> - one more idea is that it takes a lot of time for keystone to validate a token. With expired token I get 401 response in 500-600ms first time, and 30-40ms next time I do a request;</div><div> - another idea about slow storage is that nova-api has to write something(a log for example), before sending response, but I don't know how to test this exactly, rather that just transfer my controller nodes on an SSD;</div><div> - is there a way I can profile nova-api, and see exactly which method takes the most time?<br></div><div><br></div><div>I understand that I didn't exactly described my setup. This is due to the fact that I'm more interested in learning how to profile API components, rather than just plainly fix my setup(e.g. make it faster). In the end as I see it it's either a storage related issue(fixed by transferring controllers and database to a fast storage like SSD), or computing power issue(fixed by placing controllers on a dedicated bare-metal with fast CPU/memory). I have a desire not to guess, but to know exactly which is it :)</div><div>I will describe setup in detail if such request is made.</div></div>
</blockquote></div><br></div>
</div></div><br>______________________________<wbr>_________________<br>
OpenStack-operators mailing list<br>
<a href="mailto:OpenStack-operators@lists.openstack.org">OpenStack-operators@lists.<wbr>openstack.org</a><br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators" rel="noreferrer" target="_blank">http://lists.openstack.org/<wbr>cgi-bin/mailman/listinfo/<wbr>openstack-operators</a><br>
<br></blockquote></div><br></div></div>