<div dir="ltr"><div>Hi,</div><div>with Ocata upgrade we decided to run local placements (one service per cellV1) because we were nervous about possible scalability issues but specially the increase of the schedule time. Fortunately, this is now been address with the placement-req-filter work.<br></div><div><br></div><div>We started slowly to aggregate our local placements into a the central one (required for cellsV2).</div><div>Currently we have >7000 compute nodes (>40k requests per minute) into this central placement. Still ~2000 compute nodes to go.</div><div><br></div><div>Some lessons so far...</div><div>- Scale keystone accordingly when enabling placement.</div><div>- Don't forget to configure memcache for keystone_authtoken.</div><div>- Change apache mpm default from prefork to event/worker.</div><div>- Increase the WSGI number of processes/threads considering where placement is running.</div><div>- Have enough placement nodes considering your number of requests.</div><div>- Monitor the request time. This impacts VM scheduling. Also, depending how it's configured the LB can also start removing placement nodes.</div><div>- DB could be a bottleneck.</div><div><br></div><div>We are still learning how to have a stable placement at scale. </div><div>It would be great if others can share their experiences.</div><div><br></div><div><br></div><div>Belmiro</div><div>CERN</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Mar 29, 2018 at 10:19 AM, Chris Dent <span dir="ltr"><<a href="mailto:cdent+os@anticdent.org" target="_blank">cdent+os@anticdent.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Wed, 28 Mar 2018, iain MacDonnell wrote:<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Looking for recommendations on tuning of nova-placement-api. I have a few moderately-sized deployments (~200 nodes, ~4k instances), currently on Ocata, and instance creation is getting very slow as they fill up.<br>
</blockquote>
<br></span>
This should be well within the capabilities of an appropriately<br>
installed placement service, so I reckon something is weird about<br>
your installation. More within.<span class=""><br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
$ time curl <a href="http://apihost:8778/" rel="noreferrer" target="_blank">http://apihost:8778/</a><br>
{"error": {"message": "The request you have made requires authentication.", "code": 401, "title": "Unauthorized"}}<br>
real 0m20.656s<br>
user 0m0.003s<br>
sys 0m0.001s<br>
</blockquote>
<br></span>
This is a good choice for trying to determine what's up because it<br>
avoids any interaction with the database and most of the stack of<br>
code: the web server answers, runs a very small percentage of the<br>
placement python stack and kicks out the 401. So this mostly<br>
indicates that socket accept is taking forever.<span class=""><br>
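<br>
To see where that time goes, curl's timing variables can break the<br>
request down (a rough sketch; adjust apihost:8778 to your endpoint):<br>
<br>
  curl -o /dev/null -s -w 'connect: %{time_connect} first-byte: %{time_starttransfer} total: %{time_total}\n' http://apihost:8778/<br>
<br>
If connect is quick but the first byte takes ~20 seconds, requests are<br>
queueing for a free WSGI process/thread; if even connect is slow, the<br>
listen backlog itself is backing up.<br>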
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
nova-placement-api is running under mod_wsgi with the "standard"(?) config, i.e.:<br>
</blockquote>
<br></span>
Do you recall where this configuration comes from? The settings for<br>
WSGIDaemonProcess are not very good and if there is some packaging<br>
or documentation that is setting things this way it would be good to find<br>
it and fix it.<br>
<br>
Depending on what else is on the host running placement I'd boost<br>
processes to the number of cores divided by 2, 3 or 4 and boost threads to<br>
around 25. Or you can leave 'threads' off and it will default to 15<br>
(at least in recent versions of mod_wsgi).<br>
<br>
With the settings as below you're basically saying that you want to<br>
handle 3 connections at a time, which isn't great, since each of<br>
your compute-nodes wants to talk to placement multiple times a<br>
minute (even when nothing is happening).<br>
<br>
Tweaking the number of processes versus the number of threads<br>
depends on whether it appears that the processes are cpu or I/O<br>
bound. More threads helps when things are I/O bound.<span class=""><br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
...<br>
WSGIProcessGroup nova-placement-api<br>
WSGIApplicationGroup %{GLOBAL}<br>
WSGIPassAuthorization On<br>
WSGIDaemonProcess nova-placement-api processes=3 threads=1 user=nova group=nova<br>
WSGIScriptAlias / /usr/bin/nova-placement-api<br>
...<br>
</blockquote>
<br></span>
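For comparison, a tuned version of that stanza might look something<br>
like this (numbers are illustrative, assuming a 16 core host dedicated<br>
to placement; scale them to your own hardware):<br>
<br>
  WSGIDaemonProcess nova-placement-api processes=8 threads=25 user=nova group=nova<br>
<br>
That allows 200 concurrent requests per host instead of 3; the other<br>
directives can stay as they are.<br>
<br>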
[snip]<span class=""><br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Other suggestions? I'm looking at things like turning off scheduler_tracks_instance_changes, since affinity scheduling is not needed (at least so far), but not sure that that will help with placement load (seems like it might, though?)<br>
</blockquote>
<br></span>
This won't impact the placement service itself.<br>
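<br>
If you do want to turn it off for other reasons, it's a nova.conf<br>
option on the scheduler side; in Ocata something like the following,<br>
though double-check the config reference for your release:<br>
<br>
  [filter_scheduler]<br>
  track_instance_changes = False<br>
<br>
It only changes what nova-scheduler tracks about instances on each<br>
host, not anything placement does.<br>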
<br>
A while back I did some experiments with trying to overload<br>
placement by using the fake virt driver in devstack and wrote it up<br>
at <a href="https://anticdent.org/placement-scale-fun.html" rel="noreferrer" target="_blank">https://anticdent.org/placemen<wbr>t-scale-fun.html</a><br>
<br>
The gist was that with a properly tuned placement service it was<br>
other parts of the system that suffered first.<span class="HOEnZb"><font color="#888888"><br>
<br>
-- <br>
Chris Dent ٩◔̯◔۶ <a href="https://anticdent.org/" rel="noreferrer" target="_blank">https://anticdent.org/</a><br>
freenode: cdent tw: @anticdent</font></span><br>
<br></blockquote></div><br></div>
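<div dir="ltr"><div><br></div><div>A minimal sketch of the Apache MPM and keystone_authtoken settings from the list at the top of this message (values and hostnames are placeholders, not a recommendation; size them for your own nodes):</div><div><br></div><div># httpd MPM config: switch from prefork to event</div><div>LoadModule mpm_event_module modules/mod_mpm_event.so</div><div>&lt;IfModule mpm_event_module&gt;</div><div>    ServerLimit        4</div><div>    ThreadsPerChild    64</div><div>    MaxRequestWorkers  256</div><div>&lt;/IfModule&gt;</div><div><br></div><div># nova.conf on the placement hosts: cache validated keystone tokens in memcached</div><div>[keystone_authtoken]</div><div>memcached_servers = mc1:11211,mc2:11211</div></div>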