<div dir="ltr">Great writeup @Mathieu and thanks @sean and @jrolls!<div><br></div><div>-d</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Aug 29, 2016 at 3:34 PM, Mathieu Gagné <span dir="ltr"><<a href="mailto:mgagne@calavera.ca" target="_blank">mgagne@calavera.ca</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>

<br>

For those who attended the OpenStack Ops meetup, you probably heard<br>
me complaining about a serious performance issue we had with the Nova<br>
scheduler (Kilo) and Ironic.<br>

<br>

Thanks to Sean Dague and Matt Riedemann, we found the root cause.<br>

<br>

It was caused by this block of code [1], which hits the database<br>
for each node loaded by the scheduler. This block is called whenever<br>
no instance info is found in the scheduler cache.<br>
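<br>
To make the cost concrete, here is a toy Python simulation of that pattern. It is not Nova's code: FakeDB, load_host_states and the dict used as a cache are hypothetical stand-ins. On a cache miss the loader falls back to one query per node, so an empty cache turns 600 nodes into 600 database round trips.<br>
<br>
```python
class FakeDB:
    """Hypothetical stand-in for the Nova database; counts queries."""

    def __init__(self, instances_by_node):
        self.instances_by_node = instances_by_node
        self.queries = 0

    def instances_on(self, node):
        # Each call models one round trip to the database.
        self.queries += 1
        return self.instances_by_node.get(node, [])


def load_host_states(nodes, db, instance_cache):
    """Build per-node instance info, querying the DB only on cache misses."""
    states = {}
    for node in nodes:
        if node in instance_cache:
            states[node] = instance_cache[node]
        else:
            # Cache miss: one query per node (the expensive path).
            states[node] = db.instances_on(node)
    return states


nodes = ["node-%d" % i for i in range(600)]
db = FakeDB({"node-0": ["instance-a"]})

load_host_states(nodes, db, instance_cache={})  # cold (empty) cache
print(db.queries)  # 600

db.queries = 0
warm_cache = {n: db.instances_by_node.get(n, []) for n in nodes}
load_host_states(nodes, db, warm_cache)  # every node is a cache hit
print(db.queries)  # 0
```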

<br>

I found that this instance info is only populated if the<br>
scheduler_tracks_instance_changes config option [2] is enabled, which it<br>
is by default. But being a good operator (wink wink), I followed the<br>
Ironic install guide, which recommends disabling it [3], unknowingly<br>
getting myself into deep trouble.<br>
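<br>
If you want to check whether your own deployment disabled it, a minimal Python sketch using the standard library configparser is below. It assumes the option is set in the [DEFAULT] section of nova.conf and treats a missing value as the default (True); the sample config text is illustrative only.<br>
<br>
```python
import configparser

# Illustrative nova.conf fragment following the Ironic install guide advice.
SAMPLE_NOVA_CONF = """
[DEFAULT]
scheduler_tracks_instance_changes = False
"""


def tracks_instance_changes(conf_text):
    """Return the effective value of scheduler_tracks_instance_changes."""
    # interpolation=None so '%' characters in other values are not expanded.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # A missing option means the default (True) applies.
    return parser.getboolean(
        "DEFAULT", "scheduler_tracks_instance_changes", fallback=True)


print(tracks_instance_changes(SAMPLE_NOVA_CONF))  # False
print(tracks_instance_changes("[DEFAULT]\n"))     # True
```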

<br>

There isn't much information about the purpose of this option in the<br>
Kilo branch. Fortunately, you can find more info in the master branch<br>
[4], thanks to the config documentation effort. This instance info<br>
cache is used by filters that rely on instance location to perform<br>
affinity/anti-affinity placement, or by anything else that cares about<br>
the instances running on the destination node.<br>
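<br>
As a rough illustration of why such filters need this cache, here is a simplified anti-affinity check in Python. HostState, its instances attribute and passes_anti_affinity are hypothetical simplifications, not Nova's real filter classes: a host passes only if none of the instances already on it belong to the requested server group.<br>
<br>
```python
from dataclasses import dataclass, field


@dataclass
class HostState:
    """Hypothetical, simplified view of a host as seen by the scheduler."""
    name: str
    instances: set = field(default_factory=set)  # instance UUIDs on the host


def passes_anti_affinity(host, group_members):
    """True if no member of the server group already runs on this host."""
    # This is answered from in-memory instance info; without that info
    # the scheduler has to look it up in the database instead.
    return not host.instances.intersection(group_members)


hosts = [HostState("node-1", {"vm-a"}), HostState("node-2")]
group = {"vm-a", "vm-b"}
print([h.name for h in hosts if passes_anti_affinity(h, group)])  # ['node-2']
```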

<br>

Enabling this option makes the Nova scheduler load instance info<br>
asynchronously at startup. Depending on the number of hypervisors and<br>
instances, this can take several minutes (in our case, 10-15 minutes<br>
with 600+ Ironic nodes, or ~1s per node).<br>
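<br>
The asynchronous startup load can be pictured as a background thread that fills the cache while the service keeps starting. This is a minimal Python sketch, not Nova's implementation: InstanceInfoCache, fetch_instances and the 0.01 s delay are assumptions. At ~1 s per node, 600 nodes means about 600 s, i.e. roughly 10 minutes, which is the order of magnitude we saw.<br>
<br>
```python
import threading
import time


class InstanceInfoCache:
    """Hypothetical cache filled by a background thread at startup."""

    def __init__(self, nodes, fetch_instances):
        self._nodes = nodes
        self._fetch = fetch_instances
        self.data = {}
        self.ready = threading.Event()
        self._thread = threading.Thread(target=self._load, daemon=True)

    def start(self):
        # Returns immediately; loading continues in the background.
        self._thread.start()

    def _load(self):
        for node in self._nodes:
            self.data[node] = self._fetch(node)  # slow per-node lookup
        self.ready.set()


def fetch_instances(node):
    time.sleep(0.01)  # stand-in for ~1 s per node in production
    return []


cache = InstanceInfoCache(["node-%d" % i for i in range(5)], fetch_instances)
cache.start()  # the cache may still be loading right after this call
cache.ready.wait(timeout=5)
print(len(cache.data))  # 5
```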

<br>

So Jim Roll jumped into the discussion on IRC and found a bug [5] he<br>
had opened and fixed in Liberty. The fix makes the Nova scheduler<br>
never populate the instance info cache when the Ironic host manager is<br>
loaded. Those running Nova with Ironic will probably agree that there<br>
is no known use case for affinity/anti-affinity in that setup (please<br>
reply if you know of one).<br>
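<br>
Conceptually, that fix makes the Ironic host manager opt out of instance tracking. The Python sketch below only illustrates the idea with two hypothetical classes and method names; it is not the actual Liberty change, which is linked from the bug [5].<br>
<br>
```python
class HostManager:
    """Hypothetical base host manager that tracks instances per host."""

    def __init__(self, nodes, fetch_instances):
        self.instance_info = {}
        self._init_instance_info(nodes, fetch_instances)

    def _init_instance_info(self, nodes, fetch_instances):
        # Populate the per-host instance cache (costly with many nodes).
        for node in nodes:
            self.instance_info[node] = fetch_instances(node)


class IronicHostManager(HostManager):
    """Hypothetical Ironic variant: bare-metal hosts skip instance tracking."""

    def _init_instance_info(self, nodes, fetch_instances):
        # Intentionally a no-op: no cache to build, no per-node lookups.
        pass


calls = []


def fetch(node):
    calls.append(node)
    return []


IronicHostManager(["node-1", "node-2"], fetch)
print(calls)  # []
```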

<br>

To summarize, this poor Nova scheduler performance only shows up if<br>
you are running the Kilo version of Nova with<br>
scheduler_tracks_instance_changes disabled, which might be the case if<br>
you are also running Ironic.<br>

<br>

For those curious about our Nova scheduler + Ironic setup, here is<br>
what we did to get the Nova scheduler to ludicrous speed:<br>

<br>

1) Use CachingScheduler<br>

<br>

There was a great talk at the OpenStack Summit about why you would<br>

want to use it. [6]<br>

<br>

By default, the Nova scheduler loads ALL nodes (hypervisors) from the<br>
database into memory before each scheduling request. If you have A LOT<br>
of hypervisors, this process can take a while, and scheduling won't<br>
happen until it completes. It could also mean that scheduling always<br>
fails if you have a lot of hypervisors and don't tweak<br>
service_down_time (see 3 below).<br>

<br>

This driver instead loads nodes (hypervisors) into memory every ~60<br>
seconds. Since the information is pre-cached, scheduling can happen<br>
right away, and it is super fast.<br>
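<br>
The difference between the two behaviours can be sketched in Python as follows. CachedHostList and load_all_nodes are hypothetical, and a lazy refresh on read stands in for the periodic task the real CachingScheduler uses; the clock is passed in so the example is deterministic. Nodes are reloaded only when the cached copy is at least 60 s old.<br>
<br>
```python
class CachedHostList:
    """Hypothetical node cache refreshed at most once per period."""

    def __init__(self, load_all_nodes, period=60.0):
        self._load = load_all_nodes
        self._period = period
        self._hosts = None
        self._loaded_at = None

    def get_hosts(self, now):
        # Lazy refresh on read (stand-in for the real periodic task).
        stale = self._loaded_at is None or now - self._loaded_at >= self._period
        if stale:
            self._hosts = self._load()  # expensive: reads every node
            self._loaded_at = now
        return self._hosts


loads = []


def load_all_nodes():
    loads.append(1)
    return ["node-1", "node-2"]


cache = CachedHostList(load_all_nodes, period=60.0)
for t in (0, 10, 30, 59, 61):  # five scheduling requests, times in seconds
    cache.get_hosts(now=t)
print(len(loads))  # 2: loaded at t=0 and again at t=61
```

Without the cache, the same five requests would each trigger a full load of every node.<br>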

<br>

There are a lot of side effects to using it, though. For example:<br>
- You can only run ONE nova-scheduler process, since the cache state<br>
isn't shared between processes and you don't want two instances to be<br>
scheduled to the same node/hypervisor.<br>
- It can take ~1 minute before new capacity (new or freed nodes) is<br>
recognized by the scheduler. The cache is refreshed every 60 seconds<br>
by a periodic task (this can be changed with<br>
scheduler_driver_task_period).<br>
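<br>
Why a second scheduler process is unsafe can be shown with a toy Python example: each process holds its own copy of the cache, so neither sees the other's claim and both pick the same free node. Scheduler and its first-free-node policy are hypothetical simplifications.<br>
<br>
```python
class Scheduler:
    """Hypothetical scheduler process with a private, in-memory node cache."""

    def __init__(self, free_nodes):
        # Each process gets its own copy; nothing is shared between them.
        self.free_nodes = list(free_nodes)

    def schedule(self):
        node = self.free_nodes[0]     # naive policy: first free node
        self.free_nodes.remove(node)  # only this process's cache is updated
        return node


snapshot = ["node-1", "node-2"]
a, b = Scheduler(snapshot), Scheduler(snapshot)
print(a.schedule(), b.schedule())  # node-1 node-1 (same node picked twice)
```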

<br>

In the context of Ironic, it is a compromise we are willing to accept.<br>
We are not adding Ironic nodes that often, and nodes aren't<br>
created/deleted as often as virtual machines.<br>

<br>

2) Run a single nova-compute service<br>

<br>

I strongly suggest you DO NOT run multiple nova-compute services. If<br>
you do, the scheduler will load duplicate hypervisors and you could<br>
end up with conflicting scheduling decisions. You will also have twice<br>
as many hypervisors to load in the scheduler.<br>
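<br>
To see where the duplicates come from, assume every nova-compute service reports the full list of Ironic nodes: the scheduler then ends up with one host entry per (service, node) pair. The Python sketch below uses hypothetical data shapes, not Nova's objects, to count them.<br>
<br>
```python
def host_entries(compute_services, ironic_nodes):
    """One scheduler host entry per (compute service, Ironic node) pair."""
    return [(svc, node) for svc in compute_services for node in ironic_nodes]


nodes = ["node-%d" % i for i in range(600)]
one = host_entries(["compute-1"], nodes)
two = host_entries(["compute-1", "compute-2"], nodes)
print(len(one), len(two))  # 600 1200

# The same physical node appears once per service, so two requests could
# each land on "node-0" through different entries.
print([e for e in two if e[1] == "node-0"])
# [('compute-1', 'node-0'), ('compute-2', 'node-0')]
```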

<br>

Note: I have heard about support for multiple compute hosts in Nova for<br>
Ironic using a hash ring, but I don't have many details about it, so<br>
this recommendation might not apply to you if you are using a recent<br>
version of Nova.<br>

<br>

3) Increase service_down_time<br>

<br>

If you have a lot of nodes, you might have to increase this value,<br>
which is set to 60 seconds by default. It is used by the ComputeFilter<br>
to exclude nodes it hasn't heard from recently. If it takes more than<br>
60 seconds to load the list of nodes, you can guess what will happen:<br>
the scheduler will reject all of them, since the node info is already<br>
outdated when it finally reaches the filtering step. I strongly<br>
suggest you tweak this setting, whether or not you use the<br>
CachingScheduler.<br>
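<br>
The failure mode can be reproduced with a small Python sketch of a ComputeFilter-style liveness check. For illustration it assumes a node counts as up when its last heartbeat is no older than service_down_time at the moment filtering runs; is_up and the timestamps are hypothetical.<br>
<br>
```python
def is_up(last_heartbeat, now, service_down_time=60):
    """Hypothetical liveness check: the heartbeat must be recent enough."""
    return service_down_time >= now - last_heartbeat


# Assume every node last reported in at t=0 s, but loading the full node
# list took 90 s, so filtering only starts at t=90 s.
heartbeats = {"node-%d" % i: 0.0 for i in range(3)}
filter_time = 90.0

for down_time in (60, 180):
    up = [n for n, hb in heartbeats.items() if is_up(hb, filter_time, down_time)]
    print(down_time, len(up))
# prints:
# 60 0
# 180 3
```

In nova.conf this means raising service_down_time in the [DEFAULT] section above the time it takes to load all nodes; 180 is only an illustrative value.<br>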

<br>

4) Tweak scheduler to only load empty nodes/hypervisors<br>

<br>

So this is a hack [7] we made before finding out about the bug [5]<br>
described earlier. While investigating our performance issue, we<br>
enabled debug logging and saw that the periodic task was taking<br>
forever to complete (10-15 minutes) with the CachingScheduler driver.<br>

<br>

We knew (well, strongly suspected) that the Nova scheduler was<br>
spending a huge amount of time loading nodes/hypervisors. We<br>
(unfortunately) didn't push our investigation further and jumped<br>
right into the optimization phase.<br>

<br>

So we came up with the idea of only loading empty nodes/hypervisors.<br>
Remember, we are still in the context of Ironic, not cloud and virtual<br>
machines: a bare-metal node that already runs an instance can't accept<br>
another one. So it made perfect sense for us to stop spending time<br>
loading nodes/hypervisors we would discard anyway.<br>
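<br>
The idea behind the hack can be sketched as filtering the node list before any per-node host state is built. This is not the code from the gist [7]; NodeRecord, its running_instances field and load_empty_nodes are hypothetical, and the sketch only shows where the saving comes from.<br>
<br>
```python
from dataclasses import dataclass


@dataclass
class NodeRecord:
    """Hypothetical row describing an Ironic node."""
    uuid: str
    running_instances: int


def load_empty_nodes(records, build_host_state):
    """Build host states only for nodes that have no instance yet."""
    # Occupied bare-metal nodes would be filtered out later anyway,
    # so skip the (expensive) host state construction for them.
    return [build_host_state(r) for r in records if r.running_instances == 0]


records = [NodeRecord("a", 1), NodeRecord("b", 0), NodeRecord("c", 0)]
built = load_empty_nodes(records, build_host_state=lambda r: r.uuid)
print(built)  # ['b', 'c']
```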

<br>

Thanks to everyone who helped us debug our scheduling performance<br>
issues; it is now crazy fast. =)<br>

<br>

[1] <a href="https://github.com/openstack/nova/blob/kilo-eol/nova/scheduler/host_manager.py#L589-L592" rel="noreferrer" target="_blank">https://github.com/openstack/<wbr>nova/blob/kilo-eol/nova/<wbr>scheduler/host_manager.py#<wbr>L589-L592</a><br>

[2] <a href="https://github.com/openstack/nova/blob/kilo-eol/nova/scheduler/host_manager.py#L65-L68" rel="noreferrer" target="_blank">https://github.com/openstack/<wbr>nova/blob/kilo-eol/nova/<wbr>scheduler/host_manager.py#L65-<wbr>L68</a><br>

[3] <a href="http://docs.openstack.org/developer/ironic/deploy/install-guide.html#configure-compute-to-use-the-bare-metal-service" rel="noreferrer" target="_blank">http://docs.openstack.org/<wbr>developer/ironic/deploy/<wbr>install-guide.html#configure-<wbr>compute-to-use-the-bare-metal-<wbr>service</a><br>

[4] <a href="https://github.com/openstack/nova/blob/282c257aff6b53a1b6bb4b4b034a670c450d19d8/nova/conf/scheduler.py#L166-L185" rel="noreferrer" target="_blank">https://github.com/openstack/<wbr>nova/blob/<wbr>282c257aff6b53a1b6bb4b4b034a67<wbr>0c450d19d8/nova/conf/<wbr>scheduler.py#L166-L185</a><br>

[5] <a href="https://bugs.launchpad.net/nova/+bug/1479124" rel="noreferrer" target="_blank">https://bugs.launchpad.net/<wbr>nova/+bug/1479124</a><br>

[6] <a href="https://www.youtube.com/watch?v=BcHyiOdme2s" rel="noreferrer" target="_blank">https://www.youtube.com/watch?<wbr>v=BcHyiOdme2s</a><br>

[7] <a href="https://gist.github.com/mgagne/1fbeca4c0b60af73f019bc2e21eb4a80" rel="noreferrer" target="_blank">https://gist.github.com/<wbr>mgagne/<wbr>1fbeca4c0b60af73f019bc2e21eb4a<wbr>80</a><br>

<br>

--<br>

Mathieu<br>

<br>

______________________________<wbr>_________________<br>

OpenStack-operators mailing list<br>

<a href="mailto:OpenStack-operators@lists.openstack.org">OpenStack-operators@lists.<wbr>openstack.org</a><br>

<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators" rel="noreferrer" target="_blank">http://lists.openstack.org/<wbr>cgi-bin/mailman/listinfo/<wbr>openstack-operators</a><br>

</blockquote></div><br></div>