[Openstack-operators] Update on Nova scheduler poor performance with Ironic

David Medberry openstack at medberry.net
Tue Aug 30 16:49:32 UTC 2016


Great writeup @Mathieu and thanks @sean and @jrolls!

-d

On Mon, Aug 29, 2016 at 3:34 PM, Mathieu Gagné <mgagne at calavera.ca> wrote:

> Hi,
>
> For those that attended the OpenStack Ops meetup, you probably heard
> me complaining about a serious performance issue we had with Nova
> scheduler (Kilo) with Ironic.
>
> Thanks to Sean Dague and Matt Riedemann, we found the root cause.
>
> It was caused by this block of code [1] which is hitting the database
> for each node loaded by the scheduler. This block of code is called if
> no instance info is found in the scheduler cache.
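The cost pattern described here is the classic N+1 query problem. A minimal sketch (hypothetical names, not the actual Nova code in [1]) of why a cold instance-info cache makes scheduling cost grow linearly with the number of nodes:

```python
# Sketch (hypothetical names) of the access pattern described above: when
# the instance-info cache misses, each node triggers its own database
# query, so a cold cache costs one round-trip per node.

class HostManager:
    def __init__(self, db_query):
        self._db_query = db_query          # callable hitting the database
        self._instance_cache = {}          # host -> instance info

    def get_instance_info(self, host):
        info = self._instance_cache.get(host)
        if info is None:
            # Cache miss: one database round-trip per node (the slow path).
            info = self._db_query(host)
            self._instance_cache[host] = info
        return info

# With an empty cache, scheduling over N nodes costs N database queries.
calls = []
mgr = HostManager(lambda host: calls.append(host) or {"host": host})
for host in ("node-%d" % i for i in range(600)):
    mgr.get_instance_info(host)
print(len(calls))  # 600 queries for 600 nodes on a cold cache
```

At ~1s per query, as observed below, that is roughly 10 minutes of scheduler stall for 600 nodes.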
>
> I found that this instance info is only populated if the
> scheduler_tracks_instance_changes config [2] is enabled, which it is
> by default. But being a good operator (wink wink), I followed the
> Ironic install guide, which recommends disabling it [3], unknowingly
> getting myself into deep trouble.
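For reference, a sketch of the two settings in nova.conf (option name as in Kilo; check your release's documentation for the exact section):

```ini
[DEFAULT]
# Default: the scheduler tracks instance changes and keeps the
# instance-info cache populated.
scheduler_tracks_instance_changes = True

# What the Kilo-era Ironic install guide recommended, which triggers
# the per-node database queries described above:
# scheduler_tracks_instance_changes = False
```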
>
> There isn't much information about the purpose of this config in the
> kilo branch. Fortunately, you can find more info in the master branch
> [4], thanks to the config documentation effort. This instance info
> cache is used by filters which rely on instance location to perform
> affinity/anti-affinity placement or anything that cares about the
> instances running on the destination node.
>
> Enabling this option makes the Nova scheduler load instance info
> asynchronously at startup. Depending on the number of hypervisors and
> instances, this can take several minutes. (we are talking about 10-15
> minutes with 600+ Ironic nodes, or ~1s per node in our case)
>
> So Jim Roll jumped into the discussion on IRC and found a bug [5] he
> opened and fixed in Liberty. It makes it so Nova scheduler never
> populates the instance info cache if Ironic host manager is loaded.
> For those running Nova with Ironic, you will agree that there is no
> known use case where affinity/anti-affinity is used. (please reply if
> you know of one)
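The fix in [5] keys on which host manager is loaded. For reference, the Ironic host manager was selected in nova.conf roughly like this in the Kilo/Liberty era (double-check the class path against your release):

```ini
[DEFAULT]
# With the fix from [5], the scheduler skips populating the instance
# info cache whenever this host manager is loaded.
scheduler_host_manager = nova.scheduler.ironic_host_manager.IronicHostManager
```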
>
> To summarize, the poor performance of Nova scheduler will only show
> if you are running the Kilo version of Nova and you disable
> scheduler_tracks_instance_changes, which might be the case if you are
> running Ironic too.
>
> For those curious about our Nova scheduler + Ironic setup, we have
> done the following to get nova scheduler to ludicrous speed:
>
> 1) Use CachingScheduler
>
> There was a great talk at the OpenStack Summit about why you would
> want to use it. [6]
>
> By default, the Nova scheduler will load ALL nodes (hypervisors) from
> the database into memory before each scheduling run. If you have A
> LOT of hypervisors, this process can take a while, and scheduling
> won't happen until it completes. It could also mean that scheduling
> will always fail if you have a lot of hypervisors and don't tweak
> service_down_time (see 3 below).
>
> This driver makes it so nodes (hypervisors) are loaded into memory
> every ~60 seconds. Since the information is now pre-cached, the
> scheduling process can happen right away; it is super fast.
>
> There are a lot of side-effects to using it though. For example:
> - You can only run ONE nova-scheduler process, since cache state won't
> be shared between processes and you don't want instances to be
> scheduled twice to the same node/hypervisor.
> - It can take ~1m before new capacity (new or freed nodes) is
> recognized by the scheduler. The cache is refreshed every 60 seconds
> with a periodic task. (this can be changed with
> scheduler_driver_task_period)
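A sketch of the relevant nova.conf settings (full driver class path as used around Kilo; verify against your release, since the option format changed in later releases):

```ini
[DEFAULT]
# Use the caching scheduler driver instead of the default FilterScheduler:
scheduler_driver = nova.scheduler.caching_scheduler.CachingScheduler

# How often (seconds) the periodic task refreshes the host cache; lower it
# to pick up new/freed nodes faster, at the cost of more database load.
scheduler_driver_task_period = 60
```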
>
> In the context of Ironic, this is a compromise we are willing to
> accept. We are not adding Ironic nodes that often, and nodes aren't
> created/deleted as often as virtual machines.
>
> 2) Run a single nova-compute service
>
> I strongly suggest you DO NOT run multiple nova-compute services. If
> you do, you will have duplicated hypervisors loaded by the scheduler
> and you could end up with conflicting scheduling. You will also have
> twice as many hypervisors to load in the scheduler.
>
> Note: I heard about multiple compute host support in Nova for Ironic
> using a hash ring, but I don't have many details about it. So this
> recommendation might not apply to you if you are using a recent
> version of Nova.
>
> 3) Increase service_down_time
>
> If you have a lot of nodes, you might have to increase this value,
> which is set to 60 seconds by default. This value is used by the
> ComputeFilter filter to exclude nodes it hasn't heard from. If it
> takes more than 60 seconds to load the list of nodes, you can guess
> what will happen: the scheduler will reject all of them, since node
> info is already outdated by the time it finally reaches the filtering
> step. I strongly suggest you tweak this setting, regardless of the
> use of CachingScheduler.
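A sketch of the tweak in nova.conf (the value below is an illustrative example, not a recommendation; pick something comfortably above your observed node-load time):

```ini
[DEFAULT]
# Default is 60. Raise it so the ComputeFilter does not discard nodes
# whose status merely looks stale because loading the node list itself
# took longer than the timeout.
service_down_time = 720
```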
>
> 4) Tweak scheduler to only load empty nodes/hypervisors
>
> So this is a hack [7] we did before finding out about the bug [5]
> described earlier. When investigating our performance issue, we
> enabled debug logging and saw that the periodic task was taking
> forever (10-15m) to complete with the CachingScheduler driver.
>
> We knew (strongly suspected) Nova scheduler was spending a huge
> amount of time loading nodes/hypervisors. We (unfortunately) didn't
> push our investigation further and jumped right away to the
> optimization phase.
>
> So we came up with the idea of only loading empty nodes/hypervisors.
> Remember, we are still in the context of Ironic, not a cloud of
> virtual machines. So it made perfect sense for us to stop spending
> time loading nodes/hypervisors we would discard anyway.
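As an illustration only (not the actual patch, which is linked in [7]), the empty-node filtering idea can be sketched like this:

```python
# Illustrative sketch only -- not the actual patch in [7]. The idea: an
# Ironic node hosts at most one instance, so any node that already has an
# instance would be discarded by the scheduler filters anyway. Loading
# only empty nodes skips that wasted work up front.

def empty_nodes(compute_nodes):
    """Keep only hypervisors with no instances running on them."""
    return [node for node in compute_nodes if node.get("running_vms", 0) == 0]

nodes = [
    {"host": "ironic-1", "running_vms": 0},
    {"host": "ironic-2", "running_vms": 1},   # already deployed, skip it
    {"host": "ironic-3", "running_vms": 0},
]
print([n["host"] for n in empty_nodes(nodes)])  # ['ironic-1', 'ironic-3']
```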
>
> Thanks to all who helped us debug our scheduling performance issues;
> the scheduler is now crazy fast. =)
>
> [1] https://github.com/openstack/nova/blob/kilo-eol/nova/scheduler/host_manager.py#L589-L592
> [2] https://github.com/openstack/nova/blob/kilo-eol/nova/scheduler/host_manager.py#L65-L68
> [3] http://docs.openstack.org/developer/ironic/deploy/install-guide.html#configure-compute-to-use-the-bare-metal-service
> [4] https://github.com/openstack/nova/blob/282c257aff6b53a1b6bb4b4b034a670c450d19d8/nova/conf/scheduler.py#L166-L185
> [5] https://bugs.launchpad.net/nova/+bug/1479124
> [6] https://www.youtube.com/watch?v=BcHyiOdme2s
> [7] https://gist.github.com/mgagne/1fbeca4c0b60af73f019bc2e21eb4a80
>
> --
> Mathieu
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>

