[Openstack-operators] Update on Nova scheduler poor performance with Ironic
Matt Riedemann
mriedem at linux.vnet.ibm.com
Wed Aug 31 01:57:07 UTC 2016
On 8/29/2016 4:34 PM, Mathieu Gagné wrote:
> Hi,
>
> For those that attended the OpenStack Ops meetup, you probably heard
> me complaining about a serious performance issue we had with Nova
> scheduler (Kilo) with Ironic.
>
> Thanks to Sean Dague and Matt Riedemann, we found the root cause.
>
> It was caused by this block of code [1] which is hitting the database
> for each node loaded by the scheduler. This block of code is called if
> no instance info is found in the scheduler cache.
>
> I found that this instance info is only populated if the
> scheduler_tracks_instance_changes config [2] is enabled which it is by
> default. But being a good operator (wink wink), I followed the Ironic
> install guide which recommends disabling it [3], unknowingly getting
> myself into deep troubles.
>
> There isn't much information about the purpose of this config in the
> kilo branch. Fortunately, you can find more info in the master branch
> [4], thanks to the config documentation effort. This instance info
> cache is used by filters which rely on instance location to perform
> affinity/anti-affinity placement or anything that cares about the
> instances running on the destination node.
>
> Enabling this option will make it so Nova scheduler loads instance
> info asynchronously at start up. Depending on the number of
> hypervisors and instances, it can take several minutes. (we are
> talking about 10-15 minutes with 600+ Ironic nodes, or ~1s per node in
> our case)
>
> So Jim Roll jumped into the discussion on IRC and found a bug [5] he
> opened and fixed in Liberty. It makes it so Nova scheduler never
> populates the instance info cache if Ironic host manager is loaded.
> For those running Nova with Ironic, you will agree that there is no
> known use case where affinity/anti-affinity is used. (please reply if
> you know of one)
>
> To summarize, the poor performance of Nova scheduler will only show if
> you are running the Kilo version of Nova and you disable
> scheduler_tracks_instance_changes which might be the case if you are
> running Ironic too.
>
> For those curious about our Nova scheduler + Ironic setup, we have
> done the following to get nova scheduler to ludicrous speed:
But have you gone plaid? :)
>
> 1) Use CachingScheduler
>
> There was a great talk at the OpenStack Summit about why you would
> want to use it. [6]
>
> By default, the Nova scheduler will load ALL nodes (hypervisors) from
> database to memory before each scheduling. If you have A LOT of
> hypervisors, this process can take a while. This means scheduling
> won't happen until this step is completed. It could also mean that
> scheduling will always fail if you don't tweak service_down_time (see
> 3 below) if you have lot of hypervisors.
>
> This driver will make it so nodes (hypervisors) are loaded in memory
> every ~60 seconds. Since information is now pre-cached, the scheduling
> process can happen right away, it is super fast.
>
> There is a lot of side-effects to using it though. For example:
> - you can only run ONE nova-scheduler process since cache state won't
> be shared between processes and you don't want instances to be
> scheduled twice to the same node/hypervisor.
> - It can take ~1m before new capacity is recognized by the scheduler.
> (new or freed nodes) The cache is refreshed every 60 seconds with a
> periodic task. (this can be changed with scheduler_driver_task_period)
>
> In the context of Ironic, it is a compromise we are willing to accept.
> We are not adding Ironic nodes that often and nodes aren't
> created/deleting as often as virtual machines.
>
> 2) Run a single nova-compute service
>
> I strongly suggest you DO NOT run multiple nova-compute services. If
> you do, you will have duplicated hypervisors loaded by the scheduler
> and you could end up with conflicting scheduling. You will also have
> twice as much hypervisors to load in the scheduler.
>
> Note: I heard about multiple compute host support in Nova for Ironic
> with use of an hash ring but I don't have much details about it. So
> this recommendation might not apply to you if you are using a recent
> version of Nova.
The spec for the hash ring stuff that landed fairly recently in Newton
is here:
http://specs.openstack.org/openstack/nova-specs/specs/newton/approved/ironic-multiple-compute-hosts.html
It's very early and there isn't CI for it yet, but jroll was saying he
was going to be experimenting with it at Rackspace (or maybe it was with
OSIC...).
>
> 3) Increase service_down_time
>
> If you have a lot of nodes, you might have to increase this value
> which is set to 60 seconds by default. This value is used by the
> ComputeFilter filter to exclude nodes it hasn't heard from. If it
> takes more than 60 seconds to list the list of nodes, you might guess
> what we will happen, the scheduler will reject all of them since node
> info is already outdated when it finally hits the filtering steps. I
> strongly suggest you tweak this setting, regardless of the use of
> CachingScheduler.
>
> 4) Tweak scheduler to only load empty nodes/hypervisors
>
> So this is a hack [7] we did before finding out about the bug [5] we
> described and identified earlier. When investigating our performance
> issue, we enabled debug logging and saw that periodic task was taking
> forever to complete (10-15m) with CachingScheduler driver.
>
> We knew (strongly suspected) Nova scheduler was spending a huge amount
> of time loading nodes/hypervisors. We (unfortunately) didn't push
> further our investigation and jumped right away to optimization phase.
>
> So we came up with the idea of only loading empty nodes/hypervisors.
> Remember, we are still in the context of Ironic, not cloud and virtual
> machines. So it made perfect sense for us to stop spending time
> loading nodes/hypervisors we would discard anyway.
>
> Thanks to all that help us debugging our scheduling performance
> issues, it is now crazy fast. =)
>
> [1] https://github.com/openstack/nova/blob/kilo-eol/nova/scheduler/host_manager.py#L589-L592
> [2] https://github.com/openstack/nova/blob/kilo-eol/nova/scheduler/host_manager.py#L65-L68
> [3] http://docs.openstack.org/developer/ironic/deploy/install-guide.html#configure-compute-to-use-the-bare-metal-service
> [4] https://github.com/openstack/nova/blob/282c257aff6b53a1b6bb4b4b034a670c450d19d8/nova/conf/scheduler.py#L166-L185
> [5] https://bugs.launchpad.net/nova/+bug/1479124
> [6] https://www.youtube.com/watch?v=BcHyiOdme2s
> [7] https://gist.github.com/mgagne/1fbeca4c0b60af73f019bc2e21eb4a80
>
> --
> Mathieu
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
Thanks for the write up, it's always nice to see a follow up from events
back to the mailing list for people that didn't attend.
--
Thanks,
Matt Riedemann
More information about the OpenStack-operators
mailing list