[openstack-dev] [Ironic] [Nova] continuing the "multiple compute host" discussion

Devananda van der Veen devananda.vdv at gmail.com
Thu Dec 10 23:57:59 UTC 2015


I'm going to attempt to summarize a discussion that's been going on for
over a year now, and still remains unresolved.


The main touch-point between Nova and Ironic continues to be a pain point,
and despite many discussions between the teams over the last year resulting
in a solid proposal, we have not been able to get consensus on a solution
that meets everyone's needs.

Some folks are asking us to implement a non-virtualization-centric
scheduler / resource tracker in Nova, or advocating that we wait for the
Nova scheduler to be split-out into a separate project. I do not believe
the Nova team is interested in the former, I do not want to wait for the
latter, and I do not believe that either one will be an adequate solution
-- there are other clients (besides Nova) that need to schedule workloads
on Ironic.

We need to decide on a path of least pain and then proceed. I really want
to get this done in Mitaka.

Long version:

During Liberty, Jim and I worked with Jay Pipes and others on the Nova team
to come up with a plan. That plan was proposed in a Nova spec [1] and
approved in October, shortly before the Mitaka summit. It got significant
reviews from the Ironic team, since it is predicated on work being done in
Ironic to expose a new "reservations" API endpoint. The details of that
Ironic change were proposed separately [2] but have deadlocked. Discussions
with some operators at and after the Mitaka summit have highlighted a
problem with this plan.

Actually, more than one problem. To better understand the divergent viewpoints
that result in the current deadlock, I drew a diagram [3]. If you haven't
read both the Nova and Ironic specs already, this diagram probably won't
make sense to you. I'll attempt to explain it a bit with more words.

[A] The Nova team wants to remove the (Host, Node) tuple from all the places
that this exists, and return to scheduling only based on Compute Host. They
also don't want to change any existing scheduler filters (especially not
compute_capabilities_filter) or the filter scheduler class or plugin
mechanisms. And, as far as I understand it, they're not interested in
accepting a filter plugin that calls out to external APIs (eg, Ironic) to
identify a Node and pass that Node's UUID to the Compute Host.  [[ nova
team: please correct me on any point here where I'm wrong, or your
collective views have changed over the last year. ]]

[B] OpenStack deployers who are using Nova + Ironic rely on a few things:
- compute_capabilities_filter to match node.properties['capabilities']
against flavor extra_specs.
- other downstream nova scheduler filters that do other sorts of hardware
matching.

These deployers clearly and rightly do not want us to take away either of
these capabilities, so anything we do needs to be backwards compatible with
any current Nova scheduler plugins -- even downstream ones.
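
As an illustration of the first item above, the matching can be sketched
roughly as follows. This is a simplified sketch, not Nova's actual filter
code: the function names are hypothetical, it assumes the node's capabilities
are stored as a comma-separated "key:value" string and that the flavor uses
"capabilities:"-scoped extra_specs keys, and it only does exact string
comparison (the real filter understands richer comparison operators).

```python
def parse_capabilities(raw):
    """Parse a 'k1:v1,k2:v2' capabilities string into a dict (hypothetical helper)."""
    caps = {}
    for item in raw.split(","):
        key, sep, value = item.partition(":")
        if sep:  # skip empty or malformed entries
            caps[key.strip()] = value.strip()
    return caps


def node_matches_flavor(node_properties, extra_specs):
    """Return True if every 'capabilities:' extra_spec is satisfied by the node.

    Assumption: exact string equality only; unscoped or differently scoped
    extra_specs keys are ignored by this sketch.
    """
    caps = parse_capabilities(node_properties.get("capabilities", ""))
    for spec_key, wanted in extra_specs.items():
        scope, sep, name = spec_key.partition(":")
        if not sep or scope != "capabilities":
            continue
        if caps.get(name) != wanted:
            return False
    return True


# Example data, made up for illustration.
node_properties = {"capabilities": "boot_mode:uefi,raid_level:10"}
flavor_extra_specs = {"capabilities:boot_mode": "uefi"}
print(node_matches_flavor(node_properties, flavor_extra_specs))
```

Any replacement for the current scheduling path would need to keep this kind
of flavor-to-node matching working for existing deployments.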

[C] To meet the compatibility requirements of [B] without requiring the
nova-scheduler team to do the work, we would need to forklift some parts of
the nova-scheduler code into Ironic. But I think that's terrible, and I
don't think any OpenStack developers will like it. Furthermore, operators
have already expressed their distaste for this because they want to use the
same filters for virtual and baremetal instances but do not want to
duplicate the code (because we all know that's a recipe for drift).

[D] Whatever solution we devise for scheduling bare metal resources in Ironic
needs to perform well at the scale Ironic deployments are aiming for (eg,
thousands of Nodes) without the use of Cells. It also must be integrable
with other software (eg, it should be exposed in our REST API). And it must
allow us to run more than one (active-active) nova-compute process, which
we can't today [0].

OK. That's a lot of words... bear with me, though, as I'm not done yet...

This drawing [3] is a Venn diagram, but not everything overlaps. The Nova
and Ironic specs [1],[2] meet the needs of the Nova team and the Ironic
team, and will provide a more performant, highly available solution that
is easier to use with other schedulers or datacenter-management tools.
However, this solution does not meet the needs of some current OpenStack
Operators because it will not support Nova Scheduler filter plugins. Thus,
in the diagram, [A] and [D] overlap but neither one intersects with [B].


We have proposed a solution that fits ironic's HA model into nova-compute's
failure domain model, but that's only half of the picture -- in so doing,
we assumed that scheduling of bare metal resources was simplistic when, in
fact, it needs to be just as rich as the scheduling of virtual resources.

So, at this point, I think we need to accept that the scheduling of
virtualized and bare metal workloads are two different problem domains that
are equally complex.

Either we:
* build a separate scheduler process in Ironic, forking the Nova scheduler
as a starting point so as to be compatible with existing plugins; or
* begin building a direct integration between nova-scheduler and ironic,
and create a non-virtualization-centric resource tracker within Nova; or
* proceed with the plan we previously outlined, accept that this isn't
going to be backwards compatible with nova filter plugins, and apologize to
any operators who rely on using the same scheduler plugins for
baremetal and virtual resources; or
* keep punting on this, bringing pain and suffering to all operators of
bare metal clouds, because nova-compute must be run as exactly one process
for all sizes of clouds.

Thanks for reading,

[0] Yes, there are some hacks to work around this, but they are bad. Please
don't encourage their use.

[1] https://review.openstack.org/#/c/194453/

[2] https://review.openstack.org/#/c/204641/
