[openstack-dev] [ironic][nova] Nova midcycle, Ironic perspective
Jim Rollenhagen
jim at jimrollenhagen.com
Mon Jul 25 00:31:44 UTC 2016
Hi all,
The nova midcycle went very well from an ironic perspective. Jay Pipes and
Andrew Laski had a new (better!) proposal for how we schedule ironic resources,
which I'd like to detail for folks. We explicitly split this off from the topic
of running multiple nova-compute daemons for Ironic, because they are separate
topics. But, some of us also discussed that piece and I think we have a good
path forward. We also talked about how we plan Ironic driver work in Nova, and
I'll detail some of that discussion a bit.
Scheduling to Ironic resources
==============================
Jay and Andrew presented their vision for this, which depends heavily on the
work in progress on the placement engine, specifically "resource providers" and
"dynamic resource classes".[0] There's a few distinct pieces to talk about
here. Others that were in the room, please do correct me if I'm wrong
anywhere. :)
First, the general goal for resource providers (which may be a compute host or
something else) is that by the Newton release, information is being reported
into the placement database. This means that when someone upgrades to Ocata,
the data will already be there for the scheduler to use, and there will be no
blip in service while this data is collected.
As a note, a resource provider is a thing that provides some quantitative
amount of a "resource class". This resource class may be things like vCPUs,
RAM, disk, etc. The "dynamic resource class" spec[0] allows resource classes to
be created dynamically via the placement API.
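To make that concrete, creating such a class via the placement API might look
something like the sketch below. The endpoint, payload, and auth handling are
all still in review, so treat every name here as a placeholder rather than a
real interface:

    # Sketch only: create a custom resource class via the placement REST API.
    # The endpoint, payload shape, and auth are not final; a real deployment
    # would use keystoneauth rather than a raw token header.
    import requests

    PLACEMENT = 'http://placement.example.com/placement'  # hypothetical URL
    HEADERS = {'x-auth-token': 'ADMIN_TOKEN'}

    resp = requests.post(PLACEMENT + '/resource_classes',
                         json={'name': 'CUSTOM_BAREMETAL_GOLD'},  # placeholder
                         headers=HEADERS)
    resp.raise_for_status()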
For Ironic, each node will be a resource provider. Each node will provide 0 or
1 of a given resource class, depending on whether the node is schedulable (it
isn't while in maintenance mode, in the enroll state, and so on).
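In pseudo-Python, the inventory a node would report is roughly the following
(field names here are illustrative only):

    # Rough sketch: an Ironic node exposes either 0 or 1 of its resource
    # class, depending on whether it is currently schedulable.
    def node_inventory(node):
        schedulable = (node.provision_state == 'available'
                       and not node.maintenance)
        # node.resource_class is the proposed field discussed below
        return {node.resource_class: 1 if schedulable else 0}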
For the Newton cycle, we want to be putting this resource provider data for
Ironic into the placement database. To do this, the resource tracker will be
modified such that an Ironic node reported back will be put in the
compute_nodes table (as before), and also the resource provider table. Since
each resource provider needs a resource class, Nova needs to be able to find
the resource class in the dict passed back to the resource tracker. As such,
I've proposed a spec to Ironic[1] and some code changes to Ironic,
python-ironicclient, and Nova[2] to pass this information back to the resource
tracker. This is done by putting a field on the node object in Ironic called
`resource_class` (surprise!). I promise I tried to think of a better name for
this and completely failed.
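Assuming those changes land roughly as proposed, setting the field from
python-ironicclient would look something like this (the API microversion and
values below are illustrative, since the code is still in review):

    # Sketch: set the proposed resource_class field on a node via
    # python-ironicclient. The required microversion and field name come
    # from the in-review changes and may shift before they merge.
    from ironicclient import client

    ironic = client.get_client(1,
                               os_auth_url='http://keystone.example.com/v3',
                               os_username='admin',
                               os_password='secret',
                               os_project_name='admin',
                               os_ironic_api_version='1.21')  # version TBD
    ironic.node.update('NODE_UUID',
                       [{'op': 'add',
                         'path': '/resource_class',
                         'value': 'baremetal-gold'}])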
In Ocata, we want to begin scheduling Ironic instances to resource providers.
To accomplish this, Nova flavors will be able to "require" or "prefer" a given
quantity of some resource class. For an Ironic flavor, this will (almost?)
always be a requirement for exactly 1 of a given Ironic resource class.
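The exact flavor syntax isn't settled yet, but it will probably end up as some
kind of extra spec. Purely for illustration (the 'resources:...' key here is a
guess, not the real interface, and the credentials are placeholders):

    # Sketch: mark a flavor as requiring exactly one unit of an Ironic
    # resource class. The extra spec key is a placeholder; the real syntax
    # will come out of the placement/flavor specs.
    from novaclient import client

    nova = client.Client('2.1', 'admin', 'secret', 'admin',
                         'http://keystone.example.com/v3')
    flavor = nova.flavors.find(name='my-baremetal-flavor')
    flavor.set_keys({'resources:CUSTOM_BAREMETAL_GOLD': '1'})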
Note that we didn't discuss what happens in Ocata if Ironic nodes don't have a
resource class set and/or flavors do not require some Ironic resource class. I
have some thoughts on this, but they aren't solidified enough to write here
without chatting with the Nova folks to make sure I'm not crazy.
So, between Newton and Ocata, operators will need to set the resource class for
each node in Ironic, and require the resource classes for each Ironic flavor in
Nova.
It's very important that we get the work mentioned for Newton done in Newton.
If it doesn't land until Ocata, operators will get a nice surprise when the
compute daemon starts up: no resources will be available until they've
populated the field in Ironic (because it didn't exist in the Newton version of
Ironic) and the resource tracker has taken its sweet time picking that field up
from the Ironic nodes.
Also of note: in Ocata, a placement API will be available for Ironic to
talk directly to. This means that when a node's state changes in Ironic (e.g.
maintenance mode is turned on/off, cleaning->available, etc), we can
immediately tell the placement API that the resource (node) is available
to schedule to (or not). This will help eliminate many of the scheduling races
we have between Nova and Ironic.
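As a very rough sketch of what that could look like on the Ironic side (the
endpoint and payload here are guesses based on the in-progress placement API,
not anything that exists today):

    # Hypothetical sketch: push a node's inventory to placement whenever its
    # state changes. The endpoint and payload shapes are guesses and will
    # almost certainly differ from whatever the placement API ends up with.
    import requests

    PLACEMENT = 'http://placement.example.com/placement'  # placeholder URL
    HEADERS = {'x-auth-token': 'SERVICE_TOKEN'}

    def notify_placement(node):
        schedulable = (node.provision_state == 'available'
                       and not node.maintenance)
        inventory = {'inventories': {
            node.resource_class: {'total': 1 if schedulable else 0},
        }}
        resp = requests.put(
            '%s/resource_providers/%s/inventories' % (PLACEMENT, node.uuid),
            json=inventory, headers=HEADERS)
        resp.raise_for_status()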
[0] https://review.openstack.org/#/c/312696/
[1] https://review.openstack.org/#/c/345040
[2] https://review.openstack.org/#/q/topic:bug/1604916
Multiple compute daemons
========================
This was interesting. Someone (Dan Smith?) proposed doing consistent hashing
of the Ironic nodes across the compute daemons, such that each daemon manages
some subset of the Ironic nodes. This would likely use the same code we already
use in Ironic to decide which conductor manages which nodes (we'd put that
code into oslo).
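For anyone not familiar with the idea, here's a tiny standalone sketch of a
consistent hash ring (this is not Ironic's actual implementation, which also
deals with partitions and replica counts):

    # Minimal consistent-hash-ring sketch: map each node to one compute
    # daemon so the mapping barely changes when daemons come and go.
    import bisect
    import hashlib

    class HashRing(object):
        """Map node UUIDs to compute hosts with minimal reshuffling."""

        def __init__(self, hosts, points_per_host=64):
            self._ring = []
            for host in hosts:
                for i in range(points_per_host):
                    self._ring.append((self._hash('%s-%d' % (host, i)), host))
            self._ring.sort()
            self._keys = [key for key, _host in self._ring]

        @staticmethod
        def _hash(value):
            return int(hashlib.md5(value.encode('utf-8')).hexdigest(), 16)

        def host_for(self, node_uuid):
            # Walk clockwise to the next point on the ring; wrap at the end.
            idx = bisect.bisect(self._keys, self._hash(node_uuid))
            return self._ring[idx % len(self._ring)][1]

    ring = HashRing(['compute-1', 'compute-2', 'compute-3'])
    print(ring.host_for('some-node-uuid'))
    # Each daemon would manage the nodes the ring assigns to it, plus (as
    # described below) any nodes whose instances it already manages.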
Once an instance is placed on a compute daemon, the node that instance is on
would always be managed by that daemon, until the instance is deleted. This
is because Nova has strong assumptions that an instance is always managed
by the same compute daemon unless it is migrated. We could write code
to "re-home" an instance to another compute if the hash ring changes, but that
would be down the road a bit.
So, a given compute daemon would manage (nodes with instances managed by
that daemon) + (some subset of nodes decided by the hash ring).
This would mean that we could scale compute daemons horizontally very easily,
and if one fails, automatically re-balance so that no nodes are left behind.
The only exception is existing instances on a failed daemon, which couldn't be
managed until we write some re-homing code.
I'm going to play with a POC soon - I welcome any help if others want to play
with this as well. :)
I seem to remember this being proposed in the past and being shot down, but
nobody present could remember why. If someone does recall, speak up. We tried
to shoot as many holes in this as possible and couldn't penetrate it.
Planning Ironic virt driver work
================================
I planned to bring this up at some point, but it ended up coming up organically
during a discussion on Neutron and live migration. We essentially decided that
when there's a significant amount of Nova changes (for some definition of
"significant"), we should do a couple things:
1) Make sure Nova team buys into the architecture. This could be in the form
of a backlog spec being approved, or even just some +1s from nova-specs
core members on a spec for the current cycle.
2) Wait to approve the Nova side of the work until the Ironic side is done (or
close to done).
This should help ensure that the Nova team can plan accordingly for the work
coming into the Ironic virt driver, without bumping it to the next cycle when
the Ironic side doesn't get finished before the non-priority feature freeze.
As always, questions/comments/concerns on the above are welcome. If there are
none, let's go ahead and get to work on the scheduling bits in the first
section. Thanks for reading my novel.
// jim