[openstack-dev] [Ironic] Node groups and multi-node operations
Clint Byrum
clint at fewbar.com
Sat Jan 25 02:11:03 UTC 2014
Excerpts from Devananda van der Veen's message of 2014-01-22 16:44:01 -0800:
>
> 1: physical vs. logical grouping
> - Some hardware is logically, but not strictly physically, grouped. Eg, 1U
> servers in the same rack. There is some grouping, such as failure domain,
> but operations on discrete nodes are independent. This grouping should be
> modeled somewhere, and sometimes a user may wish to perform an operation
> on that group. Is a higher layer (tuskar, heat, etc) sufficient? I think so.
> - Some hardware _is_ physically grouped. Eg, high-density cartridges which
> share firmware state or a single management end-point, but are otherwise
> discrete computing devices. This grouping must be modeled somewhere, and
> certain operations cannot be performed on one member without affecting all
> members. Things will break if each node is treated independently.
>
What Tuskar wants to do is layer workloads on top of logical and physical
groupings. So it would pass a request to Nova like "Boot 4 machines with
(flavor) and distinct(failure_domain_id)".
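As a rough sketch, that request might look something like this (the
payload shape, the "count" batching, and the distinct hint are all
hypothetical, not Nova's current API):

    # Hypothetical boot request illustrating the idea above.
    boot_request = {
        "server": {
            "name": "overcloud-node",
            "flavorRef": "baremetal",
            "imageRef": "deploy-image",
            "count": 4,
        },
        "scheduler_hints": {
            # Each instance lands on a compute node with a different
            # value for this aspect.
            "distinct": "failure_domain_id",
        },
    }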
Now, this is not unique to baremetal. There are plenty of cloud workloads
where one would like anti-affinity and other such things that span more
than a single compute node. Right now these are expressed at a very
coarse level: the availability zone. I think it would be useful for Nova
to attach to each compute node a list of aspects which are not
hierarchical and which isolate the failure domains that matter for
different workloads.
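To illustrate (field names are made up, nothing Nova has today), each
compute node would carry a flat set of aspects, and the scheduler would
enforce distinctness over whichever one a workload asks for:

    # Illustrative only: non-hierarchical aspects per compute node,
    # plus the check a scheduler could apply for a distinct() hint.
    compute_nodes = [
        {"host": "compute-1",
         "aspects": {"failure_domain_id": "rack-1", "power_feed": "a"}},
        {"host": "compute-2",
         "aspects": {"failure_domain_id": "rack-2", "power_feed": "a"}},
        {"host": "compute-3",
         "aspects": {"failure_domain_id": "rack-2", "power_feed": "b"}},
    ]

    def distinct_over(placement, aspect):
        """True if every chosen node differs in the given aspect."""
        values = [node["aspects"][aspect] for node in placement]
        return len(values) == len(set(values))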
And with that, if we simply require at least one instance of nova-compute
running for each set of aspects, Ironic does not have to model this data.
However, in looking at how Ironic works and interacts with Nova, it
doesn't seem like there is any distinction of data per-compute-node
inside Ironic. So for this to work, I'd have to run a whole bunch of
Ironic instances, one per compute node. That seems like something we
don't want to do.
So perhaps if Ironic can just model _a single_ logical grouping per node,
it can defer any further distinctions up to Nova where it will benefit
all workloads, not just Ironic.
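Something as small as one group label on the node record might be
enough; "group" here is a hypothetical field name, and perhaps Ironic's
existing chassis relationship could play this role:

    # Sketch: a single logical grouping per Ironic node, with any
    # finer-grained distinctions deferred to Nova.
    node = {
        "uuid": "...",
        "driver": "pxe_ipmitool",
        "group": "enclosure-42",  # the one logical grouping
    }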
> 2: performance optimization
> - Some operations may be optimized if there is an awareness of concurrent
> identical operations. Eg, deploy the same image to lots of nodes using
> multicast or bittorrent. If Heat were to inform Ironic that this deploy is
> part of a group, the optimization would be deterministic. If Heat does not
> inform Ironic of this grouping, but Ironic infers it (eg, from timing of
> requests for similar actions) then optimization is possible but
> non-deterministic, and may be much harder to reason about or debug.
>
I'm wary of trying to get too deep into optimization this early. There
are some blanket optimizations alluded to here that I think will likely
work OK with even the most minimal of clues.
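For instance, a timing-based coalescer needs nothing more than the
requests themselves as a clue. This is just a sketch of the shape of
it, with made-up names, and not how Ironic is implemented:

    import threading
    from collections import defaultdict

    WINDOW_SECONDS = 5.0  # how long to wait for similar requests

    class DeployCoalescer:
        """Batch deploys of the same image that arrive close together."""

        def __init__(self, deploy_batch):
            # deploy_batch(image_id, node_ids) does one batched deploy,
            # e.g. a single multicast or bittorrent transfer.
            self._deploy_batch = deploy_batch
            self._pending = defaultdict(list)
            self._lock = threading.Lock()

        def request_deploy(self, image_id, node_id):
            with self._lock:
                batch = self._pending[image_id]
                batch.append(node_id)
                if len(batch) == 1:
                    # First request for this image opens the window.
                    threading.Timer(WINDOW_SECONDS, self._flush,
                                    args=(image_id,)).start()

        def _flush(self, image_id):
            with self._lock:
                nodes = self._pending.pop(image_id, [])
            if nodes:
                self._deploy_batch(image_id, nodes)

The non-determinism Devananda mentions is visible right there in
WINDOW_SECONDS: whether two requests batch together depends entirely on
when they happen to arrive.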
> 3: APIs
> - Higher layers of OpenStack (eg, Heat) are expected to orchestrate
> discrete resource units into a larger group operation. This is where the
> grouping happens today, but already results in inefficiencies when
> performing identical operations at scale. Ironic may be able to get around
> this by coalescing adjacent requests for the same operation, but this would
> be non-deterministic.
Agreed, I think Ironic needs _some_ level of grouping to be efficient.
> - Moving group-awareness or group-operations into the lower layers (eg,
> Ironic) looks like it will require non-trivial changes to Heat and Nova,
> and, in my opinion, violates a layer-constraint that I would like to
> maintain. On the other hand, we could avoid the challenges around
> coalescing. This might be necessary to support physically-grouped hardware
> anyway, too.
>
I actually think that the changes to Heat and Nova are trivial. Nova
needs to have groups for compute nodes and the API needs to accept those
groups. Heat needs to take advantage of them via the API.
There is a non-trivial follow-on, a "holistic" scheduler, which would
further extend these groups into other physical resources like networks
and block devices. These all feel like logical evolutions of the idea of
making somewhat arbitrary and overlapping groups of compute nodes.
>
> If Ironic coalesces requests, it could be done in either the
> ConductorManager layer or in the drivers themselves. The difference would
> be whether our internal driver API accepts one node or a set of nodes for
> each operation. It'll also impact our locking model. Both of these are
> implementation details that wouldn't affect other projects, but would
> affect our driver developers.
>
> Also, until Ironic models physically-grouped hardware relationships in some
> internal way, we're going to have difficulty supporting that class of
> hardware. Is that OK? What is the impact of not supporting such hardware?
> It seems, at least today, to be pretty minimal.
I don't know much about hardware like that. I think it should just be
another grouping unless it affects the way Ironic talks to the hardware,
at which point it probably belongs in a driver, no?
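For what it's worth, the one-node-vs-set-of-nodes distinction from
point 3 might look like this at the driver layer (class and method
names are hypothetical, not Ironic's actual driver interface):

    import abc

    class DeployDriver(abc.ABC):
        @abc.abstractmethod
        def deploy(self, task, node):
            """Deploy one node; any batching happens above the driver."""

    class GroupedDeployDriver(abc.ABC):
        @abc.abstractmethod
        def deploy(self, task, nodes):
            """Deploy a set of nodes together, for hardware (e.g.
            shared-firmware cartridges) whose members cannot be acted
            on independently."""

Hardware that shares a management endpoint would ship a grouped driver,
and everything else would keep the per-node one.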