[openstack-dev] [Ironic] Fuel agent proposal

Devananda van der Veen devananda.vdv at gmail.com
Tue Dec 9 22:06:33 UTC 2014

On Tue Dec 09 2014 at 9:45:51 AM Fox, Kevin M <Kevin.Fox at pnnl.gov> wrote:

> We've been interested in Ironic as a replacement for Cobbler for some of
> our systems and have been kicking the tires a bit recently.
> While initially I thought this thread was probably another "Fuel not
> playing well with the community" kind of thing, I'm not thinking that any
> more. It's deeper than that.

There are aspects to both conversations here, and you raise many valid
points.
> Cloud provisioning is great. I really REALLY like it. But one of the things
> that makes it great is the nice, pretty, cute, uniform, standard "hardware"
> the vm gives the user. Ideally, the physical hardware would behave the
> same. But,
> “No Battle Plan Survives Contact With the Enemy”.  The sad reality is,
> most hardware is different from each other. Different drivers, different
> firmware, different different different.

Indeed, hardware is different. And no matter how homogeneous you *think* it
is, at some point, some hardware is going to fail^D^D^Dbehave differently
than some other piece of hardware.

One of the primary goals of Ironic is to provide a common *abstraction* to
all the vendor differences, driver differences, and hardware differences.
There's no magic in that -- underneath the covers, each driver is going to
have to deal with the unpleasant realities of actual hardware that is
actually different.

> One way the cloud enables this isolation is by forcing the cloud admin's
> to install things and deal with the grungy hardware to make the interface
> nice and clean for the user. For example, if you want greater mean time
> between failures of nova compute nodes, you probably use a raid 1. Sure,
> it's kind of a pet thing to do, but it's up to the cloud admin to
> decide what's "better": buying more hardware, or paying for more admin/user
> time. Extra hard drives are dirt cheap...
> So, in reality Ironic is playing in a space somewhere between "I want to
> use cloud tools to deploy hardware, yay!" and "ewww.., physical hardware's
> nasty. you have to know all these extra things and do all these extra
> things that you don't have to do with a vm"... I believe Ironic's going to
> need to be able to deal with this messiness in as clean a way as possible.

If by "clean" you mean, expose a common abstraction on top of all those
messy differences -- then we're on the same page. I would welcome any
feedback as to where that abstraction leaks today, and on both spec and
code reviews that would degrade or violate that abstraction layer. I think
it is one of, if not *the*, defining characteristic of the project.

> But that's my opinion. If the team feels it's not a valid use case, then
> we'll just have to use something else for our needs. I really really want
> to be able to use heat to deploy whole physical distributed systems though.
> Today, we're using software raid over two disks to deploy our nova
> compute. Why? We have some very old disks we recovered for one of our
> clouds and they fail often. nova-compute is pet enough to benefit somewhat
> from being able to swap out a disk without much effort. If we were to use
> Ironic to provision the compute nodes, we need to support a way to do the
> same.

I have made the (apparently incorrect) assumption that anyone running
anything sensitive to disk failures in production would naturally have a
hardware RAID, and that, therefore, Ironic should be capable of setting up
that RAID in accordance with a description in the Nova flavor metadata --
but did not need to be concerned with software RAIDs.

Clearly, there are several folks who have the same use-case in mind, but do
not have hardware RAID cards in their servers, so my initial assumption was
incorrect :)

I'm fairly sure that the IPA team would welcome contributions in this area.

We're looking into ways of building an image that has a software raid
> presetup, and expand it on boot.

Awesome! I hope that work will make its way into diskimage-builder ;)

(As an aside, I suggested this to the Fuel team back in Atlanta...)
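For concreteness, a first-boot hook along these lines might look like the
sketch below. To be clear, this is an assumption on my part, not an existing
diskimage-builder element: it presumes the image was built with a degraded
single-member RAID1, and the device names are hypothetical.

```shell
# Assumed layout: the image carries a degraded RAID1 with one member on
# /dev/sda2; the node's second disk is /dev/sdb.

# Start the degraded array that was baked into the image.
mdadm --assemble /dev/md0 /dev/sda2 --run

# Mirror the partition table onto the second disk and add the new member;
# the kernel resyncs the mirror in the background.
sfdisk -d /dev/sda | sfdisk /dev/sdb
mdadm --manage /dev/md0 --add /dev/sdb2

# Grow the array and the filesystem to the full device size.
mdadm --grow /dev/md0 --size=max
resize2fs /dev/md0
```

The nice property of this approach is that the same image works on a
single-disk node -- the array just stays degraded.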

> This requires each image to be customized for this case though. I can see
> Fuel not wanting to provide two different sets of images, "hardware raid"
> and "software raid", that have the same contents in them, with just
> different partitioning layouts... If we want users to not have to care
> about partition layout, this is also not ideal...

End-users are probably not generating their own images for bare metal
(unless user == operator, in which case, it should be fine).

> Assuming Ironic can be convinced that these features really would be
> needed, perhaps the solution is a middle ground between the pxe driver and
> the agent?

I've been rallying for a convergence between the feature sets of these
drivers -- specifically, that the agent should support partition-based
images, and also support copy-over-iscsi as a deployment model. In
parallel, Lucas had started working on splitting the deploy interface into
both boot and deploy, at which point we may be able to deprecate the current
family of pxe_* drivers. But I'm birdwalking...

> Associate partition information at the flavor level. The admin can decide
> the best partitioning layout for a given hardware... The user doesn't have
> to care any more. Two flavors for the same hardware could be "4 9's" or "5
> 9's" or something that way.

Bingo. This is the approach we've been discussing over the past two years -
nova flavors could include metadata which get passed down to Ironic and
applied at deploy-time - but it hasn't been as high a priority as other
things. Though not specifically covering partitions, there are specs up for
Nova [0] and Ironic [1] for this workflow.
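As a sketch of what that could look like from the operator's side -- the
extra_specs key names below are purely illustrative, not from a ratified
spec; the actual schema is what those specs are hashing out:

```shell
# Illustrative only: "capabilities:raid_level" is a hypothetical key.
# Two flavors for the same hardware class, differing only in the RAID
# layout that Ironic would apply at deploy time.
nova flavor-create bm.compute.raid1 auto 65536 900 24
nova flavor-key bm.compute.raid1 set capabilities:raid_level=1

nova flavor-create bm.compute.raid0 auto 65536 1800 24
nova flavor-key bm.compute.raid0 set capabilities:raid_level=0
```

The user just picks a flavor; the partitioning/RAID decision stays with the
operator.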

> Modify the agent to support a pxe style image in addition to full layout,
> and have the agent partition/setup raid and lay down the image into it.
> Modify the agent to support running grub2 at the end of deployment.
> Or at least make the agent pluggable to support adding these options.
> This does seem a bit backwards from the way the agent has been going. The
> pxe driver was kind of Linux-specific. The agent is not... So maybe that
> does imply a 3rd driver may be beneficial... But it would be nice to have
> one driver, the agent, in the end that supports everything.

We'll always need different drivers to handle different kinds of hardware.
And we have two modes of deployment today (copy-image-over-iscsi,
agent-downloads-locally) and could have more in the future (bittorrent,
multicast, ...?). That said, I don't know why a single agent couldn't
support multiple modes of deployment.


[0] https://review.openstack.org/#/c/136104/
