[openstack-dev] [Ironic] Fuel agent proposal

Vladimir Kozhukalov vkozhukalov at mirantis.com
Wed Dec 10 16:53:07 UTC 2014


Devananda,

Thank you for such a constructive letter.

First of all, just to make sure we are on the same page: we are totally +1
for using any tool which meets our requirements, and we are totally +1 for
working together on the same problems. As you remember, we suggested adding
advanced partitioning capabilities (md and lvm) to IPA. I see that it is a
layering violation for Ironic and that it is not in cloud scope, but we
need these features, because our users want them and because our use case
is deployment. It seems OK to me for a tool to have a feature that is not
mandatory to use.

And we didn't start Fuel Agent until these features were rejected for
merging into IPA. If we had had a chance to implement them within IPA, that
would have been our preferred way.

Some details:

* Power management

For power management Cobbler uses so-called 'fence agents'. They are just
an OS package which provides a bunch of scripts that use ILO, IPMI, and
DRAC clients. We extended this set of agents with a so-called 'ssh' agent,
which is able to run the 'reboot' command inside the OS via ssh. We use
this agent by default because many of our users run their experiments on
BMC-free hardware. That is why the spec
https://review.openstack.org/#/c/138115/ refers to an SSH power driver.

I know Ironic already has an SSH power driver, which runs the 'virsh'
command via ssh (a little bit confusing) and is intended for experimental
environments. The suggestion to implement another SSH power driver could
confuse people. My suggestion is to extend Ironic's SSH power driver so
that it can run any command from a fixed set (virsh, vbox, or even plain
reboot). And maybe renaming this driver to something like 'experimental' or
'development' is not a bad idea either. I am aware that Ironic wants to
remove this driver entirely, as it is used for tests only. But there are
lots of different power cases (including hardware without a BMC), and we
definitely need a place to put this non-standard power-related stuff. I
believe many people are interested in having such a workaround.
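
To make the idea concrete, here is a minimal sketch (the command strings,
the 'bare_os' entry, and the helper function are illustrative assumptions,
not Ironic's actual code) of a per-type command set that could also cover
BMC-free machines:

    # Sketch only: per-virt-type command sets, extended with a hypothetical
    # 'bare_os' entry that runs commands inside the host OS over ssh.
    COMMAND_SETS = {
        'virsh': {
            'power_on': 'virsh start {instance}',
            'power_off': 'virsh destroy {instance}',
            'reboot': 'virsh reset {instance}',
        },
        'vbox': {
            'power_on': 'VBoxManage startvm {instance}',
            'power_off': 'VBoxManage controlvm {instance} poweroff',
            'reboot': 'VBoxManage controlvm {instance} reset',
        },
        'bare_os': {
            'power_on': None,  # a powered-off machine has no sshd to talk to
            'power_off': 'poweroff',
            'reboot': 'reboot',
        },
    }

    def get_power_command(virt_type, action, instance=None):
        """Return the shell command to run over ssh for a power action."""
        template = COMMAND_SETS[virt_type][action]
        if template is None:
            raise NotImplementedError(
                '%s is not supported for virt type %s' % (action, virt_type))
        return template.format(instance=instance)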

And we certainly need Ironic's other power management capabilities, like
ILO, DRAC, and IPMI. We are also potentially very interested in developing
other hardware management capabilities, like configuring hardware RAID,
BIOS/UEFI, etc.

* DHCP, TFTP, DNS management

We are aware of the way Ironic manages DHCP (not directly). As you
mentioned, Ironic currently has a pluggable framework for DHCP, and the
only in-tree driver is neutron. And we are aware that implementing any kind
of dnsmasq wrapper immediately breaks Ironic's scaling scheme (many
conductors). When I wrote 'it is planned to implement dnsmasq plugin' in
the spec https://review.openstack.org/#/c/138301 I didn't mean that Ironic
is planning to do this. I meant that the Fuel team is planning to implement
this dnsmasq plugin outside the Ironic tree (we will point that out
explicitly), just to be able to fit the Fuel release cycle (iterative
development). Maybe in the future we will consider switching to Neutron for
managing networks (out of scope of this discussion). The Ironic Fuel Agent
driver is supposed to use Ironic abstractions to configure DHCP, i.e. call
the plugin methods update_port_dhcp_opts, update_port_address,
update_dhcp_opts, and get_ip_addresses, NOT change Ironic core (again, we
will point that out explicitly).
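
To make the intent concrete, here is a rough sketch of what such an
out-of-tree provider could look like. The method names come from Ironic's
pluggable DHCP interface mentioned above; the signatures are approximate,
and the file layout and reload mechanism are assumptions for illustration,
not a real plugin:

    import os
    import signal

    from ironic.dhcp import base

    OPTS_DIR = '/etc/dnsmasq.d/opts'    # assumed layout for this sketch
    HOSTS_DIR = '/etc/dnsmasq.d/hosts'

    class DnsmasqDHCPApi(base.BaseDHCP):
        """Maintain dnsmasq config files instead of calling Neutron."""

        def update_port_dhcp_opts(self, port_id, dhcp_options, token=None):
            # dhcp_options: list of {'opt_name': ..., 'opt_value': ...}.
            lines = ['tag:%s,%s,%s' % (port_id, o['opt_name'], o['opt_value'])
                     for o in dhcp_options]
            _write(os.path.join(OPTS_DIR, port_id), '\n'.join(lines))
            _reload_dnsmasq()

        def update_dhcp_opts(self, task, options, vifs=None):
            for port in task.ports:
                self.update_port_dhcp_opts(port.uuid, options)

        def update_port_address(self, port_id, address, token=None):
            _write(os.path.join(HOSTS_DIR, port_id), address)
            _reload_dnsmasq()

        def get_ip_addresses(self, task):
            # Would parse dnsmasq's leases file for the node's MACs; omitted.
            raise NotImplementedError()

    def _write(path, data):
        with open(path, 'w') as f:
            f.write(data + '\n')

    def _reload_dnsmasq():
        # SIGHUP makes dnsmasq re-read its hosts/opts files.
        with open('/var/run/dnsmasq.pid') as f:
            os.kill(int(f.read().strip()), signal.SIGHUP)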

* IPA vs. Fuel Agent

My suggestion here is to stop thinking of Fuel Agent as Fuel-only stuff. I
hope it is clear by now that Fuel Agent is just a generic tool aimed at the
'operator == user within a traditional IT shop' use case. And this use case
requires things like LVM and enormous flexibility, which do not even have a
chance of being considered as part of IPA in the next few months. A good
decision here might be to implement a Fuel Agent driver first and then work
on factoring the parts common to IPA and Fuel Agent into one tree (a
long-term perspective). If it is a big deal, we can even rename Fuel Agent
to something that sounds more neutral (not related to Fuel) and put it into
a separate git repository.

> If this is what FuelAgent is about, why is there so much resistance to
> contributing that functionality to the component which is already
> integrated with Ironic? Why complicate matters for both users and
> developers by adding *another* deploy agent that does (or will soon do) the
> same things?


Briefly, we are glad to contribute to IPA, but let's do things iteratively.
I need to deliver power and DHCP management plus image-based provisioning
by March 2015. Judging by my previous experience of contributing to IPA, it
is almost impossible to merge everything I need by that time. It is
possible to implement a Fuel Agent driver by that time. It is also possible
to implement something on my own without integrating Ironic into Fuel at
all. In the long term, if it's OK to land MD and LVM support in IPA, we can
definitely do that.

> In summary, if I understand correctly, it seems as though you're trying to
> fit Ironic into Cobbler's way of doing things, rather than recognize that
> Ironic approaches provisioning in a fundamentally different way.


You are only partly correct here. We are trying to fit Ironic into part of
Cobbler's way of doing things. We are planning to get rid of native OS
installers (anaconda), and therefore of Cobbler, and to switch to an
image-based model. Some parts of Fuel were designed to fit Cobbler's way of
doing things, and we cannot throw them away all at once. Our delivery cycle
forces us to do things iteratively. That is the reason for having the
dnsmasq wrapper, for example, as a temporary scheme. Fuel is continuously
evolving to keep up with modern trends.

> Your use case:
> * is not cloud-like
>

Exactly.

> * does not include Nova or Neutron, but will duplicate functionality of
> both (you need a scheduler and all the logic within nova.virt.ironic, and
> something to manage DHCP and DNS assignment)
>

Currently the Fuel user plays the role of a very smart Nova scheduler (the
user looks at node parameters and chooses which nodes are suitable), and
yes, Fuel partly implements logic similar to that in nova.virt.ironic.


> * would use Ironic to manage diverse hardware, which naturally requires
> some operator-driven customization, but still exposes the messy
> configuration bits^D^Dchoices to users at deploy time
>

Exactly.

> * duplicates some of the functionality already available in other drivers


Maybe, if you mean the IPA driver.


> There are certain aspects of the proposal which I like, though:
> * using SSH rather than HTTP for remote access to the deploy agent
>
Yes.

> * support for putting the root partition on a software RAID
>
Not only MD; a root fs over LVM is also in our scope.
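
For instance (a purely illustrative data structure, not Fuel Agent's actual
format), this is the kind of layout we need to be able to express
declaratively: root on LVM, with the volume group backed by an MD (software
RAID 1) device:

    # Illustrative only -- not Fuel Agent's real data model.
    layout = {
        'mds': [
            {'name': 'md0', 'level': 1,
             'devices': ['/dev/sda2', '/dev/sdb2']},
        ],
        'vgs': [
            {'name': 'os', 'pvs': ['/dev/md0']},
        ],
        'lvs': [
            {'name': 'root', 'vg': 'os', 'size': '20G',
             'fs': 'ext4', 'mount': '/'},
            {'name': 'swap', 'vg': 'os', 'size': '4G',
             'fs': 'swap', 'mount': None},
        ],
    }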

> * integration with another provisioning system, without any API changes

Yes.

Besides, we install grub and a Linux kernel on the hard drive (local boot).
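
For reference, a minimal sketch of that step (assumed paths and commands,
not Fuel Agent's actual code; update-grub is Debian-style, grub2-mkconfig
on RHEL-like systems):

    import subprocess

    def install_bootloader(target_root, boot_disk):
        """Install grub into the deployed image so the node boots locally."""
        # Bind-mount the pseudo-filesystems grub needs inside the chroot.
        for fs in ('/dev', '/proc', '/sys'):
            subprocess.check_call(['mount', '--bind', fs, target_root + fs])
        try:
            subprocess.check_call(
                ['chroot', target_root, 'grub-install', boot_disk])
            subprocess.check_call(['chroot', target_root, 'update-grub'])
        finally:
            for fs in ('/sys', '/proc', '/dev'):
                subprocess.call(['umount', target_root + fs])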

> Why does Astute need to pass this to Ironic? It seems like Astute could
> simply SSH to the node running FuelAgent at this point, without Ironic
> being involved


That is exactly what it does right now. But since we need power management
(IPMI, ILO, etc.) and Ironic implements it, and since Ironic is going to
implement other hardware management capabilities, it sounds rational to use
it for our use case. Why are we interested in using OpenStack services
(definitely Ironic, maybe Neutron, maybe Glance, already Keystone)? The
answer is simple: because some of those services address our needs. No
conspiracy theory. Totally pragmatic.


Vladimir Kozhukalov

On Wed, Dec 10, 2014 at 1:06 AM, Devananda van der Veen <
devananda.vdv at gmail.com> wrote:

> On Tue Dec 09 2014 at 9:45:51 AM Fox, Kevin M <Kevin.Fox at pnnl.gov> wrote:
>
>> We've been interested in Ironic as a replacement for Cobbler for some of
>> our systems and have been kicking the tires a bit recently.
>>
>> While initially I thought this thread was probably another "Fuel not
>> playing well with the community" kind of thing, I'm not thinking that any
>> more. It's deeper than that.
>>
>
> There are aspects to both conversations here, and you raise many valid
> points.
>
>> Cloud provisioning is great. I really REALLY like it. But one of the
>> things that makes it great is the nice, pretty, cute, uniform, standard
>> "hardware" the vm gives the user. Ideally, the physical hardware would
>> behave the same. But,
>> “No Battle Plan Survives Contact With the Enemy”. The sad reality is that
>> most pieces of hardware differ from each other. Different drivers, different
>> firmware, different different different.
>>
>
> Indeed, hardware is different. And no matter how homogeneous you *think*
> it is, at some point, some hardware is going to fail^D^D^Dbehave
> differently than some other piece of hardware.
>
> One of the primary goals of Ironic is to provide a common *abstraction* to
> all the vendor differences, driver differences, and hardware differences.
> There's no magic in that -- underneath the covers, each driver is going to
> have to deal with the unpleasant realities of actual hardware that is
> actually different.
>
>
>> One way the cloud enables this isolation is by forcing the cloud admins
>> to install things and deal with the grungy hardware to make the interface
>> nice and clean for the user. For example, if you want greater mean time
>> between failures of nova compute nodes, you probably use a RAID 1. Sure,
>> it's kind of a pet thing to do, but it's up to the cloud admin to
>> decide what's "better": buying more hardware, or paying for more admin/user
>> time. Extra hard drives are dirt cheap...
>>
>> So, in reality Ironic is playing in a space somewhere between "I want to
>> use cloud tools to deploy hardware, yay!" and "ewww.., physical hardware's
>> nasty. you have to know all these extra things and do all these extra
>> things that you don't have to do with a vm"... I believe Ironic's going to
>> need to be able to deal with this messiness in as clean a way as possible.
>
>
> If by "clean" you mean, expose a common abstraction on top of all those
> messy differences -- then we're on the same page. I would welcome any
> feedback as to where that abstraction leaks today, and on both spec and
> code reviews that would degrade or violate that abstraction layer. I think
> it is one of, if not *the*, defining characteristic of the project.
>
>
>> But that's my opinion. If the team feels it's not a valid use case, then
>> we'll just have to use something else for our needs. I really really want
>> to be able to use heat to deploy whole physical distributed systems though.
>>
>> Today, we're using software RAID over two disks to deploy our nova
>> compute nodes. Why? We have some very old disks we recovered for one of our
>> clouds and they fail often. nova-compute is pet enough to benefit somewhat
>> from being able to swap out a disk without much effort. If we were to use
>> Ironic to provision the compute nodes, we need to support a way to do the
>> same.
>>
>
> I have made the (apparently incorrect) assumption that anyone running
> anything sensitive to disk failures in production would naturally have a
> hardware RAID, and that, therefore, Ironic should be capable of setting up
> that RAID in accordance with a description in the Nova flavor metadata --
> but did not need to be concerned with software RAIDs.
>
> Clearly, there are several folks who have the same use-case in mind, but
> do not have hardware RAID cards in their servers, so my initial assumption
> was incorrect :)
>
> I'm fairly sure that the IPA team would welcome contributions to this
> effect.
>
>> We're looking into ways of building an image that has a software RAID
>> preconfigured, and expanding it on boot.
>
>
> Awesome! I hope that work will make its way into diskimage-builder ;)
>
> (As an aside, I suggested this to the Fuel team back in Atlanta...)
>
>
>> This requires each image to be customized for this case though. I can see
>> Fuel not wanting to provide two different sets of images, "hardware raid"
>> and "software raid", that have the same contents in them, with just
>> different partitioning layouts... If we want users to not have to care
>> about partition layout, this is also not ideal...
>>
>
> End-users are probably not generating their own images for bare metal
> (unless user == operator, in which case, it should be fine).
>
>
>> Assuming Ironic can be convinced that these features really would be
>> needed, perhaps the solution is a middle ground between the pxe driver and
>> the agent?
>>
>
> I've been rallying for a convergence between the feature sets of these
> drivers -- specifically, that the agent should support partition-based
> images, and also support copy-over-iscsi as a deployment model. In
> parallel, Lucas had started working on splitting the deploy interface into
> both boot and deploy, at which point we may be able to deprecate the current
> family of pxe_* drivers. But I'm birdwalking...
>
>
>> Associate partition information at the flavor level. The admin can decide
>> the best partitioning layout for a given piece of hardware... The user
>> doesn't have to care any more. Two flavors for the same hardware could be
>> "4 9's" or "5 9's" or something like that.
>>
>
> Bingo. This is the approach we've been discussing over the past two years
> - nova flavors could include metadata which gets passed down to Ironic and
> applied at deploy-time - but it hasn't been as high a priority as other
> things. Though not specifically covering partitions, there are specs up for
> Nova [0] and Ironic [1] for this workflow.
>
>
>> Modify the agent to support a pxe-style image in addition to a full-disk
>> layout, and have the agent partition, set up RAID, and lay the image down
>> into it. Modify the agent to support running grub2 at the end of deployment.
>>
>> Or at least make the agent pluggable to support adding these options.
>>
>> This does seem a bit backwards from the way the agent has been going. The
>> pxe driver was kind of Linux-specific; the agent is not... So maybe that
>> does imply a 3rd driver may be beneficial... But it would be nice to have
>> one driver, the agent, that in the end supports everything.
>>
>
> We'll always need different drivers to handle different kinds of hardware.
> And we have two modes of deployment today (copy-image-over-iscsi,
> agent-downloads-locally) and could have more in the future (bittorrent,
> multicast, ...?). That said, I don't know why a single agent couldn't
> support multiple modes of deployment.
>
>
> -Devananda
>
>
> [0] https://review.openstack.org/#/c/136104/
>
> [1]
>
> http://specs.openstack.org/openstack/ironic-specs/specs/backlog/driver-capabilities.html
> and
> https://review.openstack.org/#/c/137363/
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>