[openstack-dev] [ironic] using ironic as a replacement for existing datacenter baremetal provisioning

Devananda van der Veen devananda.vdv at gmail.com
Tue Jun 7 00:01:04 UTC 2016

On 06/06/2016 01:44 PM, Kris G. Lindgren wrote:
> Hi ironic folks,
> As I'm trying to explore how GoDaddy can use ironic I've created the following
> in an attempt to document some of my concerns, and I'm wondering if you folks
> could help me identify ongoing work to solve these (or alternatives?)
> List of concerns with ironic:

Hi Kris,

There is a lot of ongoing work in and around the Ironic project. Thanks for
diving in and for sharing your concerns; you're not alone.

I'll respond to each group of concerns, as some of these appear quite similar to
each other and align with stuff we're already doing. Hopefully I can provide
some helpful background to where the project is at today.

> 1.) Nova <-> ironic interactions generally seem terrible?

These two projects approach the task of managing "compute" from significantly
different starting points, and we've been working, for the last ~2 years, to
build a framework that can provide both virtual and physical resources through
one API. It's not a simple task, and we have a lot more to do.

>   - How to accept raid config and partitioning(?) from end users? There seems
> to be no agreed-upon method yet between nova/ironic.

Nova expresses partitioning in a very limited way on the flavor. You get root,
swap, and ephemeral partitions -- and that's it. Ironic honors those today, but
they're pinned on the flavor definition, e.g. by the cloud admin (or whoever can
define the flavor).

If your users need more complex partitioning, they could create additional
partitions after the instance is created. This limitation within Ironic exists,
in part, because the project's goal is to provide hardware through the OpenStack
Compute API -- which doesn't express arbitrary partitionability. (If you're
interested, there is a lengthier and more political discussion about whether the
cloud should support "pets" and whether arbitrary partitioning is needed for

RAID configuration isn't something that Nova allows their users to choose today
- it doesn't fit in the Nova model of "compute", and there is, to my knowledge,
nothing in the Nova API to allow its input. We've discussed this a little bit,
but so far settled on leaving it up to the cloud admin to set this in Ironic.

There has been discussion with the Cinder community over ways to express volume
spanning and mirroring, and to apply them to a machine's local disks, but these
discussions didn't gain any traction.

There's also been discussion of ways we could do ad-hoc changes in RAID level,
based on flavor metadata, during the provisioning process (rather than ahead of
time) but no code has been written for this yet, AFAIK.

So, where does that leave us? With the "explosion of flavors" that you
described. It may not be ideal, but that is the common ground we've reached.

>    - How to run multiple conductors/nova-computes?   Right now as far as I can
> tell all of ironic is front-ended by a single nova-compute, which I will have to
> manage via a cluster technology between two or more nodes.  Because of this and
> the way host-aggregates work I am unable to expose fault domains for ironic
> instances (all of ironic can only be under a single AZ (the az that is assigned
> to the nova-compute node)). Unless I create multiple nova-compute servers and
> manage multiple independent ironic setups.  This makes on-boarding/query of
> hardware capacity painful.

Yep. It's not ideal, and the community is very well aware of, and actively
working on, this limitation. It may also not be as bad as you think. The
nova-compute process doesn't do very much, and tests show it handling some
thousands of ironic nodes fairly well in parallel. Standard active-passive
management of that process should suffice.

A lot of design work has been done to come up with a joint solution by folks on
both the Ironic and Nova teams.

As a side note, it's possible (though not tested, recommended, or well
documented) to run more than one nova-compute. See

>   - Nova appears to be forcing a we are "compute" as long as "compute" is VMs,
> means that we will have a baremetal flavor explosion (ie the mismatch between
> baremetal and VM).
>       - This is a feeling I got from the ironic-nova cross project meeting in
> Austin.  General example goes back to raid config above. I can configure a
> single piece of hardware many different ways, but to fit into nova's world view
> I need to have many different flavors exposed to end-user.  In this way many
> flavors can map back to a single piece of hardware with just a slightly
> different configuration applied. So how am I supposed to do a single server with
> 6 drives as either: Raid 1 + Raid 5, Raid 5, Raid 10, Raid 6, or JBOD.  Seems
> like I would need to pre-mark out servers that were going to be a specific raid
> level.  Which means that I need to start managing additional sub-pools of
> hardware to just deal with how the end users want the raid configured, this is
> pretty much a non-starter for us.  I have not really heard of what's being done
> on this specific front.

You're correct. Again, Nova has no concept of RAID in their API, so yea, today
you're left with a 'flavor explosion', as you put it.

There's been discussion of methods we could use to apply the RAID level during
provisioning, but generally those discussions have landed on the side of "it's
the operator's responsibility to maintain pools of available resources that match
their customers' demand".
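
As a rough illustration of why this turns into a "flavor explosion": if every
RAID layout has to be its own flavor, the flavor count is the product of
hardware models and layouts. The sketch below only shows that arithmetic. The
hardware model names and the "bm.<model>.<layout>" naming scheme are made up
for the example and are not anything Nova or Ironic defines.

```python
from itertools import product

# Hypothetical hardware models in one datacenter (names are illustrative only).
HARDWARE_MODELS = ["6disk-small", "6disk-large"]

# The RAID layouts listed in the question above for a six-drive server.
RAID_LAYOUTS = ["raid1+raid5", "raid5", "raid10", "raid6", "jbod"]


def enumerate_flavors(models, layouts):
    """Return one flavor name per (hardware model, RAID layout) pair."""
    return [f"bm.{model}.{layout}" for model, layout in product(models, layouts)]


flavors = enumerate_flavors(HARDWARE_MODELS, RAID_LAYOUTS)
print(len(flavors))  # 2 models x 5 layouts = 10 flavors
```

Each of those flavors would also need its own pre-configured pool of machines,
which is the sub-pool management problem raised in the question.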

> 2.) Inspector:
>   - IPA service doesn't gather port/switching information

Folks are working on this, but it's been blocked for a while on the
ironic-neutron integration:

>   - Inspection service doesn't process port/switching information, which means
> that it won't add it to ironic.  Which makes managing network swinging of the
> host a non-starter.  As I would inspect the host – then modify the ironic record
> to add the details about what port/switch the server is connected to from a
> different source.  At that point why wouldn't I just onboard everything through
> the API?

This is desired, but not done yet, AFAIK.

>   - Doesn't grab hardware disk configurations, If the server has multiple raids
> (r1 + r5) only reports boot raid disk capacity.

This falls out from a limitation in Nova (discussed above) though I would
encourage inspector to collect all the data (even if ironic/nova can't use it,

>   - Inspection is geared towards using a different network and dnsmasq
> infrastructure than what is in use for ironic/neutron.  Which also means that in
> order to not conflict with dhcp requests for servers in ironic I need to use
> different networks.  Which also means I now need to handle swinging server ports
> between different networks.

Inspector is designed to respond only to requests for nodes in the inspection
phase, so that it *doesn't* conflict with provisioning of nodes by Ironic. I've
been using the same network for inspection and provisioning without issue -- so
I'm not sure what problem you're encountering here.

> 3.) IPA image:
>   - Default build stuff is pinned to extremely old versions due to gate failure
> issues. So I can not work without a fork for onboard of servers due to the fact
> that IPMI modules aren't built for the kernel, so inspection can never match the
> node against ironic.  Seems like currently functionality here is MVP for gate to
> work and to deploy images.  But if you need to do firmware, bios-config, any
> other hardware specific features you are pretty much going to need to roll your
> own IPA image and IPA modules to do standard provisioning tasks.

That's correct. We assume that operators and downstream distributors will build
and customize the IPA image as needed for their environment. Ironic only
provides the base image and the tools to modify it; if we were to attempt to
build an image that could handle every piece of hardware out there, it would be
huge, unwieldy, and contain a lot of proprietary tools that we simply don't have
access / license to use.

> 4.) Conductor:
>   - Serial-over-lan consoles require a unique port on the conductor server (I
> have seen proposals to try and fix this?), this is painful to manage with large
> numbers of servers.
>   - SOL consoles aren't restarted when conductor is restarted (I think this
> might be fixed in newer versions of ironic?), again if end users aren't supposed
> to consume ironic api's directly - this is painful to handle.
>   - As far as I can tell shell-in-a-box SOL consoles aren't supported via nova –
> so how are end users supposed to consume the shell-in-a-box console?

You are, unfortunately, correct. Ironic once supported SOL console connectivity
through Nova, but it has not been working for a while now. We discussed this at
length in the Austin summit and plan to fix it soon:

>   - It's very easy to get a node to fall off the state machine rails (reboot a
> server while an image is being deployed to it), the only way I have seen to be
> able to fix this is to update the DB directly.

Yea, that's a well known pain point, and there is ongoing work to improve the
recovery process for nodes that get "stuck" in various ways, with the premise
that the operator should never have to munge the DB directly. One approach we've
discussed is adding a management CLI tool to make this cleaner.

>   - I have BMC that need specific configuration (some require SOL on com2,
> others on com1) this makes it pretty much impossible without per box overrides
> against the conductor hardcoded templates.

Ironic allows certain aspects of the Node's management to be overridden
individually, but it sounds like you need some knobs that we haven't
implemented. Could you file a bug for this? I think we'd be keen to add it.

>   - Additionally it would be nice to default to having a provisioning
> kernel/image that was set as a single config option with per server overrides –
> rather than on each server.  If we ever change the IPA image – that means at
> scale we would need to update thousands of ironic nodes.

This request has surfaced in the past; however, it wouldn't make sense in a
heterogeneous environment (e.g., a mix of ia64 and x86_64 hardware in one region)
and so past discussions have landed on the side of not implementing it (either
as a system-level default image or as a driver-level default image).

If there were a consensus that it helped enough deployments, without further
complicating multi-arch deployments, I think folks would be willing to accept a
feature like this.
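
For illustration only, one way such a default could stay compatible with
multi-arch deployments is to key the default on CPU architecture and let a
per-node setting win. The sketch below is not existing Ironic behavior: the
`arch_defaults` mapping and the `resolve_deploy_image` helper are hypothetical,
and the node is modeled as a plain dict carrying `driver_info` and
`properties` fields.

```python
def resolve_deploy_image(node, arch_defaults):
    """Pick deploy kernel/ramdisk: per-node override first, then per-arch default."""
    driver_info = node.get("driver_info", {})
    arch = node.get("properties", {}).get("cpu_arch")
    defaults = arch_defaults.get(arch, {})

    resolved = {}
    for key in ("deploy_kernel", "deploy_ramdisk"):
        # A value set on the node itself always takes precedence.
        value = driver_info.get(key) or defaults.get(key)
        if value is None:
            raise ValueError(f"no {key} set on node and no default for arch {arch!r}")
        resolved[key] = value
    return resolved


# Hypothetical per-architecture defaults (image names are placeholders).
arch_defaults = {
    "x86_64": {"deploy_kernel": "kernel-x86-v2", "deploy_ramdisk": "ramdisk-x86-v2"},
}
plain_node = {"properties": {"cpu_arch": "x86_64"}, "driver_info": {}}
pinned_node = {
    "properties": {"cpu_arch": "x86_64"},
    "driver_info": {"deploy_kernel": "kernel-custom"},
}

print(resolve_deploy_image(plain_node, arch_defaults))
print(resolve_deploy_image(pinned_node, arch_defaults))
```

With something along these lines, changing the IPA image would mean updating
one default per architecture rather than thousands of node records, while
nodes that need a special image keep their own setting.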

> What is ironic doing to monitor the hardware for failures?  I assume the answer
> here is nothing and that we will need to make sure the images that we deploy are
> correctly configuring the tools to monitor disk/health/psu's/ram errors, etc.

Today, nothing, but this is something we want to do, and there is an agreed-upon
design, here:

This is only the initial design for the notification system. The goal of this
would be to enable drivers to capture hardware alerts (or perform more proactive
gathering of hardware status) and propagate those alerts up to the cloud operator.

In summary, you're not alone, nor are your ideas/thoughts/requests unreasonable.
We're all facing similar concerns -- and you're welcome to come hang out in
#openstack-ironic and participate in shaping Ironic so that it meets your needs,
too :)

