[openstack-dev] [ironic] using ironic as a replacement for existing datacenter baremetal provisioning
    Dmitry Tantsur 
    dtantsur at redhat.com
       
    Tue Jun  7 15:39:06 UTC 2016
    
    
  
On 06/07/2016 02:01 AM, Devananda van der Veen wrote:
>
> On 06/06/2016 01:44 PM, Kris G. Lindgren wrote:
>> Hi ironic folks,
>> As I'm trying to explore how GoDaddy can use ironic I've created the following
>> in an attempt to document some of my concerns, and I'm wondering if you folks
>> could help me identify ongoing work to solve these (or alternatives?)
>> List of concerns with ironic:
>
> Hi Kris,
>
> There is a lot of ongoing work in and around the Ironic project. Thanks for
> diving in and for sharing your concerns; you're not alone.
>
> I'll respond to each group of concerns, as some of these appear quite similar to
> each other and align with stuff we're already doing. Hopefully I can provide
> some helpful background to where the project is at today.
>
>>
>> 1.) Nova <-> ironic interactions generally seem terrible?
>
> These two projects are coming at the task of managing "compute" with
> significantly different situations and we've been working, for the last ~2
> years, to build a framework that can provide both virtual and physical resources
> through one API. It's not a simple task, and we have a lot more to do.
>
>
>>   - How to accept raid config and partitioning(?) from end users? There seems to
>> be no agreed-upon method yet between nova/ironic.
>
> Nova expresses partitioning in a very limited way on the flavor. You get root,
> swap, and ephemeral partitions -- and that's it. Ironic honors those today, but
> they're pinned on the flavor definition, e.g. by the cloud admin (or whoever can
> define the flavor).
>
> If your users need more complex partitioning, they could create additional
> partitions after the instance is created. This limitation within Ironic exists,
> in part, because the project's goal is to provide hardware through the OpenStack
> Compute API -- which doesn't express arbitrary partitionability. (If you're
> interested, there is a lengthier and more political discussion about whether the
> cloud should support "pets" and whether arbitrary partitioning is needed for
> "cattle".)
>
>
> RAID configuration isn't something that Nova allows their users to choose today
> - it doesn't fit in the Nova model of "compute", and there is, to my knowledge,
> nothing in the Nova API to allow its input. We've discussed this a little bit,
> but so far settled on leaving it up to the cloud admin to set this in Ironic.
>
> There has been discussion with the Cinder community over ways to express volume
> spanning and mirroring and apply them to a machine's local disks, but these
> discussions didn't gain any traction.
>
> There's also been discussion of ways we could do ad-hoc changes in RAID level,
> based on flavor metadata, during the provisioning process (rather than ahead of
> time) but no code has been done for this yet, AFAIK.
I'm still pretty interested in it, because I agree with everything said
above about building RAID ahead of time not being convenient. I don't
quite understand what such a feature would look like; we might add it as
a topic for the midcycle.
>
> So, where does that leave us? With the "explosion of flavors" that you
> described. It may not be ideal, but that is the common ground we've reached.
>
>>    - How to run multiple conductors/nova-computes?   Right now, as far as I can
>> tell, all of ironic is front-ended by a single nova-compute, which I will have to
>> manage via a cluster technology between two or more nodes.  Because of this and
>> the way host aggregates work, I am unable to expose fault domains for ironic
>> instances (all of ironic can only be under a single AZ (the AZ that is assigned
>> to the nova-compute node)), unless I create multiple nova-compute servers and
>> manage multiple independent ironic setups.  This makes on-boarding/query of
>> hardware capacity painful.
>
> Yep. It's not ideal, and the community is very well aware of, and actively
> working on, this limitation. It also may not be as bad as you may think. The
> nova-compute process doesn't do very much, and tests show it handling some
> thousands of ironic nodes fairly well in parallel. Standard active-passive
> management of that process should suffice.
>
> A lot of design work has been done to come up with a joint solution by folks on
> both the Ironic and Nova teams.
> http://specs.openstack.org/openstack/nova-specs/specs/newton/approved/ironic-multiple-compute-hosts.html
>
> As a side note, it's possible (though not tested, recommended, or well
> documented) to run more than one nova-compute. See
> https://github.com/openstack/ironic/blob/master/ironic/nova/compute/manager.py
>
>>   - Nova appears to be forcing a 'we are "compute" as long as "compute" is VMs'
>> view, which means that we will have a baremetal flavor explosion (i.e. the
>> mismatch between baremetal and VM).
>>       - This is a feeling I got from the ironic-nova cross project meeting in
>> Austin.  General example goes back to raid config above. I can configure a
>> single piece of hardware many different ways, but to fit into nova's world view
>> I need to have many different flavors exposed to the end user.  In this way many
>> flavors can map back to a single piece of hardware with just a slightly
>> different configuration applied. So how am I supposed to do a single server with
>> 6 drives as either: Raid 1 + Raid 5, Raid 5, Raid 10, Raid 6, or JBOD?  Seems
>> like I would need to pre-mark out servers that were going to be a specific raid
>> level.  Which means that I need to start managing additional sub-pools of
>> hardware just to deal with how the end user wants the raid configured; this is
>> pretty much a non-starter for us.  I have not really heard of what's being done
>> on this specific front.
>>
>
> You're correct. Again, Nova has no concept of RAID in their API, so yea, today
> you're left with a 'flavor explosion', as you put it.
>
> There's been discussion of methods we could use to apply the RAID level during
> provisioning, but generally those discussions have landed on the side of "it's
> the operator's responsibility to maintain pools of resources available that match
> their customers' demand".
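
To make the "flavor explosion" concrete, here is a rough sketch of the five
6-drive layouts from Kris's example, written in the shape I understand
ironic's target_raid_config to take (a "logical_disks" list). The flavor
naming scheme, the hardware class names and the 100 GB root volume size are
made up for illustration, and the exact raid_level strings ("1+0", "JBOD")
are my assumption of what the schema accepts.

```python
# Sketch only: flavor names, hardware classes and sizes are hypothetical.
# The logical_disks structure is an assumption about target_raid_config.

def raid_layouts():
    """Return the five 6-drive layouts from the example, keyed by short name."""
    return {
        "r1-r5": {"logical_disks": [
            # Small mirrored root volume, rest of the drives as RAID 5.
            {"size_gb": 100, "raid_level": "1", "is_root_volume": True},
            {"size_gb": "MAX", "raid_level": "5"},
        ]},
        "r5": {"logical_disks": [
            {"size_gb": "MAX", "raid_level": "5", "is_root_volume": True},
        ]},
        "r10": {"logical_disks": [
            {"size_gb": "MAX", "raid_level": "1+0", "is_root_volume": True},
        ]},
        "r6": {"logical_disks": [
            {"size_gb": "MAX", "raid_level": "6", "is_root_volume": True},
        ]},
        "jbod": {"logical_disks": [
            {"size_gb": "MAX", "raid_level": "JBOD", "is_root_volume": True},
        ]},
    }


def flavors_needed(hardware_classes, layouts):
    """One flavor per (hardware class, RAID layout) pair."""
    return [f"bm.{hw}.{name}" for hw in hardware_classes for name in layouts]


if __name__ == "__main__":
    names = flavors_needed(["small", "medium", "large"], raid_layouts())
    print(len(names), "flavors:", ", ".join(names))
```

With only three hardware classes this already needs one flavor (and one
pre-configured pool of nodes) per combination, which is the sub-pool
management problem described above.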
>
>
>> 2.) Inspector:
>>   - IPA service doesn't gather port/switching information
>
> Folks are working on this, but it's been blocked for a while on the
> ironic-neutron integration:
> https://review.openstack.org/#/c/241242/
>
>>   - Inspection service doesn't process port/switching information, which means
>> that it won't add it to ironic.  Which makes managing network swinging of the
>> host a non-starter.  As I would inspect the host – then modify the ironic record
>> to add the details about what port/switch the server is connected to from a
>> different source.  At that point why wouldn't I just onboard everything through
>> the API?
>
> This is desired, but not done yet, AFAIK.
>
>>   - Doesn't grab hardware disk configurations. If the server has multiple raids
>> (r1 + r5) only reports boot raid disk capacity.
>
> This falls out from a limitation in Nova (discussed above) though I would
> encourage inspector to collect all the data (even if ironic/nova can't use it,
> today).
>
>>   - Inspection is geared towards using a different network and dnsmasq
>> infrastructure than what is in use for ironic/neutron.  Which also means that in
>> order to not conflict with dhcp requests for servers in ironic I need to use
>> different networks.  Which also means I now need to handle swinging server ports
>> between different networks.
>
> Inspector is designed to respond only to requests for nodes in the inspection
> phase, so that it *doesn't* conflict with provisioning of nodes by Ironic. I've
> been using the same network for inspection and provisioning without issue -- so
> I'm not sure what problem you're encountering here.
>
>>
>> 3.) IPA image:
>>   - Default build stuff is pinned to extremely old versions due to gate failure
>> issues. So I cannot work without a fork for onboarding of servers, due to the fact
>> that IPMI modules aren't built for the kernel, so inspection can never match the
>> node against ironic.  Seems like current functionality here is the MVP for the gate
>> to work and to deploy images.  But if you need to do firmware, bios-config, or any
>> other hardware-specific features, you are pretty much going to need to roll your
>> own IPA image and IPA modules to do standard provisioning tasks.
>>
>
> That's correct. We assume that operators and downstream distributors will build
> and customize the IPA image as needed for their environment. Ironic only
> provides the base image and the tools to modify it; if we were to attempt to
> build an image that could handle every piece of hardware out there, it would be
> huge, unwieldy, and contain a lot of proprietary tools that we simply don't have
> access / license to use.
>
>> 4.) Conductor:
>>   - Serial-over-lan consoles require a unique port on the conductor server (I
>> have seen proposals to try and fix this?); this is painful to manage with large
>> numbers of servers.
>>   - SOL consoles aren't restarted when conductor is restarted (I think this
>> might be fixed in newer versions of ironic?); again, if end users aren't supposed
>> to consume ironic APIs directly, this is painful to handle.
>>   - As far as I can tell, shell-in-a-box SOL consoles aren't supported via nova –
>> so how are end users supposed to consume the shell-in-a-box console?
>
> You are, unfortunately, correct. Ironic once supported SOL console connectivity
> through Nova, but it has not been working for a while now. We discussed this at
> length in the Austin summit and plan to fix it soon:
> https://review.openstack.org/#/c/319505/
>
>>   - It's very easy to get a node to fall off the state machine rails (reboot a
>> server while an image is being deployed to it); the only way I have seen to be
>> able to fix this is to update the DB directly.
>
> Yea, that's a well known pain point, and there is ongoing work to improve the
> recovery process for nodes that get "stuck" in various ways, with the premise
> that the operator should never have to munge the DB directly. One approach we've
> discussed is adding a management CLI tool to make this cleaner.
>
>>   - I have BMCs that need specific configuration (some require SOL on com2,
>> others on com1); this is pretty much impossible without per-box overrides
>> against the conductor's hardcoded templates.
>
> Ironic allows certain aspects of the Node's management to be overridden
> individually, but it sounds like you need some knobs that we haven't
> implemented. Could you file a bug for this? I think we'd be keen to add it.
>
>>   - Additionally it would be nice to default to having a provisioning
>> kernel/image that was set as a single config option with per server overrides –
>> rather than on each server.  If we ever change the IPA image – that means at
>> scale we would need to update thousands of ironic nodes.
>
> This request has surfaced in the past, however, it wouldn't make sense in a
> heterogeneous environment (eg, mix of ia64 and x86_64 hardware in one region)
> and so past discussions have landed on the side of not implementing it (either
> as a system-level default image or as a driver-level default image).
>
> If there were a consensus that it helped enough deployments, without increasing
> the complexity of complex multi-arch deployments, I think folks would be willing
> to accept a feature like this.
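
For reference, the per-node approach that exists today amounts to patching
driver_info on every node. A minimal sketch, assuming the deploy image
references live under driver_info/deploy_kernel and
driver_info/deploy_ramdisk and that the node API accepts JSON-patch style
updates; the function names, UUIDs and image references below are
hypothetical, and nothing here talks to a real ironic API.

```python
# Sketch only: builds the patch documents, does not send them anywhere.
# Node UUIDs and image references are placeholders.

def deploy_image_patch(kernel_ref, ramdisk_ref):
    """JSON-patch style operations that point a node at a new IPA image."""
    return [
        # "add" on an existing key replaces its value in JSON patch.
        {"op": "add", "path": "/driver_info/deploy_kernel", "value": kernel_ref},
        {"op": "add", "path": "/driver_info/deploy_ramdisk", "value": ramdisk_ref},
    ]


def plan_updates(node_uuids, kernel_ref, ramdisk_ref):
    """Map every node to the patch it needs: one API call per node."""
    return {uuid: deploy_image_patch(kernel_ref, ramdisk_ref)
            for uuid in node_uuids}


if __name__ == "__main__":
    nodes = [f"node-{i:04d}" for i in range(3)]  # placeholder identifiers
    plan = plan_updates(nodes, "glance://new-ipa-kernel", "glance://new-ipa-ramdisk")
    for uuid, patch in plan.items():
        print(uuid, patch)
```

The number of updates grows linearly with the number of nodes, which is the
scaling concern raised above; a conductor-level default with per-node
overrides would reduce this to a single config change in the homogeneous
case.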
>
>>
>> What is ironic doing to monitor the hardware for failures?  I assume the answer
>> here is nothing and that we will need to make sure the images that we deploy are
>> correctly configuring the tools to monitor disk/health/PSUs/RAM errors, etc.
>>
>
> Today, nothing, but this is something we want to do, and there is an agreed-upon
> design, here:
> http://specs.openstack.org/openstack/ironic-specs/specs/not-implemented/notifications.html
>
> This is only the initial design for the notification system. The goal of this
> would be to enable drivers to capture hardware alerts (or perform more proactive
> gathering of hardware status) and propagate those alerts up to the cloud operator.
>
>
>
> In summary, you're not alone, nor are your ideas/thoughts/requests unreasonable.
> We're all facing similar concerns -- and you're welcome to come hang out in
> #openstack-ironic and participate in shaping Ironic so that it meets your needs,
> too :)
>
> Regards,
> Devananda
>
>
>