[openstack-dev] [TripleO][Ironic][Nova-BM] avoiding self-power-off scenarios
Devananda van der Veen
devananda.vdv at gmail.com
Tue Apr 8 04:52:22 UTC 2014
In case it isn't clear to others, or in case I've misunderstood, I'd like
to start by rephrasing the problem statement.
* It is possible to use Ironic to deploy an instance of ironic-conductor on
bare metal, which joins the same cluster that deployed it.
* This, or some other event, could cause the hash ring distribution to
change such that the instance of ironic-conductor is managed by itself.
* A request to perform any management action (e.g., power off) on that
instance will fail in interesting ways...
Adding a CONF setting that a conductor may optionally advertise, and which
alters the hash mapping to prevent self-management, is reasonable. The
ironic.common.hash_ring code will need to avoid mapping a node onto a
conductor with the same advertised UUID, but I think that will be easy. We
can't assume the driver has a "pm_address" key, though; some drivers may not.
Since the hash ring already knows node UUID, and a node's UUID is known
before an instance can be deployed to it, I think this will work. You can
pass that node's UUID in via heat when deploying Ironic via Ironic, and the
config will be present the first time the service starts, regardless of
which power driver is used.
Also, the node UUID is already pushed out to Nova instance metadata :)
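To make the idea concrete, here is a rough sketch of a hash ring that skips
any conductor advertising the node's own UUID. It is not the actual
ironic.common.hash_ring code: the HashRing class, the get_hosts method, the
replicas parameter and the hostname -> UUID mapping are all names made up for
illustration, and sha256 is just a convenient stand-in for whatever hash the
real ring uses.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent hash ring mapping node UUIDs to conductor hostnames.

    Hypothetical sketch only; not the ironic.common.hash_ring API.
    """

    def __init__(self, conductors, replicas=16):
        # conductors: dict of hostname -> node UUID the conductor runs on,
        # or None if the conductor does not advertise one (assumed format).
        self._self_uuid = dict(conductors)
        # Place each conductor on the ring several times to smooth the
        # distribution of nodes across conductors.
        self._ring = sorted(
            (self._hash("%s-%d" % (host, i)), host)
            for host in conductors
            for i in range(replicas)
        )
        self._keys = [key for key, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def get_hosts(self, node_uuid, count=2):
        """Return up to `count` distinct conductors for node_uuid.

        Conductors that advertise node_uuid as their own are never
        returned, so a conductor cannot end up managing itself.
        """
        start = bisect.bisect(self._keys, self._hash(node_uuid))
        hosts = []
        # Walk clockwise around the ring from the node's position.
        for offset in range(len(self._ring)):
            _, host = self._ring[(start + offset) % len(self._ring)]
            if host in hosts:
                continue
            own_uuid = self._self_uuid.get(host)
            if own_uuid is not None and own_uuid == node_uuid:
                continue  # skip the conductor deployed on this very node
            hosts.append(host)
            if len(hosts) == count:
                break
        return hosts
```

With something like HashRing({"cond-a": "uuid-a", "cond-b": None}), a lookup
for "uuid-a" can only ever return "cond-b". Note that if the only conductor
in the cluster is the one running on the node, the lookup returns an empty
list, so callers would need to handle "no conductor available" explicitly.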
--
Devananda
On Apr 5, 2014 2:01 PM, "Robert Collins" <robertc at robertcollins.net> wrote:
> One fairly common failure mode folk run into is registering a node
> with a nova-bm/ironic environment that is itself part of that
> environment. E.g. if you deploy ironic-conductor using Ironic (scaling
> out a cluster say), that conductor can then potentially power itself
> off if the node that represents itself happens to map to it in the
> hash ring. It happens manually too when folk are just entering lots of
> nodes and don't realise one of them is also a deployment server :).
>
> I'm thinking that a good solution will have the following properties:
> - it's possible to manually guard against this
> - we can easily make the guard work for nova deployed machines
>
> And that we don't need to worry about:
> - arbitrary other machines in the cluster (because that's a heat
> responsibility, to not request redeploy of too many machines at once).
>
> For now, I only want to think about solving this for Ironic :).
>
> I think the following design might work:
> - a config knob in ironic-conductor that specifies its own pm_address
> - we push that back up as part of the hash ring metadata
> - in the hash ring don't set a primary or fallback conductor if the
> node pm address matches the conductor self pm address
> - in the Nova Ironic driver add instance metadata with the pm address
> (only) of the node
>
> Then we can just glue the instance metadata field to the conductor config
> key.
>
> -Rob
>
> --
> Robert Collins <rbtcollins at hp.com>
> Distinguished Technologist
> HP Converged Cloud