[openstack-dev] [tripleo] Nodes management in our shiny new TripleO API

Dan Prince dprince at redhat.com
Sat May 21 18:35:14 UTC 2016


On Fri, 2016-05-20 at 14:06 +0200, Dmitry Tantsur wrote:
> On 05/20/2016 01:44 PM, Dan Prince wrote:
> > 
> > On Thu, 2016-05-19 at 15:31 +0200, Dmitry Tantsur wrote:
> > > 
> > > Hi all!
> > > 
> > > We started some discussions on
> > > https://review.openstack.org/#/c/300200/ about the future of node
> > > management (registering, configuring and introspecting) in the new
> > > API, but I think it's more fair (and convenient) to move it here.
> > > The goal is to fix several long-standing design flaws that affect
> > > the logic behind tripleoclient. So fasten your seatbelts, here it
> > > goes.
> > > 
> > > If you already understand why we need to change this logic, just
> > > scroll down to the "what do you propose?" section.
> > > 
> > > "introspection bulk start" is evil
> > > ----------------------------------
> > > 
> > > As many of you obviously know, TripleO used the following command
> > > for introspection:
> > > 
> > >   openstack baremetal introspection bulk start
> > > 
> > > As not everyone knows, though, this command does not come from the
> > > ironic-inspector project; it's part of TripleO itself. And the
> > > ironic team has some big problems with it.
> > > 
> > > The way it works is:
> > >
> > > 1. Take all nodes in "available" state and move them to
> > > "manageable" state.
> > > 2. Execute introspection for all nodes in "manageable" state.
> > > 3. Move all nodes with successful introspection to "available"
> > > state.
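> > >
> > > Per node, that's roughly equivalent to something like the
> > > following with the stock clients (a sketch, not how tripleoclient
> > > actually implements it; exact client syntax may vary by version):
> > >
> > >   ironic node-set-provision-state UUID manage
> > >   openstack baremetal introspection start UUID
> > >   # poll until introspection finishes
> > >   openstack baremetal introspection status UUID
> > >   ironic node-set-provision-state UUID provide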
> > > 
> > > Step 3 is pretty controversial, step 1 is just horrible. This is
> > > not how the ironic-inspector team designed introspection to work
> > > (hence it refuses to run on nodes in "available" state), and that's
> > > not how the ironic team expects the ironic state machine to be
> > > handled. To explain it, I'll provide some brief information on the
> > > ironic state machine.
> > > 
> > > ironic node lifecycle
> > > ---------------------
> > > 
> > > With recent versions of the bare metal API (starting with 1.11),
> > > nodes begin their life in a state called "enroll". Nodes in this
> > > state are not available for deployment, nor for most other actions.
> > > Ironic does not touch such nodes in any way.
> > > 
> > > To make nodes alive, an operator uses the "manage" provisioning
> > > action to move nodes to "manageable" state. During this transition
> > > the power and management credentials (IPMI, SSH, etc) are validated
> > > to ensure that nodes in "manageable" state are, well, manageable.
> > > This state is still not available for deployment. With nodes in
> > > this state an operator can execute various pre-deployment actions,
> > > such as introspection, RAID configuration, etc. So to sum it up,
> > > nodes in "manageable" state are being configured before exposing
> > > them to the cloud.
> > > 
> > > The last step before the deployment is to make nodes "available"
> > > using the "provide" provisioning action. Such nodes are exposed to
> > > nova, and can be deployed to at any moment. No long-running
> > > configuration actions should be run in this state. The "manage"
> > > action can be used to bring nodes back to "manageable" state for
> > > configuration (e.g. reintrospection).
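> > >
> > > In client terms the intended lifecycle looks roughly like this (a
> > > sketch using the old ironic client syntax; details may vary by
> > > version):
> > >
> > >   # enroll: node starts in "enroll", ironic leaves it alone
> > >   ironic node-create -d pxe_ipmitool -i ipmi_address=192.0.2.10
> > >   # manage: validates power/management credentials
> > >   ironic node-set-provision-state UUID manage
> > >   # provide: exposes the node to nova for deployment
> > >   ironic node-set-provision-state UUID provide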
> > > 
> > > so what's the problem?
> > > ----------------------
> > > 
> > > The problem is that TripleO essentially bypasses this logic by
> > > keeping all nodes "available" and walking them through provisioning
> > > steps automatically. Just a couple of examples of what gets broken:
> > > 
> > > (1) Imagine I have 10 nodes in my overcloud, 10 nodes ready for
> > > deployment (including potential autoscaling), and I want to enroll
> > > 10 more nodes.
> > >
> > > Both introspection and ready-state operations nowadays will touch
> > > both the 10 new nodes AND the 10 nodes which are ready for
> > > deployment, potentially making the latter not ready for deployment
> > > any more (and definitely moving them out of the pool for some
> > > time).
> > > 
> > > In particular, any manual configuration made by an operator before
> > > making nodes "available" may get destroyed.
> > > 
> > > (2) TripleO has to disable automated cleaning. Automated cleaning
> > > is a set of steps (currently only wiping the hard drive) that
> > > happens in ironic 1) before nodes become available, 2) after an
> > > instance is deleted. As the TripleO CLI constantly moves nodes back
> > > and forth from and to the "available" state, cleaning kicks in
> > > every time. Unless it's disabled.
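> > >
> > > For reference, the knob being turned here is ironic's
> > > automated_clean option; disabling it by hand on an undercloud would
> > > look something like this (a sketch, service name per RDO packaging;
> > > TripleO may wire this up differently):
> > >
> > >   crudini --set /etc/ironic/ironic.conf conductor automated_clean false
> > >   systemctl restart openstack-ironic-conductor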
> > > 
> > > Disabling cleaning might sound like a sufficient workaround, until
> > > you need it. And you actually do. Here is a real-life example of
> > > how to get yourself broken by not having cleaning:
> > > 
> > > a. Deploy an overcloud instance
> > > b. Delete it
> > > c. Deploy an overcloud instance on a different hard drive
> > > d. Boom.
> > This sounds like an Ironic bug to me. Cleaning (wiping a disk) and
> > removing state that would break subsequent installations on a
> > different drive are different things. In TripleO I think the reason
> > we disable cleaning is largely because of the extra time it takes
> > and the fact that our baremetal cloud isn't multi-tenant (currently
> > at least).
> We fix this "bug" by introducing cleaning. This is the process that
> guarantees each deployment starts with a clean environment. It's hard
> to know which remaining data can cause which problems (e.g. what
> about a leftover UEFI partition? any remnants of Ceph? I don't know).
> 
> > 
> > 
> > > 
> > > 
> > > As we didn't pass cleaning, there is still a config drive on the
> > > disk used in the first deployment. With 2 config drives present,
> > > cloud-init will pick a random one, breaking the deployment.
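> > >
> > > To illustrate the failure mode (device names made up): config
> > > drives are labelled "config-2" and cloud-init looks them up by that
> > > label, so after step (c) you can end up with two candidates:
> > >
> > >   $ blkid -t LABEL="config-2" -o device
> > >   /dev/sda1    # stale config drive from the first deployment
> > >   /dev/sdb1    # config drive for the new instance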
> > TripleO isn't using config drive, is it? Until Nova supports config
> > drives via Ironic I think we are blocked on using it.
> TripleO does use config drives (btw I'm telling you about a real bug
> here, not something I made up). Nova does support Ironic config
> drives; it does not support (and does not want to) injecting random
> data from an Ironic node there (we wanted to pass data from
> introspection to the node).
> 
> > 
> > 
> > > 
> > > 
> > > To top it all off, TripleO users tend to not use root device
> > > hints, so switching root disks may happen randomly between
> > > deployments. Have fun debugging.
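> > >
> > > For reference, a root device hint pins the deployment to one disk.
> > > A sketch with the old client syntax (the WWN here is made up):
> > >
> > >   ironic node-update UUID add \
> > >     properties/root_device='{"wwn": "0x5000c5008e3d7f2a"}'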
> > > 
> > > what do you propose?
> > > --------------------
> > > 
> > > I would like the new TripleO mistral workflows to start following
> > > the ironic state machine more closely. Imagine the following
> > > workflows:
> > > 
> > > 1. register: take JSON, create nodes in "manageable" state. I do
> > > believe we can automate the enroll->manageable transition, as it
> > > serves the purpose of validation (and discovery, but let's put that
> > > aside).
> > >
> > > 2. provide: take a list of nodes, or all "manageable" nodes, and
> > > move them to "available". By using this workflow an operator will
> > > make a *conscious* decision to add some nodes to the cloud.
> > >
> > > 3. introspect: take a list of "manageable" (!!!) nodes, or all
> > > "manageable" nodes, and move them through introspection. This is an
> > > optional step between "register" and "provide".
> > >
> > > 4. set_node_state: a helper workflow to move nodes between states.
> > > The "provide" workflow is essentially set_node_state with
> > > verb=provide, but is separate due to its high importance in the
> > > node lifecycle.
> > >
> > > 5. configure: given a couple of parameters (deploy image, local
> > > boot flag, etc), update given or all "manageable" nodes with them.
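> > >
> > > Just to illustrate how these might be invoked (workflow names and
> > > inputs here are made up, not final):
> > >
> > >   mistral execution-create tripleo.baremetal.v1.introspect \
> > >     '{"node_uuids": ["UUID1", "UUID2"]}'
> > >   mistral execution-create tripleo.baremetal.v1.provide \
> > >     '{"node_uuids": ["UUID1", "UUID2"]}'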
> > I like how you've split things up into the above workflows.
> > Furthermore, I think we'll actually be able to accomplish most, if
> > not all, of it by using pure Mistral workflows (very few custom
> > actions involved).
> > 
> > One refinement I might suggest is that for the workflows that take
> > a list of uuids *or* search for a type of nodes, we might split them
> > into two workflows, one of which calls the other.
> Good idea.

Ryan Brady and I spent some time yesterday implementing the suggested
workflows (all except the 'configure' one, I think, which could come
later).

How does this one look:

https://review.openstack.org/#/c/300200/4/workbooks/baremetal.yaml

We've got some python-tripleoclient patches coming soon too which use
this updated workflow to do the node registration bits.

Dan

> 
> > 
> > 
> > For example, a 'provide_managed_nodes' workflow would call into the
> > 'provide' workflow which takes a list of uuids? I think this gives
> > us the same features we need and exposes the required input
> > parameters more cleanly to the end user.
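> >
> > In CLI terms, something like this (again, names illustrative):
> >
> >   # outer workflow: finds all "manageable" nodes itself
> >   mistral execution-create tripleo.baremetal.v1.provide_managed_nodes
> >   # inner workflow: takes an explicit list of uuids
> >   mistral execution-create tripleo.baremetal.v1.provide \
> >     '{"node_uuids": ["UUID1"]}'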
> > 
> > So long as we can do the above and still make the existing
> > python-tripleoclient calls backwards compatible, I think we should
> > be in good shape.
> Awesome!
> 
> > 
> > 
> > Dan
> > 
> > 
> > > 
> > > 
> > > Essentially the only addition here is the "provide" action, which
> > > I hope you already realize should be an explicit step.
> > > 
> > > what about tripleoclient
> > > ------------------------
> > > 
> > > Of course we want to keep backward compatibility. The existing
> > > commands
> > > 
> > >   openstack baremetal import
> > >   openstack baremetal configure boot
> > >   openstack baremetal introspection bulk start
> > > 
> > > will use some combination of the workflows above and will be
> > > deprecated.
> > > 
> > > The new commands (also avoiding hijacking the bare metal
> > > namespace) will be provided strictly matching the workflows
> > > (especially in terms of the state machine):
> > > 
> > >   openstack overcloud node import
> > >   openstack overcloud node configure
> > >   openstack overcloud node introspect
> > >   openstack overcloud node provide
> > > 
> > > (I have good reasons behind each of these names, but if I put them
> > > here this mail will be way too long).
> > > 
> > > Now to save a user some typing:
> > > 1. the configure command will be optional, as the import command
> > > will set the defaults
> > > 2. the introspect command will get a --provide flag
> > > 3. the import command will get --introspect and --provide flags
> > > 
> > > So the simplest flow for people will be:
> > > 
> > >   openstack overcloud node import --provide instackenv.json
> > > 
> > > this command will use 2 workflows and will result in a bunch of
> > > "available" nodes, essentially making it a synonym of the
> > > "baremetal import" command.
> > > 
> > > With introspection it becomes:
> > > 
> > >   openstack overcloud node import --introspect --provide instackenv.json
> > > 
> > > this command will use 3 workflows and will result in "available"
> > > and introspected nodes.
> > > 
> > > 
> > > Thanks for reading such a long email (ping me on IRC if you
> > > actually read it through, just for statistics). I hope it makes
> > > sense to you.
> > > 
> > > Dmitry.
> > > 
> > 
> 


