[openstack-dev] [TripleO] Nodes Registration workflow improvements

Dmitry Tantsur dtantsur at redhat.com
Mon Jun 13 15:44:30 UTC 2016


On 06/13/2016 04:41 PM, Jiri Tomasek wrote:
> Hi all,
>
> As we are close to merging the initial Nodes Registration workflows and
> action [1, 2] using Mistral which successfully provides the current
> registration logic via common API, I'd like to start discussion on how
> to improve it so it satisfies GUI and CLI requirements. I'd like to try
> to describe the clients goals and define requirements, describe current
> workflow problems and propose a solution. I'd like to record the result
> of discussion to Blueprint [3] which Ryan already created.

Hi and thanks for writing this up. Just a few clarifying comments inline.

>
>
> CLI goals and optimal workflow
> ========================
>
> CLI's main benefit is based on the fact that it's commands can simply
> become part of a script, so it is important that the operation is
> idempotent. The optimal CLI workflow is:
>
> User runs 'openstack baremetal import' and provides instackenv.json file
> which includes all nodes information. When the registration fails at
> some point, user is notified about the error and re-runs the command
> with the same set of nodes. Rinse and repeat until all nodes are
> properly registered.

Note that while in your example everything works, the command is not 
idempotent, and e.g. running it in the middle of deployment will 
probably cause funny things to happen.

>
>
> GUI goals and optimal workflow
> =========================
>
> GUI's main goal is to provide a user friendly way to register nodes,
> inform the user on the process, handle the problems and lets user fix
> them. GUI strives for being responsive and interactive.
>
> GUI allows user to add nodes specification manually one by one by
> provided form or allow user (in same manner as CLI) to provide the
> instackenv.json file which holds the nodes description. Importing the
> file (or adding node manually) will populate an array of nodes the user
> wants to register. User is able to browse these nodes and make
> corrections to their configuration. GUI provides client side validations
> to verify inputs (node name format, required fields, mac address, ip
> address format etc.)

It's worth nothing that Ironic has API to provide required and optional 
properties for all drivers. But of course not in instackenv format ;)

>
> Then user triggers the registration. The nodes are moved to nodes table
> as they are being registered. If an error occurs during registration of
> any of the nodes, user is notified about the issue and can fix it in
> registration form and can re-trigger registration for failed nodes.
> Rinse and repeat until all nodes are successfully registered and in
> proper state (manageable).
>
> Such workflow keeps the GUI interactive, user does not have to look at
> the spinner for several minutes (in case of registering hundreds of
> nodes), hoping that something does not happen wrong. User is constantly
> informed about the progress, user is able to react to the situation as
> he wants, User is able to freely interact with the GUI while
> registration is happening on the background. User is able to register
> nodes in batches.
>
>
> Current solution
> =============
>
> Current solution uses register_or_update_nodes workflow [1] which takes
> a nodes_json array and runs register_or_update_nodes and
> set_nodes_managed tasks. When the whole operation completes it sends
> Zaqar message notifying about the result of the registration of the
> whole batch of nodes.
>
> register_or_update_nodes runs tripleo.register_or_update_nodes action
> [2] which uses business logic in tripleo_common/utils/nodes.py
>
> utils.nodes.py module has been originally extracted from tripleoclient
> to get the business logic behind the common API. It does following:
>
> - converts the instackenv.json nodes format to appropriate ironic driver
> format (driver-info fields)
> - sets kernel and ramdisk ids defaults if they're not provided
> - for each node it tests if node already exists (finds nodes by mac
> addresses) and updates it or registers it as new based on the result.
>
>
> Current Problems:
> - no zaqar notification is sent for each node
> - nodes are registered in batch, registration fails when error happens
> on a certain node, leaving already registered nodes in inconsistent state
> - workflow does not notify user about what nodes have been registered
> and what failed, only thing user gets is relevant error message
> - when the workflow succeeds, the registered_nodes list sent by Zaqar
> message has outdated information
> - when nodes are updated using nodes registration, the forkflow ends up
> as failed, without any error output, although the nodes are updated
> successfully
>
> - utils/nodes.py decides whether the node should be created or updated
> based on mac address which is subject to change. It needs to be done by
> UUID which is fixed.

Well, only if a user/CLI/GUI generates this UUID.

Also it's using not only MACs. We don't require MACs for non-virtual 
nodes, so it also uses some smart matching with BMC credentials in the 
same fashion as Ironic Inspector.

> - utils/nodes.py uses instackenv.json nodes list format - the conversion
> should be done in client
>
> - instackenv.json uses nodes list format which is not compatible with
> ironic which forces us to do the format conversions and limit the ironic
> driver support

Note that the latter is a huge issue actually. Ironic currently has 30 
in-tree drivers (not counting fake ones) and several out-of-tree ones. 
Currently TripleO has to hardcode all of them, as well as their required 
properties. This does not scale.

>
>
> Proposed changes
> ===============
>
> To satisfy clients requirements we need to:
> - assure the idempotency of idempotency of running the nodes
> registration providing the instackenv.json

Oh. Well, see above about idempotency.

> - enable the workflow to track each node registration workflow separately
>
>
> The changes can be done in 2 steps:
> 1. refactor register_or_update_nodes workflow and utils/nodes.py
>
> - register_or_update_nodes workflow calls register_or_update_node
> workflow for each node, that workflow then runs tasks:
> tripleo.register_or_update_node (action), set_node_managed,
> send_message. When whole workflow finishes, summary message is sent.
> - reduce the register action and utils/nodes.py to the mechanism of
> deciding on whether we create new or update the node - based on whether
> the uuid is provided in nodes list and whether node is present in ironic
> - move the format conversion from utils/nodes.py to client
>
> These changes allow each node to finish it's registration without being
> interrupted by failure on other node, avoid nodes getting into
> inconsistent state, allow interactive reporting on each nodes state,
> allows to run validations on each node separately
>
> 2. change instackenv.json nodes format to match Ironic
> - this change allows the client to pass data from instackenv.json to
> ironic without intervention, so since the interaction is data > ironic,
> user can specify any ironic driver in instackenv.json. In addition, GUI
> can dynamically display relevant properties user needs to fill in based
> on selected driver.

Note that this can be done in a non-breaking fashion by making TripleO 
accept Ironic-style fields as well as instackenv-style ones. I've 
already started this work and I plan to continue it anyway (help is 
always welcome).

>
>
> Possible problems:
> We need to consider performance impact by running separate workflow for
> each node, doing some benchmark tests would be beneficial.
>
> Alternative solution:
> If we decide to stick with single workflow it will have the UX impact as
> we won't be able to interactively report on each node changes. But we'll
> still be able to report on each task result. Other requirements still
> stand:
> - nodes which got registered before the failure happened need to
> continue with next tasks (set_node_managed)
> - workflow needs to report registered nodes list, failed nodes list and
> error.
>
>
>
> [1] https://review.openstack.org/#/c/300200/
> [2] https://review.openstack.org/#/c/319587/
> [3]
> https://blueprints.launchpad.net/tripleo-common/+spec/improve-baremetal-workflows
>
>
>
> -- Jirka
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




More information about the OpenStack-dev mailing list