[openstack-dev] [TripleO] Nodes Registration workflow improvements
Jiri Tomasek
jtomasek at redhat.com
Mon Jun 13 14:41:10 UTC 2016
Hi all,
As we are close to merging the initial Nodes Registration workflows and
action [1, 2] using Mistral which successfully provides the current
registration logic via common API, I'd like to start discussion on how
to improve it so it satisfies GUI and CLI requirements. I'd like to try
to describe the clients goals and define requirements, describe current
workflow problems and propose a solution. I'd like to record the result
of discussion to Blueprint [3] which Ryan already created.
CLI goals and optimal workflow
========================
CLI's main benefit is based on the fact that it's commands can simply
become part of a script, so it is important that the operation is
idempotent. The optimal CLI workflow is:
User runs 'openstack baremetal import' and provides instackenv.json file
which includes all nodes information. When the registration fails at
some point, user is notified about the error and re-runs the command
with the same set of nodes. Rinse and repeat until all nodes are
properly registered.
GUI goals and optimal workflow
=========================
GUI's main goal is to provide a user friendly way to register nodes,
inform the user on the process, handle the problems and lets user fix
them. GUI strives for being responsive and interactive.
GUI allows user to add nodes specification manually one by one by
provided form or allow user (in same manner as CLI) to provide the
instackenv.json file which holds the nodes description. Importing the
file (or adding node manually) will populate an array of nodes the user
wants to register. User is able to browse these nodes and make
corrections to their configuration. GUI provides client side validations
to verify inputs (node name format, required fields, mac address, ip
address format etc.)
Then user triggers the registration. The nodes are moved to nodes table
as they are being registered. If an error occurs during registration of
any of the nodes, user is notified about the issue and can fix it in
registration form and can re-trigger registration for failed nodes.
Rinse and repeat until all nodes are successfully registered and in
proper state (manageable).
Such workflow keeps the GUI interactive, user does not have to look at
the spinner for several minutes (in case of registering hundreds of
nodes), hoping that something does not happen wrong. User is constantly
informed about the progress, user is able to react to the situation as
he wants, User is able to freely interact with the GUI while
registration is happening on the background. User is able to register
nodes in batches.
Current solution
=============
Current solution uses register_or_update_nodes workflow [1] which takes
a nodes_json array and runs register_or_update_nodes and
set_nodes_managed tasks. When the whole operation completes it sends
Zaqar message notifying about the result of the registration of the
whole batch of nodes.
register_or_update_nodes runs tripleo.register_or_update_nodes action
[2] which uses business logic in tripleo_common/utils/nodes.py
utils.nodes.py module has been originally extracted from tripleoclient
to get the business logic behind the common API. It does following:
- converts the instackenv.json nodes format to appropriate ironic driver
format (driver-info fields)
- sets kernel and ramdisk ids defaults if they're not provided
- for each node it tests if node already exists (finds nodes by mac
addresses) and updates it or registers it as new based on the result.
Current Problems:
- no zaqar notification is sent for each node
- nodes are registered in batch, registration fails when error happens
on a certain node, leaving already registered nodes in inconsistent state
- workflow does not notify user about what nodes have been registered
and what failed, only thing user gets is relevant error message
- when the workflow succeeds, the registered_nodes list sent by Zaqar
message has outdated information
- when nodes are updated using nodes registration, the forkflow ends up
as failed, without any error output, although the nodes are updated
successfully
- utils/nodes.py decides whether the node should be created or updated
based on mac address which is subject to change. It needs to be done by
UUID which is fixed.
- utils/nodes.py uses instackenv.json nodes list format - the conversion
should be done in client
- instackenv.json uses nodes list format which is not compatible with
ironic which forces us to do the format conversions and limit the ironic
driver support
Proposed changes
===============
To satisfy clients requirements we need to:
- assure the idempotency of idempotency of running the nodes
registration providing the instackenv.json
- enable the workflow to track each node registration workflow separately
The changes can be done in 2 steps:
1. refactor register_or_update_nodes workflow and utils/nodes.py
- register_or_update_nodes workflow calls register_or_update_node
workflow for each node, that workflow then runs tasks:
tripleo.register_or_update_node (action), set_node_managed,
send_message. When whole workflow finishes, summary message is sent.
- reduce the register action and utils/nodes.py to the mechanism of
deciding on whether we create new or update the node - based on whether
the uuid is provided in nodes list and whether node is present in ironic
- move the format conversion from utils/nodes.py to client
These changes allow each node to finish it's registration without being
interrupted by failure on other node, avoid nodes getting into
inconsistent state, allow interactive reporting on each nodes state,
allows to run validations on each node separately
2. change instackenv.json nodes format to match Ironic
- this change allows the client to pass data from instackenv.json to
ironic without intervention, so since the interaction is data > ironic,
user can specify any ironic driver in instackenv.json. In addition, GUI
can dynamically display relevant properties user needs to fill in based
on selected driver.
Possible problems:
We need to consider performance impact by running separate workflow for
each node, doing some benchmark tests would be beneficial.
Alternative solution:
If we decide to stick with single workflow it will have the UX impact as
we won't be able to interactively report on each node changes. But we'll
still be able to report on each task result. Other requirements still stand:
- nodes which got registered before the failure happened need to
continue with next tasks (set_node_managed)
- workflow needs to report registered nodes list, failed nodes list and
error.
[1] https://review.openstack.org/#/c/300200/
[2] https://review.openstack.org/#/c/319587/
[3]
https://blueprints.launchpad.net/tripleo-common/+spec/improve-baremetal-workflows
-- Jirka
More information about the OpenStack-dev
mailing list