[openstack-dev] [TripleO] Nodes Registration workflow improvements

Dmitry Tantsur dtantsur at redhat.com
Mon Jun 13 17:10:00 UTC 2016

On 06/13/2016 06:28 PM, Ben Nemec wrote:
> On 06/13/2016 09:41 AM, Jiri Tomasek wrote:
>> Hi all,
>> As we are close to merging the initial Nodes Registration workflows and
>> action [1, 2] using Mistral which successfully provides the current
>> registration logic via common API, I'd like to start discussion on how
>> to improve it so it satisfies GUI and CLI requirements. I'd like to try
>> to describe the clients goals and define requirements, describe current
>> workflow problems and propose a solution. I'd like to record the result
>> of discussion to Blueprint [3] which Ryan already created.
>> CLI goals and optimal workflow
>> ========================
>> CLI's main benefit is based on the fact that it's commands can simply
>> become part of a script, so it is important that the operation is
>> idempotent. The optimal CLI workflow is:
>> User runs 'openstack baremetal import' and provides instackenv.json file
>> which includes all nodes information. When the registration fails at
>> some point, user is notified about the error and re-runs the command
>> with the same set of nodes. Rinse and repeat until all nodes are
>> properly registered.
> I would actually not describe this as the optimal workflow for CLI
> registration either.  It would be much better if the registration
> completed for all the nodes that it can in the first place and then any
> failed nodes can be cleaned up later.  There's no reason one bad node in
> a file containing 100 nodes should block the entire deployment.
> On that note, the only part of your proposal that I'm not sure would be
> useful for the CLI is the non-blocking part.  I don't know that a CLI
> fire-and-forget mode makes a lot of sense, although if there's a way for
> the CLI to check the status then that might be fine too.  However,
> pretty much all of the other usability stuff you're talking about would
> benefit the CLI too.
>> GUI goals and optimal workflow
>> =========================
>> GUI's main goal is to provide a user friendly way to register nodes,
>> inform the user on the process, handle the problems and lets user fix
>> them. GUI strives for being responsive and interactive.
>> GUI allows user to add nodes specification manually one by one by
>> provided form or allow user (in same manner as CLI) to provide the
>> instackenv.json file which holds the nodes description. Importing the
>> file (or adding node manually) will populate an array of nodes the user
>> wants to register. User is able to browse these nodes and make
>> corrections to their configuration. GUI provides client side validations
>> to verify inputs (node name format, required fields, mac address, ip
>> address format etc.)
>> Then user triggers the registration. The nodes are moved to nodes table
>> as they are being registered. If an error occurs during registration of
>> any of the nodes, user is notified about the issue and can fix it in
>> registration form and can re-trigger registration for failed nodes.
>> Rinse and repeat until all nodes are successfully registered and in
>> proper state (manageable).
>> Such workflow keeps the GUI interactive, user does not have to look at
>> the spinner for several minutes (in case of registering hundreds of
>> nodes), hoping that something does not happen wrong. User is constantly
>> informed about the progress, user is able to react to the situation as
>> he wants, User is able to freely interact with the GUI while
>> registration is happening on the background. User is able to register
>> nodes in batches.
>> Current solution
>> =============
>> Current solution uses register_or_update_nodes workflow [1] which takes
>> a nodes_json array and runs register_or_update_nodes and
>> set_nodes_managed tasks. When the whole operation completes it sends
>> Zaqar message notifying about the result of the registration of the
>> whole batch of nodes.
>> register_or_update_nodes runs tripleo.register_or_update_nodes action
>> [2] which uses business logic in tripleo_common/utils/nodes.py
>> utils.nodes.py module has been originally extracted from tripleoclient
>> to get the business logic behind the common API. It does following:
>> - converts the instackenv.json nodes format to appropriate ironic driver
>> format (driver-info fields)
>> - sets kernel and ramdisk ids defaults if they're not provided
>> - for each node it tests if node already exists (finds nodes by mac
>> addresses) and updates it or registers it as new based on the result.
>> Current Problems:
>> - no zaqar notification is sent for each node
>> - nodes are registered in batch, registration fails when error happens
>> on a certain node, leaving already registered nodes in inconsistent state
>> - workflow does not notify user about what nodes have been registered
>> and what failed, only thing user gets is relevant error message
>> - when the workflow succeeds, the registered_nodes list sent by Zaqar
>> message has outdated information
>> - when nodes are updated using nodes registration, the forkflow ends up
>> as failed, without any error output, although the nodes are updated
>> successfully
>> - utils/nodes.py decides whether the node should be created or updated
>> based on mac address which is subject to change. It needs to be done by
>> UUID which is fixed.
> The UUID is not fixed.  If you register the same node twice, it will get
> a different UUID each time (unless you explicitly set the UUID as I see
> you can now do, but we can't and shouldn't require that).  The MAC
> shouldn't change unless you spoof it (which is not something I think we
> should be designing around) or point it at a different NIC.  In the
> latter case, we have no way of knowing whether you changed to a
> different NIC on the same system or reassigned an IPMI address to a
> completely different system, so requiring that to be a new node is the
> only sane way to handle it AFAICT.

Starting last week you can put UUID to instackenv.json and ironic will 
use it.

> I'm not actually sure from just reading the code that we would require a
> new node on a mac change anyway.  In that case I think it would match
> the IPMI address of the node instead and select that existing node for
> update.  I could be wrong about that since I haven't actually tested it,
> but that's how it looks to me.

We already use IPMI address to match, but
1. it can be changed
2. lookup will fail if we switch driver

>> - utils/nodes.py uses instackenv.json nodes list format - the conversion
>> should be done in client
> Why?  Putting logic like that in the client is evil.
>> - instackenv.json uses nodes list format which is not compatible with
>> ironic which forces us to do the format conversions and limit the ironic
>> driver support
> Totally agree that we should just drop the translation layer.  It isn't
> a useful abstraction - if you're using a vendor-specific driver, then
> you probably have to set driver-specific keys, so you can't just swap
> out drivers in your config anyway.  If you're not using driver-specific
> keys then you still aren't gaining anything from this extra layer.
> From looking at the existing code, the only _possible_ benefit from this
> layer is some abstraction of drivers that use a different key for the
> actual IPMI address, but honestly that inconsistency sounds like
> something that should be fixed in Ironic, not papered over in the node
> registration code.  I guess that's a discussion we'd have to have with
> the Ironic team.

+1. we were talking about it, but it wasn't a big priority. Some push 
from TripleO side would probably help :)

>> Proposed changes
>> ===============
>> To satisfy clients requirements we need to:
>> - assure the idempotency of idempotency of running the nodes
>> registration providing the instackenv.json
>> - enable the workflow to track each node registration workflow separately
>> The changes can be done in 2 steps:
>> 1. refactor register_or_update_nodes workflow and utils/nodes.py
>> - register_or_update_nodes workflow calls register_or_update_node
>> workflow for each node, that workflow then runs tasks:
>> tripleo.register_or_update_node (action), set_node_managed,
>> send_message. When whole workflow finishes, summary message is sent.
>> - reduce the register action and utils/nodes.py to the mechanism of
>> deciding on whether we create new or update the node - based on whether
>> the uuid is provided in nodes list and whether node is present in ironic
>> - move the format conversion from utils/nodes.py to client
>> These changes allow each node to finish it's registration without being
>> interrupted by failure on other node, avoid nodes getting into
>> inconsistent state, allow interactive reporting on each nodes state,
>> allows to run validations on each node separately
>> 2. change instackenv.json nodes format to match Ironic
>> - this change allows the client to pass data from instackenv.json to
>> ironic without intervention, so since the interaction is data > ironic,
>> user can specify any ironic driver in instackenv.json. In addition, GUI
>> can dynamically display relevant properties user needs to fill in based
>> on selected driver.
>> Possible problems:
>> We need to consider performance impact by running separate workflow for
>> each node, doing some benchmark tests would be beneficial.
>> Alternative solution:
>> If we decide to stick with single workflow it will have the UX impact as
>> we won't be able to interactively report on each node changes. But we'll
>> still be able to report on each task result. Other requirements still stand:
>> - nodes which got registered before the failure happened need to
>> continue with next tasks (set_node_managed)
>> - workflow needs to report registered nodes list, failed nodes list and
>> error.
>> [1] https://review.openstack.org/#/c/300200/
>> [2] https://review.openstack.org/#/c/319587/
>> [3]
>> https://blueprints.launchpad.net/tripleo-common/+spec/improve-baremetal-workflows
>> -- Jirka
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

More information about the OpenStack-dev mailing list