[openstack-dev] [TripleO] Should we have a TripleO API, or simply use Mistral?

Dougal Matthews dougal at redhat.com
Wed Jan 20 16:09:01 UTC 2016

On 20 January 2016 at 10:03, Jiří Stránský <jistr at redhat.com> wrote:

> On 18.1.2016 19:49, Tzu-Mainn Chen wrote:
>> ----- Original Message -----
>>> On Thu, 2016-01-14 at 16:04 -0500, Tzu-Mainn Chen wrote:
>>>> ----- Original Message -----
>>>>> On Wed, Jan 13, 2016 at 04:41:28AM -0500, Tzu-Mainn Chen wrote:
>>>>>> Hey all,
>>>>>> I realize now from the title of the other TripleO/Mistral thread
>>>>>> [1] that
>>>>>> the discussion there may have gotten confused.  I think using
>>>>>> Mistral for
>>>>>> TripleO processes that are obviously workflows - stack
>>>>>> deployment, node
>>>>>> registration - makes perfect sense.  That thread is exploring
>>>>>> practicalities
>>>>>> for doing that, and I think that's great work.
>>>>>> What I inappropriately started to address in that thread was a
>>>>>> somewhat
>>>>>> orthogonal point that Dan asked in his original email, namely:
>>>>>> "what it might look like if we were to use Mistral as a
>>>>>> replacement for the
>>>>>> TripleO API entirely"
>>>>>> I'd like to create this thread to talk about that; more of a
>>>>>> 'should we'
>>>>>> than 'can we'.  And to do that, I want to indulge in a thought
>>>>>> exercise
>>>>>> stemming from an IRC discussion with Dan and others.  All, please
>>>>>> correct
>>>>>> me
>>>>>> if I've misstated anything.
>>>>>> The IRC discussion revolved around one use case: deploying a Heat
>>>>>> stack
>>>>>> directly from a Swift container.  With an updated patch, the Heat
>>>>>> CLI can
>>>>>> support this functionality natively.  Then we don't need a
>>>>>> TripleO API; we
>>>>>> can use Mistral to access that functionality, and we're done,
>>>>>> with no need
>>>>>> for additional code within TripleO.  And, as I understand it,
>>>>>> that's the
>>>>>> true motivation for using Mistral instead of a TripleO API:
>>>>>> avoiding custom
>>>>>> code within TripleO.
>>>>>> That's definitely a worthy goal... except from my perspective,
>>>>>> the story
>>>>>> doesn't quite end there.  A GUI needs additional functionality,
>>>>>> which boils
>>>>>> down to: understanding the Heat deployment templates in order to
>>>>>> provide
>>>>>> options for a user; and persisting those options within a Heat
>>>>>> environment
>>>>>> file.
>>>>>> Right away I think we hit a problem.  Where does the code for
>>>>>> 'understanding
>>>>>> options' go?  Much of that understanding comes from the
>>>>>> capabilities map
>>>>>> in tripleo-heat-templates [2]; it would make sense to me that
>>>>>> responsibility
>>>>>> for that would fall to a TripleO library.
>>>>>> Still, perhaps we can limit the amount of TripleO code.  So to
>>>>>> give API
>>>>>> access to 'getDeploymentOptions', we can create a Mistral
>>>>>> workflow.
>>>>>>    Retrieve Heat templates from Swift -> Parse capabilities map
>>>>>> Which is fine-ish, except from an architectural perspective
>>>>>> 'getDeploymentOptions' violates the abstraction layer between
>>>>>> storage and
>>>>>> business logic, a problem that is compounded because
>>>>>> 'getDeploymentOptions'
>>>>>> is not the only functionality that accesses the Heat templates
>>>>>> and needs
>>>>>> exposure through an API.  And, as has been discussed on a
>>>>>> separate TripleO
>>>>>> thread, we're not even sure Swift is sufficient for our needs;
>>>>>> one possible
>>>>>> consideration right now is allowing deployment from templates
>>>>>> stored in
>>>>>> multiple places, such as the file system or git.
>>>>> Actually, that whole capabilities map thing is a workaround for a
>>>>> missing
>>>>> feature in Heat, which I have proposed, but am having a hard time
>>>>> reaching
>>>>> consensus on within the Heat community:
>>>>> https://review.openstack.org/#/c/196656/
>>>>> Given that is a large part of what's anticipated to be provided by
>>>>> the
>>>>> proposed TripleO API, I'd welcome feedback and collaboration so we
>>>>> can move
>>>>> that forward, vs solving only for TripleO.
>>>>>> Are we going to have duplicate 'getDeploymentOptions' workflows
>>>>>> for each
>>>>>> storage mechanism?  If we consolidate the storage code within a
>>>>>> TripleO
>>>>>> library, do we really need a *workflow* to call a single
>>>>>> function?  Is a
>>>>>> thin TripleO API that contains no additional business logic
>>>>>> really so bad
>>>>>> at that point?
>>>>> Actually, this is an argument for making the validation part of the
>>>>> deployment a workflow - then the interface with the storage
>>>>> mechanism
>>>>> becomes more easily pluggable vs baked into an opaque-to-operators
>>>>> API.
>>>>> E.g, in the long term, imagine the capabilities feature exists in
>>>>> Heat, you
>>>>> then have a pre-deployment workflow that looks something like:
>>>>> 1. Retrieve golden templates from a template store
>>>>> 2. Pass templates to Heat, get capabilities map which defines
>>>>> features user
>>>>> must/may select.
>>>>> 3. Prompt user for input to select required capabilities
>>>>> 4. Pass user input to Heat, validate the configuration, get a
>>>>> mapping of
>>>>> required options for the selected capabilities (nested validation)
>>>>> 5. Push the validated pieces ("plan" in TripleO API terminology) to
>>>>> a
>>>>> template store
>>>>> This is a pre-deployment validation workflow, and it's a superset
>>>>> of the
>>>>> getDeploymentOptions feature you refer to.
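The five steps above can be sketched in Python roughly as follows. This is
only an illustration under stated assumptions: TemplateStore, InMemoryStore,
get_capabilities, validate and pre_deployment_workflow are hypothetical names,
and the capabilities and validation calls are stand-ins for a Heat feature that
is still only proposed in the review linked above. Nothing here is an existing
Heat, Mistral or TripleO API.

```python
from typing import Callable, Dict, List, Protocol


class TemplateStore(Protocol):
    """Hypothetical pluggable storage backend (Swift, git, file system...)."""

    def fetch(self, name: str) -> Dict[str, str]: ...

    def push(self, name: str, plan: Dict[str, object]) -> None: ...


class InMemoryStore:
    """Toy store that exists only to make this sketch runnable."""

    def __init__(self, templates: Dict[str, Dict[str, str]]) -> None:
        self.templates = templates
        self.plans: Dict[str, Dict[str, object]] = {}

    def fetch(self, name: str) -> Dict[str, str]:
        return self.templates[name]

    def push(self, name: str, plan: Dict[str, object]) -> None:
        self.plans[name] = plan


def get_capabilities(templates: Dict[str, str]) -> Dict[str, List[str]]:
    # Assumption: stands in for the proposed Heat capabilities call (step 2).
    # It returns a fixed answer instead of inspecting the templates.
    return {"must": ["network_backend"], "may": ["ceph_storage"]}


def validate(templates: Dict[str, str],
             selected: List[str]) -> Dict[str, List[str]]:
    # Assumption: stands in for Heat's nested validation (step 4).
    missing = [c for c in get_capabilities(templates)["must"]
               if c not in selected]
    if missing:
        raise ValueError(f"missing required capabilities: {missing}")
    # Map every selected capability to the options it would require.
    return {cap: [f"{cap}_parameters"] for cap in selected}


def pre_deployment_workflow(
    store: TemplateStore,
    name: str,
    prompt_user: Callable[[Dict[str, List[str]]], List[str]],
) -> Dict[str, object]:
    templates = store.fetch(name)               # 1. retrieve templates
    capabilities = get_capabilities(templates)  # 2. get capabilities map
    selected = prompt_user(capabilities)        # 3. prompt user for input
    options = validate(templates, selected)     # 4. validate, get options
    plan = {"templates": templates, "selected": selected, "options": options}
    store.push(name, plan)                      # 5. push the validated plan
    return plan
```

Because the store is passed in, swapping Swift for git or the file system
would only mean providing another object with the same fetch/push methods;
the workflow steps themselves stay unchanged.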
>>>>> Historically, TripleO has had a major gap wrt workflow, meaning
>>>>> that we've
>>>>> always implemented it either via shell scripts (tripleo-incubator)
>>>>> or
>>>>> python code (tripleo-common/tripleo-client, potentially TripleO
>>>>> API).
>>>>> So I think what Dan is exploring is, how do we avoid reimplementing
>>>>> a
>>>>> workflow engine, when a project exists which already does that.
>>>>>> My gut reaction is to say that proposing Mistral in place of a
>>>>>> TripleO API
>>>>>> is to look at the engineering concerns from the wrong
>>>>>> direction.  The
>>>>>> Mistral alternative comes from a desire to limit custom TripleO
>>>>>> code at all
>>>>>> costs.  I think that is an extremely dangerous attitude that
>>>>>> leads to
>>>>>> compromises and workarounds that will quickly lead to a shaky
>>>>>> code base
>>>>>> full of design flaws that make it difficult to implement or
>>>>>> extend any
>>>>>> functionality cleanly.
>>>>> I think it's not about limiting TripleO code at all costs, it's
>>>>> about
>>>>> learning from past mistakes, where long-term TripleO specific
>>>>> workarounds
>>>>> for gaps in other projects have become serious technical debt.
>>>>> For example, the old merge.py approach to template composition was
>>>>> a
>>>>> workaround for missing heat features, then Tuskar was another
>>>>> workaround
>>>>> (arguably) for missing heat features, and now we're again proposing
>>>>> a
>>>>> long-term workaround for some missing heat features, some of which
>>>>> are
>>>>> already proposed (referring to the API for capabilities
>>>>> resolution).
>>>> This is an important point, thanks for bringing it up!
>>>> I think that I might have a different understanding of the lessons to
>>>> be
>>>> learned from Tuskar's limitations.  There were actually two issues
>>>> that
>>>> arose.  The first was that Tuskar was far too specific in how it
>>>> tried to
>>>> manipulate Heat pieces.  The second - and more serious, from my
>>>> point of
>>>> view - was that there literally was no way for an API-based GUI to
>>>> perform the tasks it needed to in order to do the correct
>>>> manipulation
>>>> (environment selection), because there was no Heat API in place for
>>>> doing
>>>> so.
>>>> My takeaway from the first issue was that any potential TripleO API
>>>> in
>>>> the future needed to be very low-level, a light skimming on top of
>>>> the
>>>> OpenStack services it uses.  The plan creation process that the
>>>> tripleo-common library spec describes is that: it's just a couple of
>>>> methods designed to allow a user to create an environment file, which
>>>> can then be used for deploying the overcloud.
>>>> My takeaway from the second issue was a bit more complicated.  A
>>>> required feature was missing, and although the proper functionality
>>>> needed to enable it in Heat was identified, it was unclear (and
>>>> remains
>>>> unclear) whether that feature truly belonged in Heat.  What does a
>>>> GUI
>>>> do then?  The GUI could take a cycle off, which is essentially what
>>>> happened here; I don't think that's a reasonable solution.  We could
>>>> hope that we arrive at a 100% foolproof and immutable deployment
>>>> solution
>>>> in the future, arriving at a point where no new features would ever
>>>> be
>>>> needed; I don't think that's a practical hope.
>>>> The third solution that came to mind was the idea of creating the
>>>> TripleO API.  It gives us a place to add in missing features if
>>>> needed.
>>>> And I think it also gives us a useful layer of indirection.  The
>>>> consumers of TripleO want a stable API, so that a new release doesn't
>>>> force them to do a massive update of their code; the TripleO API
>>>> would
>>>> provide that, allowing us to switch code behind the scenes (say, if
>>>> the capabilities feature lands in Heat).
>>> I think the above example would work equally well in a generic workflow
>>> sort of tool. You could imagine that the inputs to the workflow remain
>>> the same... but rather than running our own code in some interim step
>>> we simply call Heat directly for the capabilities map feature.
>>> So regardless of whether we build our own API or use a generic workflow
>>> tool I think we still have what I would call a "release valve" to let us
>>> inject some custom code (actions) into the workflow. Like we discussed
>>> last week on IRC I would like to minimize the number of custom actions
>>> we have (with an eye towards things living in the upstream OpenStack
>>> projects) but it is fine to do this either way and would work equally
>>> well w/ Mistral and TripleO API.
>>>> I think I kinda view TripleO as a 'best practices' project.  Using
>>>> OpenStack is a confusing experience, with a million different options
>>>> and choices to make.  TripleO provides users with an excellent guide.
>>>> But the problem is that best practices change, and I think that
>>>> perceived instability is dangerous for adoption of TripleO.
>>>> So having a TripleO library and its associated API be a 'best
>>>> practices'
>>>> library makes sense to me.  It gives consumers a stable platform upon
>>>> which to use TripleO, while allowing us to be flexible behind the
>>>> scenes.
>>>> The 'best practice' for Heat capabilities right now is a workaround,
>>>> because it hasn't been judged to be suitable to go into Heat itself.
>>>> If that changes, we get to shift as well - and all of these changes
>>>> are
>>>> invisible to the API consumer.
>>> I mentioned this in my "Driving workflows with Mistral" thread but with
>>> regards to stability I view say Heat's v1 API or Mistral's v2 API as
>>> both being way more stable than what we could ever achieve with TripleO
>>> API. The real trick to API stability with something like Heat or
>>> Mistral is how we manage the inputs and outputs to Stacks and Workflows
>>> themselves. So long as we are mindful of this I can't imagine an end user
>>> (say a GUI writer or whoever) would really care whether they POST to
>>> Mistral or something we've created. The nice thing about using other
>>> OpenStack projects like Heat or Mistral is that they very likely have
>>> better community and documentation around these things than we
>>> would ever have.
>>> The more I look at using Mistral for some of the cases that have been
>>> brought up the more it seems to make sense for a lot of the workflows
>>> we need. I don't believe we can achieve better stability by creating
>>> what sounds more and more like a shim/proxy API rather than using the
>>> versioned API's that OpenStack already provides.
>>> There may be some corner cases where a "GUI helper" API comes into play
>>> for some sort of caching or something. I'm not blocking anyone from
>>> creating these sorts of features if they need them. And again if it is
>>> something that could be added to an upstream OpenStack project like
>>> Heat or Mistral I would look there first. So perhaps Zaqar for
>>> websockets instead of rolling our own, this sort of thing.
>>> What does concern me is that we are overstating what TripleO API should
>>> actually contain should we choose to pursue it. Initially it was
>>> positioned as the "TripleO workflow API". I think we now agree that we
>>> probably shouldn't put all of our workflows behind it. So if our stance
>>> has changed would it make sense to compile a new list of what we
>>> believe belongs behind our own TripleO API vs. what we consider
>>> workflows?
>> I wonder if it would be helpful to get operator feedback here - show them
>> the advantages/disadvantages of both options and get a sense of what
>> might be useful/necessary for them to use TripleO effectively?
> (I'm going off on a tangent a bit, but please bear with me, i'm using all
> that to support the point in the end. The implications of building a
> TripleO API touch on various topics.)
> Yes i think we should gather operator feedback. We already got some, but
> we should gather more whenever possible.
> One kind of (negative) feedback i've heard is that overcloud management is
> too much of a "blackbox" compared to what operators are used to. The
> feedback i recall was that it's hard to tell what is going to happen when
> running an overcloud stack update, and that we cannot re-execute the
> software config management independently.
> Building another umbrella API to rule the already largely umbrella-like
> deployment process (think what all responsibilities lie within the
> tripleo-heat-templates codebase, and within the single 'overcloud' Heat
> stack) would probably make matters more blackboxy and go further in the
> direction of "i feel like i don't know what's happening to my cloud when i
> use the management tool".

I completely agree that we want to make the tool less of a blackbox. I am
not convinced that Mistral will do this (do tripleo-heat-templates make
things less blackbox-y because they are YAML users can look at? Maybe for
some users but they still confuse me!). However, given that I think we all
agree Mistral is a good fit for some of the workflow tasks (introspection,
deploying, etc.) I think it is a good idea to see if Mistral will work
well, or well enough for the other tasks we need (essentially some template
introspection/processing). It will certainly be more obvious what is going
on if all the actions are in Mistral and not split between it and a custom API.

> What i think could improve the situation for operators is trying to chunk
> up what we already have into smaller, more independently operable parts.
> The split-stack approach already discussed on the TripleO meeting and on
> #tripleo could help with this. Essentially separating our hardware
> management from our software config management. Being able to re-apply
> software configuration without being afraid of having nodes accidentally
> re-provisioned from scratch.

+1, this would be a very valuable change for the project generally.

> In general i think TripleO could be a little more "UNIXy" - composed of
> smaller parts that make sense on their own, transparent to the operator,
> more modular and modifiable, and in effect more receptive to how varied
> real-world deployment environments are (various Neutron and Cinder
> plugins, Keystone backends, composable set of services, custom node types
> etc.).
> Workflow persisted in a data-like fashion is probably more modifiable by
> the operator than Python code of a REST API. We've seen hard assumptions
> cause problems in the past. (Think the unoverridable CLI parameters issue
> we used to have, and how we had to move to a model of "CLI provides its
> values, but you can always override them or provide additional ones with an
> environment file if needed", which we now use extensively). I'm a bit
> concerned that building a new REST API on top of everything would impose
> new rigid assumptions that could cause more harm than good in the end. I'm
> concerned that it would be usable only for very basic deployments, while
> the world of real deployments has its own pace and requirements not fitting
> the "best practices" as defined by the API, having to bypass the API far
> too often and slowly pushing it into abandonment over time.
> My mind is probably biased towards the operator feedback that
> resonated with me the most, i've heard pro-blackbox opinions too (though
> not from operators yet IIRC). So take what i wrote just as my 2 cents, but
> i think it's necessary to consider the above issues when thinking about the
> implications of building a TripleO API.
> Regarding the non-workflow kind of features we need for empowering GUI,
> wouldn't those be useful for normal (tenant) Heat stack deployments in the
> overcloud too? It sounds to me that features like "driving a Heat stack
> deployment with the same powers from CLI or GUI", "updating a CLI-created
> stack from GUI and vice versa", "understanding/parsing what are the
> configuration options of my Heat templates" are all features that are not
> specific to TripleO, and could be useful for tenant Heat stacks too. So
> perhaps these should be implemented in Heat? If that can't happen fast
> enough, then we might need to put some workarounds in place for now, but it
> might be better if we didn't advertise those as a stable solution.
> Jirka
>> Mainn
>>> Dan
>>>> Mainn
>>>>>> I think the correct attitude is to simply look at the problem
>>>>>> we're
>>>>>> trying to solve and find the correct architecture.  For these
>>>>>> get/set
>>>>>> methods that the API needs, it's pretty simple: storage -> some
>>>>>> logic ->
>>>>>> a REST API.  Adding a workflow engine on top of that is unneeded,
>>>>>> and I
>>>>>> believe that means it's an incorrect solution.
>>>>> What may help is if we can work through the proposed API spec, and
>>>>> identify which calls can reasonably be considered workflows vs
>>>>> those where
>>>>> it's really just proxying an API call with some logic?
>>>>> When we have a defined list of "not workflow" API requirements,
>>>>> it'll
>>>>> probably be much easier to rationalize over the value of a bespoke
>>>>> API vs
>>>>> mistral?
>>>>> Steve
>>>>> __________________________________________________________________________
>>>>> OpenStack Development Mailing List (not for usage questions)
>>>>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev