[openstack-dev] [TripleO] Should we have a TripleO API, or simply use Mistral?

Dougal Matthews dougal at redhat.com
Thu Jan 21 14:55:26 UTC 2016


On 21 January 2016 at 14:46, Dougal Matthews <dougal at redhat.com> wrote:

>
>
> On 20 January 2016 at 20:05, Tzu-Mainn Chen <tzumainn at redhat.com> wrote:
>
>> ----- Original Message -----
>> > On 18.1.2016 19:49, Tzu-Mainn Chen wrote:
>> > > ----- Original Message -----
>> > >> On Thu, 2016-01-14 at 16:04 -0500, Tzu-Mainn Chen wrote:
>> > >>>
>> > >>> ----- Original Message -----
>> > >>>> On Wed, Jan 13, 2016 at 04:41:28AM -0500, Tzu-Mainn Chen wrote:
>> > >>>>> Hey all,
>> > >>>>>
>> > >>>>> I realize now from the title of the other TripleO/Mistral thread
>> > >>>>> [1] that
>> > >>>>> the discussion there may have gotten confused.  I think using
>> > >>>>> Mistral for
>> > >>>>> TripleO processes that are obviously workflows - stack
>> > >>>>> deployment, node
>> > >>>>> registration - makes perfect sense.  That thread is exploring
>> > >>>>> practicalities
>> > >>>>> for doing that, and I think that's great work.
>> > >>>>>
>> > >>>>> What I inappropriately started to address in that thread was a
>> > >>>>> somewhat
>> > >>>>> orthogonal point that Dan asked in his original email, namely:
>> > >>>>>
>> > >>>>> "what it might look like if we were to use Mistral as a
>> > >>>>> replacement for the
>> > >>>>> TripleO API entirely"
>> > >>>>>
>> > >>>>> I'd like to create this thread to talk about that; more of a
>> > >>>>> 'should we'
>> > >>>>> than 'can we'.  And to do that, I want to indulge in a thought
>> > >>>>> exercise
>> > >>>>> stemming from an IRC discussion with Dan and others.  All, please
>> > >>>>> correct
>> > >>>>> me
>> > >>>>> if I've misstated anything.
>> > >>>>>
>> > >>>>> The IRC discussion revolved around one use case: deploying a Heat
>> > >>>>> stack
>> > >>>>> directly from a Swift container.  With an updated patch, the Heat
>> > >>>>> CLI can
>> > >>>>> support this functionality natively.  Then we don't need a
>> > >>>>> TripleO API; we
>> > >>>>> can use Mistral to access that functionality, and we're done,
>> > >>>>> with no need
>> > >>>>> for additional code within TripleO.  And, as I understand it,
>> > >>>>> that's the
>> > >>>>> true motivation for using Mistral instead of a TripleO API:
>> > >>>>> avoiding custom
>> > >>>>> code within TripleO.
>> > >>>>>
>> > >>>>> That's definitely a worthy goal... except from my perspective,
>> > >>>>> the story
>> > >>>>> doesn't quite end there.  A GUI needs additional functionality,
>> > >>>>> which boils
>> > >>>>> down to: understanding the Heat deployment templates in order to
>> > >>>>> provide
>> > >>>>> options for a user; and persisting those options within a Heat
>> > >>>>> environment
>> > >>>>> file.
>> > >>>>>
>> > >>>>> Right away I think we hit a problem.  Where does the code for
>> > >>>>> 'understanding
>> > >>>>> options' go?  Much of that understanding comes from the
>> > >>>>> capabilities map
>> > >>>>> in tripleo-heat-templates [2]; it would make sense to me that
>> > >>>>> responsibility
>> > >>>>> for that would fall to a TripleO library.
>> > >>>>>
>> > >>>>> Still, perhaps we can limit the amount of TripleO code.  So to
>> > >>>>> give API
>> > >>>>> access to 'getDeploymentOptions', we can create a Mistral
>> > >>>>> workflow.
>> > >>>>>
>> > >>>>>    Retrieve Heat templates from Swift -> Parse capabilities map
>> > >>>>>
>> > >>>>> Which is fine-ish, except from an architectural perspective
>> > >>>>> 'getDeploymentOptions' violates the abstraction layer between
>> > >>>>> storage and
>> > >>>>> business logic, a problem that is compounded because
>> > >>>>> 'getDeploymentOptions'
>> > >>>>> is not the only functionality that accesses the Heat templates
>> > >>>>> and needs
>> > >>>>> exposure through an API.  And, as has been discussed on a
>> > >>>>> separate TripleO
>> > >>>>> thread, we're not even sure Swift is sufficient for our needs;
>> > >>>>> one possible
>> > >>>>> consideration right now is allowing deployment from templates
>> > >>>>> stored in
>> > >>>>> multiple places, such as the file system or git.
>> > >>>>
>> > >>>> Actually, that whole capabilities map thing is a workaround for a
>> > >>>> missing
>> > >>>> feature in Heat, which I have proposed, but am having a hard time
>> > >>>> reaching
>> > >>>> consensus on within the Heat community:
>> > >>>>
>> > >>>> https://review.openstack.org/#/c/196656/
>> > >>>>
>> > >>>> Given that is a large part of what's anticipated to be provided by
>> > >>>> the
>> > >>>> proposed TripleO API, I'd welcome feedback and collaboration so we
>> > >>>> can move
>> > >>>> that forward, vs solving only for TripleO.
>> > >>>>
>> > >>>>> Are we going to have duplicate 'getDeploymentOptions' workflows
>> > >>>>> for each
>> > >>>>> storage mechanism?  If we consolidate the storage code within a
>> > >>>>> TripleO
>> > >>>>> library, do we really need a *workflow* to call a single
>> > >>>>> function?  Is a
>> > >>>>> thin TripleO API that contains no additional business logic
>> > >>>>> really so bad
>> > >>>>> at that point?
>> > >>>>
>> > >>>> Actually, this is an argument for making the validation part of the
>> > >>>> deployment a workflow - then the interface with the storage
>> > >>>> mechanism
>> > >>>> becomes more easily pluggable vs baked into an opaque-to-operators
>> > >>>> API.
>> > >>>>
>> > >>>> E.g, in the long term, imagine the capabilities feature exists in
>> > >>>> Heat, you
>> > >>>> then have a pre-deployment workflow that looks something like:
>> > >>>>
>> > >>>> 1. Retrieve golden templates from a template store
>> > >>>> 2. Pass templates to Heat, get capabilities map which defines
>> > >>>> features user
>> > >>>> must/may select.
>> > >>>> 3. Prompt user for input to select required capabilities
>> > >>>> 4. Pass user input to Heat, validate the configuration, get a
>> > >>>> mapping of
>> > >>>> required options for the selected capabilities (nested validation)
>> > >>>> 5. Push the validated pieces ("plan" in TripleO API terminology) to
>> > >>>> a
>> > >>>> template store
>> > >>>>
>> > >>>> This is a pre-deployment validation workflow, and it's a superset
>> > >>>> of the
>> > >>>> getDeploymentOptions feature you refer to.
>> > >>>>
>> > >>>> Historically, TripleO has had a major gap wrt workflow, meaning
>> > >>>> that we've
>> > >>>> always implemented it either via shell scripts (tripleo-incubator)
>> > >>>> or
>> > >>>> python code (tripleo-common/tripleo-client, potentially TripleO
>> > >>>> API).
>> > >>>>
>> > >>>> So I think what Dan is exploring is, how do we avoid reimplementing
>> > >>>> a
>> > >>>> workflow engine, when a project exists which already does that.
>> > >>>>
>> > >>>>> My gut reaction is to say that proposing Mistral in place of a
>> > >>>>> TripleO API
>> > >>>>> is to look at the engineering concerns from the wrong
>> > >>>>> direction.  The
>> > >>>>> Mistral alternative comes from a desire to limit custom TripleO
>> > >>>>> code at all
>> > >>>>> costs.  I think that is an extremely dangerous attitude that
>> > >>>>> leads to
>> > >>>>> compromises and workarounds that will quickly lead to a shaky
>> > >>>>> code base
>> > >>>>> full of design flaws that make it difficult to implement or
>> > >>>>> extend any
>> > >>>>> functionality cleanly.
>> > >>>>
>> > >>>> I think it's not about limiting TripleO code at all costs, it's
>> > >>>> about
>> > >>>> learning from past mistakes, where long-term TripleO specific
>> > >>>> workarounds
>> > >>>> for gaps in other projects have become serious technical debt.
>> > >>>>
>> > >>>> For example, the old merge.py approach to template composition was
>> > >>>> a
>> > >>>> workaround for missing heat features, then Tuskar was another
>> > >>>> workaround
>> > >>>> (arguably) for missing heat features, and now we're again proposing
>> > >>>> a
>> > >>>> long-term workaround for some missing heat features, some of which
>> > >>>> are
>> > >>>> already proposed (referring to the API for capabilities
>> > >>>> resolution).
>> > >>>>
>> > >>>
>> > >>> This is an important point, thanks for bringing it up!
>> > >>>
>> > >>> I think that I might have a different understanding of the lessons
>> to
>> > >>> be
>> > >>> learned from Tuskar's limitations.  There were actually two issues
>> > >>> that
>> > >>> arose.  The first was that Tuskar was far too specific in how it
>> > >>> tried to
>> > >>> manipulate Heat pieces.  The second - and more serious, from my
>> > >>> point of
>> > >>> view - was that there literally was no way for an API-based GUI to
>> > >>> perform the tasks it needed to in order to do the correct
>> > >>> manipulation
>> > >>> (environment selection), because there was no Heat API in place for
>> > >>> doing
>> > >>> so.
>> > >>>
>> > >>> My takeaway from the first issue was that any potential TripleO API
>> > >>> in
>> > >>> the future needed to be very low-level, a light skimming on top of
>> > >>> the
>> > >>> OpenStack services it uses.  The plan creation process that the
>> > >>> tripleo-common library spec describes is that: it's just a couple of
>> > >>> methods designed to allow a user to create an environment file,
>> which
>> > >>> can then be used for deploying the overcloud.
>> > >>>
>> > >>> My takeaway from the second issue was a bit more complicated.  A
>> > >>> required feature was missing, and although the proper functionality
>> > >>> needed to enable it in Heat was identified, it was unclear (and
>> > >>> remains
>> > >>> unclear) whether that feature truly belonged in Heat.  What does a
>> > >>> GUI
>> > >>> do then?  The GUI could take a cycle off, which is essentially what
>> > >>> happened here; I don't think that's a reasonable solution.  We could
>> > >>> hope that we arrive at a 100% foolproof and immutable deployment
>> > >>> solution
>> > >>> in the future, arriving at a point where no new features would ever
>> > >>> be
>> > >>> needed; I don't think that's a practical hope.
>> > >>>
>> > >>> The third solution that came to mind was the idea of creating the
>> > >>> TripleO API.  It gives us a place to add in missing features if
>> > >>> needed.
>> > >>> And I think it also gives us a useful layer of indirection.  The
>> > >>> consumers of TripleO want a stable API, so that a new release
>> doesn't
>> > >>> force them to do a massive update of their code; the TripleO API
>> > >>> would
>> > >>> provide that, allowing us to switch code behind the scenes (say, if
>> > >>> the capabilities feature lands in Heat).
>> > >>
>> > >> I think the above example would work equally well in a generic
>> workflow
>> > >> sort of tool. You could imagine that the inputs to the workflow remain
>> > >> the same... but rather than running our own code in some interim step
>> > >> we simply call Heat directly for the capabilities map feature.
>> > >>
>> > >> So regardless of whether we build our own API or use a generic
>> workflow
>> > >> tool I think we still have what I would call a "release valve" to let
>> us
>> > >> inject some custom code (actions) into the workflow. Like we
>> discussed
>> > >> last week on IRC I would like to minimize the number of custom
>> actions
>> > >> we have (with an eye towards things living in the upstream OpenStack
>> > >> projects) but it is fine to do this either way and would work equally
>> > >> well w/ Mistral and TripleO API.
>> > >>
>> > >>>
>> > >>> I think I kinda view TripleO as a 'best practices' project.  Using
>> > >>> OpenStack is a confusing experience, with a million different
>> options
>> > >>> and choices to make.  TripleO provides users with an excellent
>> guide.
>> > >>> But the problem is that best practices change, and I think that
>> > >>> perceived instability is dangerous for adoption of TripleO.
>> > >>>
>> > >>> So having a TripleO library and its associated API be a 'best
>> > >>> practices'
>> > >>> library makes sense to me.  It gives consumers a stable platform
>> upon
>> > >>> which to use TripleO, while allowing us to be flexible behind the
>> > >>> scenes.
>> > >>> The 'best practice' for Heat capabilities right now is a workaround,
>> > >>> because it hasn't been judged to be suitable to go into Heat itself.
>> > >>> If that changes, we get to shift as well - and all of these changes
>> > >>> are
>> > >>> invisible to the API consumer.
>> > >>
>> > >>
>> > >> I mentioned this in my "Driving workflows with Mistral" thread but
>> with
>> > >> regards to stability I view say Heat's v1 API or Mistral's v2 API as
>> > >> both being way more stable than what we could ever achieve with
>> TripleO
>> > >> API. The real trick to API stability with something like Heat or
>> > >> Mistral is how we manage the inputs and outputs to Stacks and
>> Workflows
>> > >> themselves. So long as we are mindful of this I can't imagine an end
>> user
>> > >> (say a GUI writer or whoever) would really care whether they POST to
>> > >> Mistral or something we've created. The nice thing about using other
>> > >> OpenStack projects like Heat or Mistral is that they very likely have
>> > >> better community and documentation around these things as well than
>> we
>> > >> would ever have.
>> > >>
>> > >> The more I look at using Mistral for some of the cases that have been
>> > >> brought up the more it seems to make sense for a lot of the workflows
>> > >> we need. I don't believe we can achieve better stability by creating
>> > >> what sounds more and more like a shim/proxy API rather than using the
>> > >> versioned API's that OpenStack already provides.
>> > >>
>> > >> There may be some corner cases where a "GUI helper" API comes into
>> play
>> > >> for some sort of caching or something. I'm not blocking anyone from
>> > >> creating these sorts of features if they need them. And again if it
>> is
>> > >> something that could be added to an upstream OpenStack project like
>> > >> Heat or Mistral I would look there first. So perhaps Zaqar for
>> > >> websockets instead of rolling our own, this sort of thing.
>> > >>
>> > >> What does concern me is that we are overstating what TripleO API
>> should
>> > >> actually contain should we choose to pursue it. Initially it was
>> > >> positioned as the "TripleO workflow API". I think we now agree that
>> we
>> > >> probably shouldn't put all of our workflows behind it. So if our
>> stance
>> > >> has changed would it make sense to compile a new list of what we
>> > >> believe belongs behind our own TripleO API vs. what we consider
>> > >> workflows.
>> > >>
>> > >
>> > >
>> > > I wonder if it would be helpful to get operator feedback here - show
>> them
>> > >   the advantages/disadvantages of both options and to get a sense of
>> what
>> > > might be useful/necessary for them to use TripleO effectively?
>> >
>> > (I'm going off on a tangent a bit, but please bear with me, i'm using
>> > all that to support the point in the end. The implications of building a
>> > TripleO API touch on various topics.)
>> >
>> > Yes i think we should gather operator feedback. We already got some, but
>> > we should gather more whenever possible.
>> >
>> > One kind of (negative) feedback i've heard is that overcloud management
>> > is too much of a "blackbox" compared to what operators are used to. The
>> > feedback i recall was that it's hard to tell what is going to happen
>> > when running an overcloud stack update, and that we cannot re-execute
>> > the software config management independently.
>> >
>> > Building another umbrella API to rule the already largely umbrella-like
>> > deployment process (think what all responsibilities lie within the
>> > tripleo-heat-templates codebase, and within the single 'overcloud' Heat
>> > stack) would probably make matters more blackboxy and go further in the
>> > direction of "i feel like i don't know what's happening to my cloud when
>> > i use the management tool".
>> >
>> > What i think could improve the situation for operators is trying to
>> > chunk up what we already have into smaller, more independently operable
>> > parts. The split-stack approach already discussed on the TripleO meeting
>> > and on #tripleo could help with this. Essentially separating our
>> > hardware management from our software config management. Being able to
>> > re-apply software configuration without being afraid of having nodes
>> > accidentally re-provisioned from scratch.
>> >
>> > In general i think TripleO could be a little more "UNIXy" - composed of
>> > smaller parts that make sense on their own, transparent to the operator,
>> > more modular and modifiable, and in effect more receptive of how varying
>> > are the real world deployment environments (various Neutron and Cinder
>> > plugins, Keystone backends, composable set of services, custom node
>> > types etc.).
>> >
>> > Workflow persisted in a data-like fashion is probably more modifiable by
>> > the operator than Python code of a REST API. We've seen hard assumptions
>> > cause problems in the past. (Think the unoverridable CLI parameters
>> > issue we used to have, and how we had to move to a model of "CLI
>> > provides its values, but you can always override them or provide
>> > additional ones with an environment file if needed", which we now use
>> > extensively). I'm a bit concerned that building a new REST API on top of
>> > everything would impose new rigid assumptions that could cause more harm
>> > than good in the end. I'm concerned that it would be usable only for
>> > very basic deployments, while the world of real deployments has its own
>> > pace and requirements not fitting the "best practices" as defined by the
>> > API, having to bypass the API far too often and slowly pushing it into
>> > abandonment over time.
>> >
>> > My mind is probably biased towards the operator feedback that
>> > resonated with me the most, i've heard pro-blackbox opinions too (though
>> > not from operators yet IIRC). So take what i wrote just as my 2 cents,
>> > but i think it's necessary to consider the above issues when thinking
>> > about the implications of building a TripleO API.
>> >
>>
>> Those are completely valid points, thanks for bringing them up!
>>
>> I think I should step back a bit and express my views from a different
>> perspective (which I probably should have done far earlier).  I've been
>> working with a GUI application that deploys OpenStack using TripleO.  The
>> deprecation of Tuskar was somewhat disruptive, but it was understood that
>> there are workarounds while we work towards a more permanent solution.
>>
>> What this application wants from TripleO is a set of APIs that provides
>> confidence that they can use TripleO without the risk of having to
>> fundamentally change their codebase if TripleO changes.  TripleO needs
>> to guarantee support for its deployment practices and to have a formal
>> deprecation process if it moves to a different architecture.
>>
>> The proposed TripleO API spec differs from Tuskar in that (in my view)
>> it's far lower level.  There are operations to get/set various types of
>> deployment parameters; there's an operation to deploy the result.  None
>> of that precludes us from, say, eventually adding options to allow the
>> API to distinguish between hardware management and software config, or
>> to add an operation to apply the software config.
>>
>> Another reason this is important is it addresses your point below:
>>
>> >
>> > Regarding the non-workflow kind of features we need for empowering GUI,
>> > wouldn't those be useful for normal (tenant) Heat stack deployments in
>> > the overcloud too? It sounds to me that features like "driving a Heat
>> > stack deployment with the same powers from CLI or GUI", "updating a
>> > CLI-created stack from GUI and vice versa", "understanding/parsing what
>> > are the configuration options of my Heat templates" are all features
>> > that are not specific to TripleO, and could be useful for tenant Heat
>> > stacks too. So perhaps these should be implemented in Heat? If that
>> > can't happen fast enough, then we might need to put some workarounds in
>> > place for now, but it might be better if we didn't advertise those as a
>> > stable solution.
>> >
>>
>> I think TripleO benefits from controlling the access to these operations,
>> simply because it allows the underlying TripleO architecture to change
>> without forcing integrators to change all their API calls.  For example,
>> let's say we create a TripleO API that gets/sets parameters in the form
>> of a Heat environment file, and then deploys through Heat.  If we want
>> to move to having the deployment driven through a Mistral workflow, we
>> can change the underlying code - write the parameters into a Mistral
>> environment file, drive the deployment through Mistral - without
>> affecting the outward facing API.
>>
>
> I think this is a great point, and I think it has a ton of value from the
> point of
> view of an external consumer. For the TripleO CLI and TripleO API, Mistral
> would likely be fine as CI etc. would make sure they don't break each
> other.
> This guarantee wouldn't exist for external consumers, granted we would
> have the same potential issue with an API (regressions happen). However,
> I think trying to put an API in front of Mistral is an extra layer of
> abstraction
> that may not be needed.
>
> I would like to see us use Mistral directly and as those workflows mature,
> if it proves to be a difficult entry point for users we can create an API
> that
> creates a clearer and nicer interface to the world of TripleO. This might
> not
> create the ideal initial situation but it feels like the right way to go
> about
> this, incrementally adding an interface as needed rather than adding an API
> just in case.
>
> Let's get the Mistral interface correct and see where that leaves us.
>

Gah, that got sent a bit early. I said almost everything I wanted to add,
but I just wanted to say that I believe we can offer a stable API via
Mistral just as we can with a custom API. We just need to clearly define
and document each of these points of contact. Then CI in TripleO should
help us avoid any regressions or unplanned changes.
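
To make "define and document each of these points of contact" a little more
concrete, here is a minimal sketch of the kind of check CI could run. It is
only an illustration: the workflow name, the input/output names and the
contract_violations helper are all hypothetical, and it deliberately does
not call any real Mistral API. It just compares a documented contract with
a workflow definition that has already been loaded into a dict.

```python
# Hypothetical documented contract for one workflow. The workflow name and
# the input/output names are made up for illustration.
DEPLOY_CONTRACT = {
    "name": "tripleo.deploy_plan",
    "inputs": {"plan_name", "timeout"},
    "outputs": {"stack_id", "status"},
}


def contract_violations(contract, definition):
    """Return a list of differences between the documented contract and
    the inputs/outputs declared by a workflow definition (a plain dict)."""
    problems = []
    for field in ("inputs", "outputs"):
        documented = set(contract[field])
        actual = set(definition.get(field, ()))
        # Anything documented but no longer present breaks consumers.
        for missing in sorted(documented - actual):
            problems.append(f"{field}: '{missing}' was removed")
        # Anything present but undocumented is an unplanned change.
        for extra in sorted(actual - documented):
            problems.append(f"{field}: '{extra}' is undocumented")
    return problems


if __name__ == "__main__":
    # A definition that dropped an input and grew an undocumented output.
    changed = {
        "inputs": ["plan_name"],
        "outputs": ["stack_id", "status", "log"],
    }
    for problem in contract_violations(DEPLOY_CONTRACT, changed):
        print(problem)
```

A CI job would fail whenever the returned list is non-empty, so an input
or output could only change together with its documentation.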

It could be argued that the Mistral interface would be more stable as it
already
exists and has been proven. We would be using this standard interface.


>
>
>>
>>
>> Mainn
>>
>>
>> >
>> > Jirka
>> >
>> > >
>> > > Mainn
>> > >
>> > >
>> > >>
>> > >> Dan
>> > >>
>> > >>>
>> > >>> Mainn
>> > >>>
>> > >>>
>> > >>>>
>> > >>>>> I think the correct attitude is to simply look at the problem
>> > >>>>> we're
>> > >>>>> trying to solve and find the correct architecture.  For these
>> > >>>>> get/set
>> > >>>>> methods that the API needs, it's pretty simple: storage -> some
>> > >>>>> logic ->
>> > >>>>> a REST API.  Adding a workflow engine on top of that is unneeded,
>> > >>>>> and I
>> > >>>>> believe that means it's an incorrect solution.
>> > >>>>
>> > >>>> What may help is if we can work through the proposed API spec, and
>> > >>>> identify which calls can reasonably be considered workflows vs
>> > >>>> those where
>> > >>>> it's really just proxying an API call with some logic?
>> > >>>>
>> > >>>> When we have a defined list of "not workflow" API requirements,
>> > >>>> it'll
>> > >>>> probably be much easier to rationalize over the value of a bespoke
>> > >>>> API vs
>> > >>>> mistral?
>> > >>>>
>> > >>>>
>> > >>>> Steve
>> > >>>>
>> > >>>> __________________________________________________________________________
>> > >>>> OpenStack Development Mailing List (not for usage questions)
>> > >>>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> > >>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> > >>>>
>> > >>>
>> > >>>
>> > >>
>> > >>
>> > >>
>> > >
>> > >
>> > >
>> >
>> >
>> >
>> >
>>
>>
>
>