[openstack-dev] [TripleO] Should we have a TripleO API, or simply use Mistral?

Ben Nemec openstack at nemebean.com
Fri Jan 22 19:30:46 UTC 2016

On 01/22/2016 12:30 PM, Ryan Brady wrote:
> On Fri, Jan 22, 2016 at 12:24 PM, Ben Nemec <openstack at nemebean.com> wrote:
>     So I haven't weighed in on this yet, in part because I was on vacation
>     when it was first proposed and missed a lot of the initial discussion,
>     and also because I wanted to take some time to order my thoughts on it.
>     Also because my initial reaction...was not conducive to calm and
>     rational discussion. ;-)
>     The tldr is that I don't like it.  To explain why, I'm going to make a
>     list (everyone loves lists, right? Top $NUMBER reasons we should stop
>     expecting other people to write our API for us):
>     1) We've been down this road before.  Except last time it was with Heat.
>     I'm being somewhat tongue-in-cheek here, but expecting a general
>     service to provide us a user-friendly API for our specific use case just
>     doesn't make sense to me.
> The UI is the CLI or the GUI, right?  The typical method for
> interacting with TripleO seems defined that way.  If we're worried about
> it being friendly enough to integrate with, I'm sure it could be easily
> explained in a doc.  Underneath, the difference is calling
> :8989/v2/executions instead of :8585/v1/deploy.  I'm not convinced
> there's a large enough difference here to say that one is more
> user-friendly than the other.

The difference is in having parameters passed via a well-defined
JSON-based API like every other OpenStack service, or having parameters
passed via a loosely defined "API" in the Mistral YAML.  The latter is
what we're currently doing for the Heat templates, and I don't think
it's particularly good there either.  We make changes to the template
API all the time that break our consumers (I know this because I am one
of them, and I'm constantly having to rebase my nic templates because of
interface changes).
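
To make the distinction concrete, here is a minimal sketch.  Everything
in it is hypothetical: the parameter names, the schema, and the helper
functions are made up for illustration and are not the real TripleO API
or any actual workflow.  The workflow_name/input keys are my
understanding of what a Mistral v2 execution request carries, so treat
those as an assumption too.

```python
# Hypothetical sketch: neither shape below is a real TripleO or Mistral contract.

# A defined API publishes the parameters it accepts and their types.
DEPLOY_SCHEMA = {
    "plan_name": str,
    "compute_count": int,
    "control_count": int,
}


def validate_deploy_request(body):
    """Return a list of problems; an empty list means the request is valid."""
    errors = []
    for key in body:
        if key not in DEPLOY_SCHEMA:
            errors.append("unknown parameter: %s" % key)
    for key, expected in DEPLOY_SCHEMA.items():
        if key not in body:
            errors.append("missing parameter: %s" % key)
        elif not isinstance(body[key], expected):
            errors.append("%s must be %s" % (key, expected.__name__))
    return errors


def build_workflow_execution(workflow_name, params):
    """Free-form input: any keys are accepted, whatever the YAML happens to read."""
    return {"workflow_name": workflow_name, "input": params}


# A consumer still using a parameter name that was since renamed.
renamed = {"plan_name": "overcloud", "compute_scale": 3, "control_count": 1}
problems = validate_deploy_request(renamed)               # reported up front
execution = build_workflow_execution("deploy", renamed)   # accepted as-is
```

With a defined schema the renamed parameter is rejected at the API
boundary.  The free-form input accepts it, and the breakage only
surfaces when the workflow actually runs.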

>     2) The TripleO API is not a workflow API.  I also largely missed this
>     discussion, but the TripleO API is a _Deployment_ API.  In some cases
>     there also happens to be a workflow going on behind the scenes, but
>     honestly that's not something I want our users to have to care about.
>     3) It ties us 100% to a given implementation.  If Mistral proves to be a
>     poor choice for some reason, or insufficient for a particular use case,
>     we have no alternative.  If we have an API and decide to change our
>     implementation, nobody has to know or care.  This is kind of the whole
>     point of having an API - it shields users from all the nasty
>     implementation details under the surface.
>     4) It raises the bar even further for both new deployers and developers.
>     You already need to have a pretty firm grasp of Puppet and Heat
>     templates to understand how our stuff works, not to mention a decent
>     understanding of quite a number of OpenStack services.
> And an additional 14 or so active projects TripleO created, right?

It's a stupidly complex environment, yes. :-)

>     This presents a big chicken and egg problem for people new to OpenStack.
>     It's great that we're based on OpenStack and that allows people to peek
>     under the hood and do some tinkering, but it can't be required for
>     everyone.  A lot of our deployers are going to have little to no
>     OpenStack experience, and TripleO is already a daunting task for those
>     people (hell, it's daunting for people who _are_ experienced).
> The original vision statement [1] for TripleO has not changed in some
> time.  "TripleO is a program aimed at installing, upgrading and
> operating OpenStack clouds using OpenStack's own cloud facilities as the
> foundations".  I'm not disagreeing that TripleO *could* be a deployment
> API, but that's not the language used in the wiki and if the goal is an
> API then maybe this needs to be formalized to get everyone in the
> project on the same page.  Following along with the statement as it
> stands, using existing OpenStack services sounds like a mandate.  How do
> we decide as a project when something within OpenStack can be improved
> or when we just create something new?  How do we convey that
> justification to both developers and users? 

"using existing OpenStack services sounds like a mandate." Where do you
draw the line on that?  Taken to extremes, now that Fuel is big tent we
should just use that and all go home. ;-)

I think there's got to be some qualification of that statement, and I
think it comes down to using OpenStack services where it makes sense.
I'm arguing that in this case it doesn't.

There's another important part of this that I think we also need to
discuss, which is the Red Hat acquisition of Ansible.  While it's true
that today Ansible doesn't have an open API service, it seems reasonable
(though I _do not_ have any inside information about this) to assume
that Ansible Tower will be open sourced in the near future.  At that
point, what purpose does Mistral serve?  Ansible knows how to talk to
OpenStack, and it has a way bigger acceptance rate with the operator
community.  It seems to me that there's a very real possibility that at
some point not too far in the future Mistral will be redundant.
Basically all it's providing right now is an OpenStack-specific API over
the same sort of thing Ansible does, and if/when the Ansible API is
opened up then what is the point of Mistral?

I realize I'm making some predictions here, but I think they're
well-grounded predictions, and it also goes to my point that the way we
implement workflows is an implementation detail.  Today we have Python,
tomorrow we could have Mistral, and in two years we might have Ansible.
Should our users have to care about that?
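
A rough sketch of what I mean, with hypothetical class and method names
(none of this is existing TripleO code, and the Mistral backend is a
stub that makes no real calls): the API contract stays fixed while the
workflow engine behind it gets swapped.

```python
import abc


class WorkflowBackend(abc.ABC):
    """Hypothetical interface for whatever engine actually runs the deployment."""

    @abc.abstractmethod
    def run_deploy(self, plan_name):
        """Start a deployment and return a dict with at least plan and status."""


class PythonBackend(WorkflowBackend):
    def run_deploy(self, plan_name):
        # Stand-in for today's in-process Python code path.
        return {"plan": plan_name, "status": "DEPLOYING"}


class MistralBackend(WorkflowBackend):
    def run_deploy(self, plan_name):
        # A real implementation would start a workflow execution here.
        # Stubbed out: the extra key mimics engine-specific detail.
        return {
            "plan": plan_name,
            "status": "DEPLOYING",
            "execution_id": "hypothetical-id",
        }


class DeploymentAPI:
    """The stable surface users see, regardless of the backend in use."""

    def __init__(self, backend):
        self._backend = backend

    def deploy(self, plan_name):
        result = self._backend.run_deploy(plan_name)
        # Expose only the fields that are part of the API contract.
        return {"plan": result["plan"], "status": result["status"]}


via_python = DeploymentAPI(PythonBackend()).deploy("overcloud")
via_mistral = DeploymentAPI(MistralBackend()).deploy("overcloud")
```

Both calls return the same response shape, so a client written against
the API does not change when the backend does.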

>     5) What does reimplementing all of our tested, well-understood Python
>     into a new YAML format gain us?  This is maybe the biggest thing I'm
>     missing from this whole discussion.  
> So workflows are in YAML, but the actions are still in Python.  The
> actions have an __init__ and a run method, which looks testable to me.

I guess I was looking at Dan's introspection workflow, which seemed to
be using Mistral built-in things, but I could be mistaken.

In any case, if we're just using Mistral to drive our custom Python that
still doesn't seem like a net win to me.

>     We lose a bunch of things (ease of
>     transition from other Python projects, excellent existing testing
>     framework, etc.), but what are we actually gaining other than the
>     ability to say that we use N + 1 OpenStack services?  Because we're way
>     past the point where "It's OpenStack deploying OpenStack" is sufficient
>     reason for people to pay attention to us.  We need less "Ooh, neat" and
>     more "Ooh, that's easy to use and works well."  It's still not clear to
>     me that Mistral helps in any way with the latter.
>     6) On the testing note, how do we test these workflows?  Do we know what
>     happens when step X fails?  How do we test that they handle it properly
>     in an automated and repeatable way?  In Python these are largely easy
>     questions to answer: unit tests.  How do you unit test YAML?  This is a
>     big reason I'm not even crazy about having Mistral on the back end of a
>     TripleO API.  We'd be going from code that we can test and prove works
>     in a variety of scenarios, to YAML that is tested and proven to work in
>     exactly the three scenarios we run in CI.  This is basically the same
>     situation we had with tripleo-incubator, and it was bad there too.
> If a workflow is composed of one or more actions, is it enough to test
> the actions?  When I compare that to how we test now, I think it is.

Here again I guess I'm thinking of the introspection workflow, and what
happens if, say, a single/few nodes fail introspection (this happens all
the time in the real world, FTR).  Does the entire workflow immediately
fail if one part of it fails?  I think of this case in particular
because I know at one point the Python version didn't handle it very
well.  I believe we've fixed that now, and although I don't know for
sure if a test was added I know it _should_ have been to ensure we
didn't regress that behavior in the future.  Is it even possible to test
such a scenario with a Mistral template (not a rhetorical question, if
it is I would like to know)?
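
For comparison, the kind of Python-side test I have in mind for the
partial-failure case looks roughly like this.  It is a sketch only: the
function, the exception, and the node names are hypothetical and are not
taken from the actual introspection code.

```python
import unittest


class IntrospectionError(Exception):
    """Hypothetical error raised when a single node fails introspection."""


def introspect_nodes(node_ids, introspect_one):
    """Introspect every node, collecting failures instead of aborting early."""
    succeeded = []
    failed = {}
    for node_id in node_ids:
        try:
            introspect_one(node_id)
            succeeded.append(node_id)
        except IntrospectionError as exc:
            failed[node_id] = str(exc)
    return succeeded, failed


class TestPartialFailure(unittest.TestCase):
    def test_one_bad_node_does_not_abort_the_rest(self):
        def fake_introspect(node_id):
            # Simulate the common real-world case of one node timing out.
            if node_id == "node-2":
                raise IntrospectionError("timed out")

        ok, bad = introspect_nodes(
            ["node-1", "node-2", "node-3"], fake_introspect)

        self.assertEqual(["node-1", "node-3"], ok)
        self.assertEqual({"node-2": "timed out"}, bad)
```

Run it with "python -m unittest".  The failure is injected through a
fake, so the scenario is repeatable and needs no real hardware or CI
environment.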

Here again though, I will go back to our Heat templates.  They're a big
pile of YAML that is only tested to the extent that our end-to-end CI
jobs exercise it.  There are numerous templates in t-h-t that
have no automated test coverage whatsoever, and they do break
periodically (see my previous comments about not having a very stable
template API).  I don't know how to solve that for the Heat templates,
but I would prefer to avoid getting us into a similar situation again.
I know we can test Python, and I know our test coverage of YAML things
is...lacking.  That doesn't make me feel good about adding more YAML things.

>     I dunno.  Maybe I'm too late to this party to have any impact on the
>     discussion, but I very much do not like the direction we're going and I
>     would be remiss if I didn't at least point out my concerns with it.
>     -Ben
> I don't think you're late here at all.  I think more folks should weigh
> in so we can get to the bottom of this.
> - Ryan
> [1] https://wiki.openstack.org/wiki/TripleO
>     On 01/13/2016 03:41 AM, Tzu-Mainn Chen wrote:
>     > Hey all,
>     >
>     > I realize now from the title of the other TripleO/Mistral thread [1]
>     > that the discussion there may have gotten confused.  I think using
>     > Mistral for TripleO processes that are obviously workflows - stack
>     > deployment, node registration - makes perfect sense.  That thread is
>     > exploring practicalities for doing that, and I think that's great
>     > work.
>     >
>     > What I inappropriately started to address in that thread was a
>     > somewhat orthogonal point that Dan asked in his original email,
>     > namely:
>     >
>     > "what it might look like if we were to use Mistral as a replacement
>     > for the TripleO API entirely"
>     >
>     > I'd like to create this thread to talk about that; more of a 'should
>     > we' than 'can we'.  And to do that, I want to indulge in a thought
>     > exercise stemming from an IRC discussion with Dan and others.  All,
>     > please correct me if I've misstated anything.
>     >
>     > The IRC discussion revolved around one use case: deploying a Heat
>     > stack directly from a Swift container.  With an updated patch, the
>     > Heat CLI can support this functionality natively.  Then we don't need
>     > a TripleO API; we can use Mistral to access that functionality, and
>     > we're done, with no need for additional code within TripleO.  And, as
>     > I understand it, that's the true motivation for using Mistral instead
>     > of a TripleO API: avoiding custom code within TripleO.
>     >
>     > That's definitely a worthy goal... except from my perspective, the
>     > story doesn't quite end there.  A GUI needs additional functionality,
>     > which boils down to: understanding the Heat deployment templates in
>     > order to provide options for a user; and persisting those options
>     > within a Heat environment file.
>     >
>     > Right away I think we hit a problem.  Where does the code for
>     > 'understanding options' go?  Much of that understanding comes from
>     > the capabilities map in tripleo-heat-templates [2]; it would make
>     > sense to me that responsibility for that would fall to a TripleO
>     > library.
>     >
>     > Still, perhaps we can limit the amount of TripleO code.  So to give
>     > API access to 'getDeploymentOptions', we can create a Mistral
>     > workflow.
>     >
>     >   Retrieve Heat templates from Swift -> Parse capabilities map
>     >
>     > Which is fine-ish, except from an architectural perspective
>     > 'getDeploymentOptions' violates the abstraction layer between storage
>     > and business logic, a problem that is compounded because
>     > 'getDeploymentOptions' is not the only functionality that accesses
>     > the Heat templates and needs exposure through an API.  And, as has
>     > been discussed on a separate TripleO thread, we're not even sure
>     > Swift is sufficient for our needs; one possible consideration right
>     > now is allowing deployment from templates stored in multiple places,
>     > such as the file system or git.
>     >
>     > Are we going to have duplicate 'getDeploymentOptions' workflows for
>     > each storage mechanism?  If we consolidate the storage code within a
>     > TripleO library, do we really need a *workflow* to call a single
>     > function?  Is a thin TripleO API that contains no additional business
>     > logic really so bad at that point?
>     >
>     > My gut reaction is to say that proposing Mistral in place of a
>     > TripleO API is to look at the engineering concerns from the wrong
>     > direction.  The Mistral alternative comes from a desire to limit
>     > custom TripleO code at all costs.  I think that is an extremely
>     > dangerous attitude that leads to compromises and workarounds that
>     > will quickly lead to a shaky code base full of design flaws that make
>     > it difficult to implement or extend any functionality cleanly.
>     >
>     > I think the correct attitude is to simply look at the problem we're
>     > trying to solve and find the correct architecture.  For these get/set
>     > methods that the API needs, it's pretty simple: storage -> some logic
>     > -> a REST API.  Adding a workflow engine on top of that is unneeded,
>     > and I believe that means it's an incorrect solution.
>     >
>     >
>     > Thanks,
>     > Tzu-Mainn Chen
>     >
>     >
>     >
>     > [1] http://lists.openstack.org/pipermail/openstack-dev/2016-January/083757.html
>     > [2] https://github.com/openstack/tripleo-heat-templates/blob/master/capabilities_map.yaml
>     >
>     >
>     > __________________________________________________________________________
>     > OpenStack Development Mailing List (not for usage questions)
>     > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>     > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>     >
> -- 
> Ryan Brady
> Cloud Engineering
> rbrady at redhat.com
> 919.890.8925
