[openstack-dev] [TripleO] Summit session wrapup

Robert Collins robertc at robertcollins.net
Thu Nov 28 05:41:23 UTC 2013


Hey, I realise I've done a sort of point-by-point thing below - sorry.
Let me say that I'm glad you're focused on what will help users, and
their needs - I am too. Hopefully we can figure out why we have
different opinions about what things are key, and/or how we can get
data to better understand our potential users.


On 28 November 2013 02:39, Jaromir Coufal <jcoufal at redhat.com> wrote:

> Important point here is, that we agree on starting with very basics - grow
> then. Which is great.
>
> The whole deployment workflow (not just UI) is all about user experience
> which is built on top of TripleO's approach. Here I see two important
> factors:
> - There are users who are having some needs and expectations.

Certainly. Do we have Personas for those people? (And have we done any
validation of them?)

> - There is underlying concept of TripleO, which we are using for
> implementing features which are satisfying those needs.

mmm, so the technical aspect of TripleO is about setting up a virtuous
circle: where improvements in deploying cluster software via OpenStack
makes deploying OpenStack better, and those of us working on deploying
OpenStack will make deploying cluster software via OpenStack better in
general, as part of solving 'deploying OpenStack' in a nice way.

> We are circling around and trying to approach the problem from wrong end -
> which is implementation point of view (how to avoid own scheduling).
>
> Let's try get out of the box and start with thinking about our audience
> first - what they expect, what they need. Then we go back, put our
> implementation thinking hat on and find out how we are going to re-use
> OpenStack components to achieve our goals. In the end we have detailed plan.

Certainly, +1.

> === Users ===
>
> I would like to start with our targeted audience first - without milestones,
> without implementation details.
>
> I think here is the main point where I disagree and which leads to different
> approaches. I don't think, that user of TripleO cares only about deploying
> infrastructure without any knowledge where the things go. This is overcloud
> user's approach - 'I want VM and I don't care where it runs'. Those are
> self-service users / cloud users. I know we are OpenStack on OpenStack, but
> we shouldn't go that far that we expect same behavior from undercloud users.
> I can tell you various examples of why the operator will care about where
> the image goes and what runs on specific node.

This may be where we disagree indeed :). Wearing my sysadmin hat (a
little dusty, but never really goes away :P) - I can tell you I spent
a lot of time worrying about what went on what machine. But it was
never actually what I was paid to do.

What I was paid to do was to deliver infrastructure and services to
the business. Everything that we could automate, that we could
describe with policy and still get robust, reliable results - we did.
It's how one runs many hundred machines with an ops team of 2.

Planning around failure domains, for example, is tedious work; it's
needed at a purchasing level - you need to decide if you're buying
three datacentres or one datacentre with internal redundancy - but once
that's decided, the actual mechanics of ensuring that each HA service is
spread across the (three datacentres) or (three separate zones in the
one DC) are not interesting. So - I'm sure that many sysadmins do
manually assign work to machines to ensure a good result for
performance or HA concerns, but that's out of necessity, not desire.
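
As a rough illustration of why those mechanics are better left to policy
than to a person, here is a minimal sketch of spreading the instances of
one HA service across failure domains. The function name, the zone names
and the node names are all hypothetical, and this is not how
nova-scheduler or any TripleO component does it; it is plain round-robin
over zones.

```python
from itertools import cycle


def spread_across_zones(instances, nodes_by_zone):
    """Place each instance on a free node, rotating through the zones."""
    # Copy the free-node lists so the caller's inventory is not mutated.
    free = {zone: list(nodes) for zone, nodes in nodes_by_zone.items()}
    zones = cycle(sorted(free))
    placement = {}
    for instance in instances:
        # Visit each zone at most once per instance, skipping full zones.
        for _ in range(len(free)):
            zone = next(zones)
            if free[zone]:
                placement[instance] = (zone, free[zone].pop(0))
                break
        else:
            raise RuntimeError(f"no free node left for {instance}")
    return placement


# Hypothetical inventory: three zones in one DC.
inventory = {"zone-a": ["a1", "a2"], "zone-b": ["b1"], "zone-c": ["c1"]}
placement = spread_across_zones(["db-1", "db-2", "db-3"], inventory)
```

With three instances and three zones, each instance lands in a different
zone, so losing any one zone leaves two instances running. Nobody had to
pick a machine by hand to get that.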

> One quick example:
> I have three racks of homogenous hardware and I want to design it the way so
> that I have one control node in each, 3 storage nodes and the rest compute.
> With that smart deployment, I'll never know what my rack contains in the
> end. But if I have control over stuff, I can say that this node is
> controller, those three are storage and those are compute - I am happy from
> the very beginning.

Why does that layout make you happy? What is it about that setup where
things will work better for you? Note that in the absence of a
sophisticated scheduler you'll have some volumes with redundancy of 3
end up all in one rack: you won't get rack-can-fail safety on the
delivered cloud workloads (I mention this as one attempt to understand
why knowing there is a control node / 3 storage / rest compute in each
rack makes you happy).

> Our targeted audience are sysadmins, operators. They hate 'magics'. They
> want to have control over things which they are doing. If we put in front of
> them workflow, where they click one button and they get cloud installed,
> they will get horrified.

I don't think this is a good characterisation of the sysadmin /
operator mindset. They - like anyone - don't like surprises, and they
often care intensely about delivering services well, with high
performance and high availability. Tools that help them do that are
appreciated, tools that are flaky - which a lot of
abstract-all-the-details tools seem to be - get a bad rap in sysadmin
circles.

> That's why I am very sure and convinced that we need to have ability for
> user to have control over stuff. What node is having what role. We can be
> smart, suggest and advice. But not hiding this functionality from user.
> Otherwise, I am afraid that we can fail.

I think having that degree of control is failure. Our CloudOS team has
considerable experience now in deploying clouds using a high-touch
system like you describe - and they are utterly convinced that it
doesn't scale. Even at 20 nodes it is super tedious, and beyond that
it's ridiculous.

> Furthermore, if we put lots of restrictions (like homogenous hardware) in
> front of users from the very beginning, we are discouraging people from
> using TripleO-UI. We are young project and trying to hit as broad audience
> as possible. If we do flexible enough approach to get large audience
> interested, solve their problems, we will get more feedback, we will get
> early adopters, we will get more contributors, etc.

Flexibility comes with a cost. Right now we have a large audience
interested in what we have, but we're delivering two separate things:
we have a functional sysadminny interface with command line scripts
and heat templates, and we have a GUI where we can offer a better
interface, which the tuskar folk are building up. I agree that
homogeneous hardware isn't a viable long term constraint. But if we
insist on fixing that issue first, we sacrifice our ability to learn
about the usefulness of a simple, straightforward interface. We'll be
doing a bunch of work - regardless of implementation - to deal with
heterogeneity, when we could be bringing Swift and Cinder up to
production readiness - which IMO will get many more folk onboard for
adoption.

> First, let's help cloud operator, who is having some nodes and wants to
> deploy OpenStack on them. He wants to have control which node is controller,
> which node is compute or storage. Then we can get smarter and guide.

Folk that want to manually install OpenStack on a couple of machines
can already do so: we don't change the game for them by replacing a
manual system with a manual system. My vision is that we should
deliver something significantly better!

> === Milestones ===
>
> Based on different user behavior I am talking about, I suggest different
> milestones:
...

So, I have a suggestion. Let's create a set of all the things we want
in the product eventually.

https://etherpad.openstack.org/p/tripleo-feature-map

From there we can assess several things for each item:
cost - estimated cost of an 'ok'(*) implementation - 0: expensive
(multiple cycles), 9: cheap
benefit(us) - estimated benefit to design learning by having a
functional implementation - 0: learn nothing, 9: learn lots
benefit(users) - e.g. estimated increase in # of users for which
TripleO will satisfy their needs (as part of a holistic install) - 0:
minimal increase, 9: huge increase

From there we can draw a cube: things that are cheap, teach us a lot,
and benefit users a lot are no-brainers :) Things that are expensive,
where we don't learn a lot and users don't benefit much, are clearly
things we don't want to do right now:

cost  b-us  b-users  do-when?
0     0     0        never?
9     9     9        right now
5     5     5        sometime in the middle
but more interesting are combinations like:
0     9     9        start now as a background task?
9     2     2        do if we have nothing better
9     0     9        right now
9     9     0        also right now

So I dunno if this is a good idea - it's just an attempt to visualise
the tradeoffs in a way that we can be clear what we're saying is good
about a specific feature [think of it as a variation on planning
poker].

(*): I mean an implementation we could live with for a while, vs
whatever the ideal might be.
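
To make the cube a bit more concrete, here is a minimal sketch that maps
the three scores onto the 'do-when' buckets from the table. The table
only gives example points, not a rule, so the thresholds (7 and up counts
as high, 2 or 3 and down counts as low), the Feature type and the feature
names are all assumptions made for illustration.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str           # hypothetical feature name
    cost: int           # 0: expensive (multiple cycles), 9: cheap
    benefit_us: int     # 0: learn nothing, 9: learn lots
    benefit_users: int  # 0: minimal increase in users, 9: huge increase


def do_when(feature: Feature) -> str:
    """Bucket a feature using assumed thresholds on its three scores."""
    best = max(feature.benefit_us, feature.benefit_users)
    worst = min(feature.benefit_us, feature.benefit_users)
    cheap = feature.cost >= 7      # assumed threshold
    expensive = feature.cost <= 2  # assumed threshold
    if cheap and best >= 7:
        # Cheap and valuable to either us or the users.
        return "right now"
    if cheap and best <= 3:
        return "do if we have nothing better"
    if expensive and worst >= 7:
        # Costly, but valuable on both axes.
        return "start now as a background task?"
    if expensive and best <= 2:
        return "never?"
    return "sometime in the middle"


# Hypothetical entries, scored the same way as rows of the table.
features = [
    Feature("example-a", 9, 0, 9),
    Feature("example-b", 0, 9, 9),
    Feature("example-c", 5, 5, 5),
]
buckets = {f.name: do_when(f) for f in features}
```

The one design choice worth noting: for cheap items the larger of the two
benefits decides, which is what makes both (9, 0, 9) and (9, 9, 0) come
out as 'right now'.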

>
> === Implementation ===
>
> Above mentioned approach shouldn't lead to reimplementing scheduler. We can
> still use nova-scheduler, but we can take advantage of extra params (like
> unique identifier), so that we specify more concretely what goes where.

That is reimplementing the scheduler. In this case it's forcing
sysadmins to be the scheduler, which is a waste of their time.

> More details should follow here - how to achieve above mentioned goals, like
> what should go through heat, what should go through nova, ironic, etc.
>
> But first, let's agree on approach and goals.

Totally agree!

-Rob


-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud


