[openstack-dev] [TripleO] Summit session wrapup

Jaromir Coufal jcoufal at redhat.com
Thu Nov 28 08:50:05 UTC 2013


On 2013/28/11 06:41, Robert Collins wrote:
> Certainly. Do we have Personas for those people? (And have we done any
> validation of them?)
We have a short paragraph for each. But they have not been verified by
any survey, so we don't have a very solid basis in this area right now,
and I believe we are all working from assumptions at the moment.

> This may be where we disagree indeed :). Wearing my sysadmin hat ( a
> little dusty, but never really goes away :P) - I can tell you I spent
> a lot of time worrying about what went on what machine. But it was
> never actually what I was paid to do.
>
> What I was paid to do was to deliver infrastructure and services to
> the business. Everything that we could automate, that we could
> describe with policy and still get robust, reliable results - we did.
> It's how one runs many hundred machines with an ops team of 2.
>
> Planning around failure domains for example, is tedious work; it's
> needed at a purchasing level - you need to decide if you're buying
> three datacentres or one datacentre with internal redundancy, but once
> that's decided the actual mechanics of ensuring that each HA service is
> spread across the (three datacentres) or (three separate zones in the
> one DC) is not interesting. So - I'm sure that many sysadmins do
> manually assign work to machines to ensure a good result from
> performance or HA concerns, but that's out of necessity, not desire.
Well, I think there is one small misunderstanding. I've never said that
the manual way should be the primary workflow for us. I agree that we
should lean toward as much automation and smartness as possible. But at
the same time, I am adding that we need a manual fallback so that the
user can change that smart decision.

The primary way would be to let TripleO decide where the stuff goes. I
think we agree here.

But I, as a sysadmin, want to see the distribution of stuff before I
deploy. And if there is some failure in the automation logic, I need to
have the possibility to change that. Not from scratch, but by changing
the suggested distribution. There should always be a way to do that
manually. Let's imagine that TripleO, by mistake or intentionally,
distributes nodes across my datacenter wrong (wrong for me, not
necessarily for somebody else). What would I do? Would I let TripleO
deploy it anyway? No. I will not use TripleO. But if there is something
I need to change and I have a way to do that, I will stay with
TripleO, because it allows me to satisfy all my needs.

We can be smart, but we can't be the smartest and see all the reasons
of all users.

> Why does that layout make you happy? What is it about that setup where
> things will work better for you? Note that in the absence of a
> sophisticated scheduler you'll have some volumes with redundancy of 3
> end up all in one rack: you won't get rack-can-fail safety on the
> delivered cloud workloads (I mention this as one attempt to understand
> why knowing there is a control node / 3 storage /rest compute in each
> rack makes you happy).
It doesn't have to make me happy, but somebody else might have strong 
reasoning for that (or any other setup which we didn't cover). We don't 
have to know it, but why can't we allow him to do this?

One more time, I want to stress this - I am not fighting for the absence
of a sophisticated scheduler, I am fighting for allowing the user to
control the stuff if he wants/needs to.

> I think having that degree of control is failure. Our CloudOS team has
> considerable experience now in deploying clouds using a high-touch
> system like you describe - and they are utterly convinced that it
> doesn't scale. Even at 20 nodes it is super tedious, and beyond that
> it's ridiculous.
Right. And are they convinced that an automated tool will do the best
job for them? Do they trust it so strongly that they would deploy
their whole datacenter without checking that the distribution is
correct? Would they say - OK, I said I want 50 compute, 10 block
storage, 3 control. As long as it works, I don't care, be smart, do it
for me.

It all depends on the GUI design. If we design it well enough, so that
we allow the user to do quick bulk actions, even manual distribution
can be easy. Even for 100 nodes... or more.
(But I don't suggest we do it all manually.)

> Flexibility comes with a cost. Right now we have a large audience
> interested in what we have, but we're delivering two separate things:
> we have a functional sysadminny interface with command line scripts
> and heat templates, and we have a GUI where we can offer a better
> interface which the tuskar folk are building up. I agree that
> homogeneous hardware isn't a viable long term constraint. But if we
> insist on fixing that issue first, we sacrifice our ability to learn
> about the usefulness of a simple, straight forward interface. We'll be
> doing a bunch of work - regardless of implementation - to deal with
> heterogeneity, when we could be bringing Swift and Cinder up to
> production readiness - which IMO will get many more folk onboard for
> adoption.
I agree that there always will be some cost. I just think that we can 
reduce it.

> Folk that want to manually install openstack on a couple of machines
> can already do so : we don't change the game for them by replacing a
> manual system with a manual system. My vision is that we should
> deliver something significantly better!
We should! And we can. But I think we shouldn't deliver something that
will discourage people from using TripleO. Especially at the beginning -
look, user, we are taking our first steps here; the distribution is not
perfect and not what you wanted, but you can make the change you need.
You don't have to go away and come back in 6 months when we have tried
to be smarter and addressed your case.

>> === Milestones ===
> ...
>
> So, I have a suggestion. Let's create a set of all the things we want
> in the product eventually.
>
> https://etherpad.openstack.org/p/tripleo-feature-map
>
>  From there we can assess for each thing several things:
> cost - estimated cost of 'ok'(*) implementation - 0: expensive-
> multiple cycles, 9: cheap
> benefit(us) - estimated benefit to design learning by having a
> functional implementation - 0: learn nothing, 9: learn lots
> benefit(users) - e.g. estimated increase in # of users for which
> TripleO will satisfy their needs (as part of a holistic install) - 0:
> minimal increase, 9: huge increase
>
>  From there we can draw a cube: things that are cheap, we learn a lot,
> and users benefit a lot are no brainers :) Things that are expensive,
> we don't learn a lot and users don't benefit much are clearly things
> we don't want to do right now:
>
> cost  b-us   b-users   do-when ?
> 0        0        0            never?
> 9        9        9            right now
> 5        5        5            sometime in the middle
> but more interesting are combinations like:
> 0        9        9            start now as a background task?
> 9        2        2            Do if we have nothing better
> 9        0        9            right now
> 9        9        0            also right now
>
> So I dunno if this is a good idea - it's just an attempt to visualise
> the tradeoffs in a way that we can be clear what we're saying is good
> about a specific feature [think of it as a variation on planning
> poker].
>
> (*): I mean an implementation we could live with for a while, vs
> whatever the ideal might be.
I think it might help.
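
To make the scoring above a bit more concrete, here is a minimal sketch
in Python. It only reproduces the example rows from the table; the
thresholds (high=7, low=2), the Feature class and the do_when function
are assumptions for illustration, not anything that exists in TripleO
or Tuskar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """One candidate feature, scored 0-9 on each axis (hypothetical type)."""
    name: str
    cost: int           # 0 = expensive (multiple cycles), 9 = cheap
    benefit_us: int     # 0 = learn nothing, 9 = learn lots
    benefit_users: int  # 0 = minimal increase in users, 9 = huge increase


def do_when(feature, high=7, low=2):
    """Map scores to a rough 'do-when' bucket.

    The 'high' and 'low' thresholds are assumed values chosen so that
    the example rows in the table land in the buckets listed there.
    """
    best_benefit = max(feature.benefit_us, feature.benefit_users)
    cheap = feature.cost >= high
    expensive = feature.cost <= low

    if cheap and best_benefit >= high:
        return "right now"
    if expensive and min(feature.benefit_us, feature.benefit_users) >= high:
        return "start now as a background task"
    if cheap:
        return "do if we have nothing better"
    if expensive and best_benefit <= low:
        return "never?"
    return "sometime in the middle"


if __name__ == "__main__":
    # Feature names are placeholders; only the scores come from the table.
    for row in [
        Feature("example-a", 0, 0, 0),
        Feature("example-b", 9, 9, 9),
        Feature("example-c", 5, 5, 5),
        Feature("example-d", 0, 9, 9),
        Feature("example-e", 9, 2, 2),
        Feature("example-f", 9, 0, 9),
        Feature("example-g", 9, 9, 0),
    ]:
        print(row.name, row.cost, row.benefit_us, row.benefit_users,
              "->", do_when(row))
```

Anything that falls between the example rows ends up in "sometime in
the middle", which is probably where the real discussion would happen.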

The thing is, I believe we are going in the same direction with the
same goals (with just some nuances).

For me it is important to have a manual fallback in the Icehouse
release. If it is too difficult to implement, we can deliver it in v1
instead of v0 (I'll survive :)). Personally I don't think it should be
that difficult, but I am not the best person to evaluate this.
But I will strongly fight for this to be in the Icehouse release. It
shouldn't be the primary way, but I believe it needs to exist.

Basically what I am saying is - be smart, do a 'dry-run' of the
scheduler to see what the distribution will be and, if I am happy, I
confirm. If I am not happy, allow me to change it.
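
As a rough illustration of that flow (propose, review, adjust,
confirm), here is a small Python sketch. Everything in it is
hypothetical: the function names, the node names and the trivial
in-order "scheduler" are stand-ins, not the real TripleO or Tuskar
logic.

```python
def propose_distribution(nodes, role_counts):
    """Dry run: suggest a role for each node without deploying anything.

    Roles are simply handed out in order, as a stand-in for a smarter
    scheduler. Nodes that are not needed get the role None.
    """
    wanted = sum(role_counts.values())
    if wanted > len(nodes):
        raise ValueError("not enough nodes for the requested roles")
    roles = [role for role, count in role_counts.items()
             for _ in range(count)]
    roles += [None] * (len(nodes) - len(roles))
    return dict(zip(nodes, roles))


def swap_roles(plan, node_a, node_b):
    """Manual fallback: exchange the roles of two nodes in a copy of the plan."""
    adjusted = dict(plan)
    adjusted[node_a], adjusted[node_b] = plan[node_b], plan[node_a]
    return adjusted


def deploy(plan, confirmed):
    """Return the deployment steps, but only for a plan the user confirmed."""
    if not confirmed:
        raise RuntimeError("plan was not confirmed by the user")
    return [f"deploy {role} on {node}"
            for node, role in plan.items() if role is not None]


if __name__ == "__main__":
    nodes = ["rack1-n1", "rack1-n2", "rack2-n1", "rack2-n2"]
    plan = propose_distribution(nodes, {"control": 1, "compute": 2})
    print("suggested:", plan)

    # The user is not happy with control on rack1 and moves it to rack2.
    plan = swap_roles(plan, "rack1-n1", "rack2-n2")
    print("adjusted: ", plan)

    for step in deploy(plan, confirmed=True):
        print(step)
```

The point is that the user edits the suggested plan instead of building
one from scratch, and nothing is deployed until the plan is confirmed.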

-- Jarda