[openstack-dev] [TripleO][Tuskar] Icehouse Requirements

Robert Collins robertc at robertcollins.net
Wed Dec 11 20:42:22 UTC 2013


On 12 December 2013 01:17, Jaromir Coufal <jcoufal at redhat.com> wrote:
> On 2013/10/12 23:09, Robert Collins wrote:

>>> The 'easiest' way is to support bigger companies with huge deployments,
>>> tailored infrastructure, everything connected properly.
>>>
>>> But there are tons of companies/users who are running on old
>>> heterogeneous
>>> hardware. Very likely even more than the number of companies having
>>> already
>>> mentioned large deployments. And giving them only the way of 'setting up
>>> rules' in order to get the service on the node - this type of user is not
>>> gonna use our deployment system.
>>
>>
>> Thats speculation. We don't know if they will or will not because we
>> haven't given them a working system to test.
>
> Some part of that is speculation, some part of that is feedback from people
> who are doing deployments (of course its just very limited audience).
> Anyway, it is not just pure theory.

Sure. Let me be more precise. There is a hypothesis that lack of
direct control will be a significant adoption blocker for a primary
group of users.

I think it's safe to say that some users in the group 'sysadmins
having to deploy an OpenStack cloud' will find it a bridge too far and
not use a system without direct control. Call this group A.

I think it's also safe to say that some users will not care in the
slightest, because their deployment is too small for them to be
particularly worried (e.g. about occasional downtime (but they would
worry a lot about data loss)). Call this group B.

I suspect we don't need to consider group C - folk who won't use a
system if it *has* manual control, but that's only a suspicion. It may
be that the side effect of adding direct control is to reduce
usability below the threshold some folk need...

To assess 'significant adoption blocker' we basically need to find the
% of users who will care sufficiently that they don't use TripleO.
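
As a rough illustration of what 'finding the %' involves: with a small
sample the estimate carries a wide error bar. The sketch below is only an
assumption about how one might summarise interview results; it uses the
standard Wilson score interval, and the function name and the example
counts are hypothetical, not data we have.

```python
import math


def wilson_interval(blocked: int, total: int, z: float = 1.96):
    """Approximate 95% interval for the share of users who would not adopt.

    blocked: interviewees who said lack of manual control is a blocker
    total:   interviewees in the (hopefully unbiased) sample
    z:       normal quantile, 1.96 for roughly 95% confidence
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= blocked <= total:
        raise ValueError("blocked must be between 0 and total")
    p = blocked / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to guard against floating point drift at the edges.
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Hypothetical numbers: 6 of 20 volunteers call it a blocker.
    low, high = wilson_interval(6, 20)
    print(f"blocked share is plausibly between {low:.0%} and {high:.0%}")
```

The point of the sketch is that a couple of dozen interviews only narrows
the rate to a broad range, and that range is meaningful only if the sample
is not self-selected.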

How can we do that? We can do questionnaires, and get such folk to
come talk with us, but that suffers from selection bias - group B can
use the system with or without direct manual control, so have little
motivation to argue vigorously in any particular direction. Group A
however have to argue because they won't use the system at all without
that feature, and they may want to use the system for other reasons,
so that becomes a crucial aspect for them.

A much better way IMO is to test it - to get a bunch of volunteers and
see who responds positively to a demo *without* direct manual control.

To do that we need a demoable thing, which might just be mockups that
show a set of workflows (and include things like Jay's
shiny-new-hardware use case in the demo).

I rather suspect we're building that anyway as part of doing UX work,
so maybe what we do is put a tweet or blog post up asking for
sysadmins who a) have not yet deployed openstack, b) want to, and c)
are willing to spend 20-30 minutes with us, walk them through a demo
showing no manual control, and record what questions they ask, and
whether they would like to have that product to use, and if not, then
(a) what use cases they can't address with the mockups and (b) what
other reasons they have for not using it.

This is a bunch of work though!

So, do we need to do that work?

*If* we can layer manual control on later, then we could defer this
testing until we are at the point where we can say 'the nova scheduled
version is ready, now let's decide if we add the manual control'.

OTOH, if we *cannot* layer manual control on later - if it has
tentacles through too much of the code base, then we need to decide
earlier, because it will be significantly harder to add later and that
may be too late of a ship date for vendors shipping on top of TripleO.

So with that as a prelude, my technical sense is that we can layer
manual scheduling on later: we provide an advanced screen, show the
list of N instances we're going to ask for and allow each instance to
be directly customised with a node id selected from either the current
node it's running on or an available node. It's significant work both
UI and plumbing, but it's not going to be made harder by the other
work we're doing AFAICT.
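
A minimal sketch of that layering, to show why it should not disturb the
scheduled path. Everything here is hypothetical (InstanceRequest,
apply_manual_pins and the node ids are made up for illustration, not
Tuskar or Nova APIs): the advanced screen would only produce a mapping of
instance name to node id, and anything left unpinned still goes to the
scheduler.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class InstanceRequest:
    """One of the N instances we are about to ask for (hypothetical type)."""
    name: str
    current_node: Optional[str] = None  # node it runs on today, if any


def apply_manual_pins(
    requests: List[InstanceRequest],
    available_nodes: Iterable[str],
    pins: Dict[str, str],
) -> Dict[str, Optional[str]]:
    """Return instance name -> node id, or None to leave it to the scheduler.

    A pin is accepted only if it names the instance's current node or a
    node that is currently available, and no node may be pinned twice.
    """
    known = {req.name for req in requests}
    unknown = set(pins) - known
    if unknown:
        raise ValueError(f"pins for unknown instances: {sorted(unknown)}")

    free = set(available_nodes)
    claimed = set()
    plan: Dict[str, Optional[str]] = {}
    for req in requests:
        node = pins.get(req.name)
        if node is None:
            plan[req.name] = None  # automatic placement, unchanged behaviour
            continue
        allowed = set(free)
        if req.current_node is not None:
            allowed.add(req.current_node)
        if node not in allowed:
            raise ValueError(f"{req.name}: node {node!r} is not selectable")
        if node in claimed:
            raise ValueError(f"node {node!r} is pinned more than once")
        claimed.add(node)
        plan[req.name] = node
    return plan


if __name__ == "__main__":
    reqs = [InstanceRequest("control-0", current_node="node-a"),
            InstanceRequest("compute-0"),
            InstanceRequest("compute-1")]
    print(apply_manual_pins(reqs, ["node-b", "node-c"], {"compute-0": "node-b"}))
```

With an empty pins mapping the plan is all None, i.e. exactly the nova
scheduled behaviour, which is the sense in which the manual screen is a
layer on top rather than a fork of the design.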

-> My proposal is that we shelve this discussion until we have the
nova/heat scheduled version in 'and now we polish' mode, and then pick
it back up and assess user needs.

An alternative argument is to say that group A is a majority of the
userbase and that doing an automatic version is entirely unnecessary.
That's also possible, but I'm extremely skeptical, given the huge cost
of staff time, and the complete lack of interest my sysadmin friends
(and my former sysadmin self) have in doing automatable things by
hand.

>> Lets break the concern into two halves:
>> A) Users who could have their needs met, but won't use TripleO because
>> meeting their needs in this way is too hard/complex/painful.
>>
>> B) Users who have a need we cannot meet with the current approach.
>>
>> For category B users, their needs might be specific HA things - like
>> the oft discussed failure domains angle, where we need to split up HA
>> clusters across power bars, aircon, switches etc. Clearly long term we
>> want to support them, and the undercloud Nova scheduler is entirely
>> capable of being informed about this, and we can evolve to a holistic
>> statement over time. Lets get a concrete list of the cases we can
>> think of today that won't be well supported initially, and we can
>> figure out where to do the work to support them properly.
>
> My question is - can't we help them now? To enable users to use our app even
> when we don't have enough smartness to help them 'auto' way?

I understand the question: but I can't answer it until we have *an*
example that is both real and not deliverable today. At the moment the
only one we know of is HA, and that's certainly an important feature on
the nova scheduled side, so doing manual control to deliver a future
automatic feature doesn't make a lot of sense to me. Crawl, walk, run.

> This is great point. It's very manual and we can do all hugely better. But
> we can't do anything about that until we have all new shiny features in (and
> it will take time to figure out the best way how to do that properly). Can
> we help them now? Can we scale our potential user base, get them in early,
> get more feedback on their requirements, needs, expectations?

I'm desperate for us to scale our user base.

Right now we're blocked on the nova baremetal-preserve-ephemeral
rebuild blueprint, and then after that heat rolling deploys. *Those*
are absolutely critical, regardless of what goes in Tuskar or Tuskar
UI - they are baseline 'the system doesn't work otherwise' aspects,
which will have a profound impact on the ability to sensibly use
TripleO.


> I just want to add one more important point. The whole time we talk about
> satisfying users needs, but the other aspect is their psychology (and
> fulfilling their expectations). We can cover all they need, but they still
> might want to 'feel' the power of control. Note, this is not just my
> prejudice, I asked and discussed that with couple of people - I hope that
> folks will jump in to confirm.

Certainly - I agree psychology is an important part of this, and it's
not one we can answer from first principles. It is however also one we
can't answer by exemplar: we need to know the population occurrence
rates for each archetype we encounter, and that means getting out and
recruiting an unbiased sample somehow.

-Rob

-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud


