[openstack-dev] [TripleO][Tuskar] Icehouse Requirements

Will Foster wfoster at redhat.com
Thu Dec 12 17:24:24 UTC 2013


On 12/12/13 09:42 +1300, Robert Collins wrote:
>On 12 December 2013 01:17, Jaromir Coufal <jcoufal at redhat.com> wrote:
>> On 2013/10/12 23:09, Robert Collins wrote:
>
>>>> The 'easiest' way is to support bigger companies with huge deployments,
>>>> tailored infrastructure, everything connected properly.
>>>>
>>>> But there are tons of companies/users who are running on old
>>>> heterogeneous
>>>> hardware. Very likely even more than the number of companies having
>>>> already
>>>> mentioned large deployments. And giving them only the way of 'setting up
>>>> rules' in order to get the service on the node - this type of user is not
>>>> gonna use our deployment system.
>>>
>>>
>>> That's speculation. We don't know if they will or will not because we
>>> haven't given them a working system to test.
>>
>> Some part of that is speculation, some part of that is feedback from people
>> who are doing deployments (of course it's just a very limited audience).
>> Anyway, it is not just pure theory.
>
>Sure. Let me be more precise. There is a hypothesis that lack of
>direct control will be a significant adoption blocker for a primary
>group of users.
>
>I think it's safe to say that some users in the group 'sysadmins
>having to deploy an OpenStack cloud' will find it a bridge too far and
>not use a system without direct control. Call this group A.
>
>I think it's also safe to say that some users will not care in the
>slightest, because their deployment is too small for them to be
>particularly worried (e.g. about occasional downtime (but they would
>worry a lot about data loss)). Call this group B.
>
>I suspect we don't need to consider group C - folk who won't use a
>system if it *has* manual control, but that's only a suspicion. It may
>be that the side effect of adding direct control is to reduce
>usability below the threshold some folk need...
>
>To assess 'significant adoption blocker' we basically need to find the
>% of users who will care sufficiently that they don't use TripleO.
>
>How can we do that? We can do questionnaires, and get such folk to
>come talk with us, but that suffers from selection bias - group B can
>use the system with or without direct manual control, so have little
>motivation to argue vigorously in any particular direction. Group A
>however have to argue because they won't use the system at all without
>that feature, and they may want to use the system for other reasons,
>so that becomes a crucial aspect for them.
>
>A much better way IMO is to test it - to get a bunch of volunteers and
>see who responds positively to a demo *without* direct manual control.
>
>To do that we need a demoable thing, which might just be mockups that
>show a set of workflows (and include things like Jay's
>shiny-new-hardware use case in the demo).
>
>I rather suspect we're building that anyway as part of doing UX work,
>so maybe what we do is put a tweet or blog post up asking for
>sysadmins who a) have not yet deployed openstack, b) want to, and c)
>are willing to spend 20-30 minutes with us, walk them through a demo
>showing no manual control, and record what questions they ask, and
>whether they would like to have that product to use, and if not, then
>(a) what use cases they can't address with the mockups and (b) what
>other reasons they have for not using it.
>
>This is a bunch of work though!
>
>So, do we need to do that work?
>
>*If* we can layer manual control on later, then we could defer this
>testing until we are at the point where we can say 'the nova scheduled
>version is ready, now let's decide if we add the manual control'.
>
>OTOH, if we *cannot* layer manual control on later - if it has
>tentacles through too much of the code base, then we need to decide
>earlier, because it will be significantly harder to add later and that
>may be too late of a ship date for vendors shipping on top of TripleO.
>
>So with that as a prelude, my technical sense is that we can layer
>manual scheduling on later: we provide an advanced screen, show the
>list of N instances we're going to ask for and allow each instance to
>be directly customised with a node id selected from either the current
>node it's running on or an available node. It's significant work both
>UI and plumbing, but it's not going to be made harder by the other
>work we're doing AFAICT.
>
>-> My proposal is that we shelve this discussion until we have the
>nova/heat scheduled version in 'and now we polish' mode, and then pick
>it back up and assess user needs.
>
>An alternative argument is to say that group A is a majority of the
>userbase and that doing an automatic version is entirely unnecessary.
>That's also possible, but I'm extremely skeptical, given the huge cost
>of staff time, and the complete lack of interest my sysadmin friends
>(and my former sysadmin self) have in doing automatable things by
>hand.

I just wanted to add a few thoughts:

For some comparative information here "from the field": I work
extensively on deployments of large OpenStack implementations,
most recently a ~220-node/9-rack deployment (scaling up to
42 racks / 1024 nodes soon).  My primary role is of a DevOps/sysadmin
nature rather than a specific development area, so rapid
provisioning/tooling/automation is the area I work in almost
exclusively (mostly API-driven, using Foreman/Puppet).  The
infrastructure our small team designs and builds supports our
development and business.

I am the target user base you'd probably want to cater to.

I can tell you the philosophy and mechanics of Tuskar/OOO are great,
and something I'd love to start using extensively, but there are some
needed aspects in the area of control that I feel should be added (though
arguably less for me and more for my ilk who are looking to expand their
OpenStack footprint).

* ability to 'preview' changes going to the scheduler
* ability to override/change some aspects within node assignment
* ability to view at least minimal logging from within Tuskar UI
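
To make the first two items concrete, here is a minimal sketch of what a
'preview with overrides' step could look like.  It is only an illustration
under stated assumptions: the names Assignment and preview_assignments, and
the instance and node IDs, are all hypothetical and are not part of Tuskar,
Nova or Heat.  It merges operator overrides into a proposed placement and
returns the result without applying anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    instance: str  # hypothetical instance name, e.g. "compute-0"
    node_id: str   # hypothetical bare-metal node identifier
    source: str    # "auto" (scheduler's choice) or "manual" (operator override)


def preview_assignments(proposed, overrides, available_nodes):
    """Return the placement that would result; nothing is deployed here.

    proposed:        dict of instance -> node id suggested automatically
    overrides:       dict of instance -> node id picked by the operator
    available_nodes: set of every node id that may be used
    """
    unknown = set(overrides) - set(proposed)
    if unknown:
        raise ValueError(f"overrides for unknown instances: {sorted(unknown)}")

    # Validate manual picks first so that they always win.
    claimed = set()
    for instance, node in overrides.items():
        if node not in available_nodes:
            raise ValueError(f"{node} is not an available node")
        if node in claimed:
            raise ValueError(f"{node} was picked for more than one instance")
        claimed.add(node)

    preview, displaced = [], []
    for instance, node in proposed.items():
        if instance in overrides:
            preview.append(Assignment(instance, overrides[instance], "manual"))
        elif node in claimed:
            # The automatic choice was taken by someone else; re-place it below.
            displaced.append(instance)
        else:
            claimed.add(node)
            preview.append(Assignment(instance, node, "auto"))

    # Displaced instances get the remaining free nodes in sorted order.
    free = sorted(set(available_nodes) - claimed)
    for instance in displaced:
        if not free:
            raise ValueError(f"no free node left for {instance}")
        preview.append(Assignment(instance, free.pop(0), "auto"))

    return sorted(preview, key=lambda a: a.instance)


if __name__ == "__main__":
    proposed = {"control-0": "node-a", "compute-0": "node-b", "compute-1": "node-c"}
    overrides = {"compute-1": "node-b"}
    nodes = {"node-a", "node-b", "node-c", "node-d"}
    for a in preview_assignments(proposed, overrides, nodes):
        print(f"{a.instance:10} -> {a.node_id} ({a.source})")
```

The point of the sketch is that the override is a thin layer on top of
the automatic result: the operator sees the full list, changes only the
entries they care about, and everything else stays scheduler-driven.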

Here's the main reason - most new adopters of OpenStack/IaaS are going to be
running legacy/mixed hardware. While they might have an initiative to
explore and invest, and even a decent budget, most of them are not going to
have completely identical hardware, isolated/flat networks and things set
aside in such a way that blind auto-discovery/deployment will just work all
the time.

There will be a need to sometimes adjust, and those coming from a more
vertically-scaling infrastructure (most large orgs.) will not have
100% matching standards in place for vendor, machine spec and network
design, which may make Tuskar/OOO seem inflexible and 'one-way'.  This may
just be a carry-over from, or a fear rooted in, the old ways of deployment,
but it is present nonetheless.

In my case, we're lucky enough to have dedicated, near-identical
equipment and a flexible network design we architected beforehand, which
makes Tuskar/OOO a great fit.  Most people will not have this
greenfield ability and will initially use what they have lying around,
so as not to make a big investment until familiarity with and trust in
something new have taken hold.

That said, I've been working with Jaromir Coufal on some UI mockups of
Tuskar with some of this 'advanced' functionality included, and from
my perspective it looks like something to consider pulling in sooner
rather than later if you want to maximize the adoption of new users.

Thanks,

-will


>
>>> Let's break the concern into two halves:
>>> A) Users who could have their needs met, but won't use TripleO because
>>> meeting their needs in this way is too hard/complex/painful.
>>>
>>> B) Users who have a need we cannot meet with the current approach.
>>>
>>> For category B users, their needs might be specific HA things - like
>>> the oft discussed failure domains angle, where we need to split up HA
>>> clusters across power bars, aircon, switches etc. Clearly long term we
>>> want to support them, and the undercloud Nova scheduler is entirely
>>> capable of being informed about this, and we can evolve to a holistic
>>> statement over time. Let's get a concrete list of the cases we can
>>> think of today that won't be well supported initially, and we can
>>> figure out where to do the work to support them properly.
>>
>> My question is - can't we help them now? To enable users to use our app even
>> when we don't have enough smartness to help them the 'auto' way?
>
>I understand the question: but I can't answer it until we have *an*
>example that is both real and not deliverable today. At the moment the
>only one we know of is HA, and that's certainly an important feature on
>the nova scheduled side, so doing manual control to deliver a future
>automatic feature doesn't make a lot of sense to me. Crawl, walk, run.
>
>> This is a great point. It's very manual and we can do it all hugely better. But
>> we can't do anything about that until we have all new shiny features in (and
>> it will take time to figure out the best way to do that properly). Can
>> we help them now? Can we scale our potential user base, get them in early,
>> get more feedback on their requirements, needs, expectations?
>
>I'm desperate for us to scale our user base.
>
>Right now we're blocked on the nova baremetal-preserve-ephemeral
>rebuild blueprint, and then after that heat rolling deploys. *those*
>are absolutely critical, regardless of what goes in Tuskar or Tuskar
>UI - they are baseline 'the system doesn't work otherwise' aspects,
>which will have a profound impact on the ability to sensibly use
>TripleO.
>
>
>> I just want to add one more important point. The whole time we talk about
>> satisfying users' needs, but the other aspect is their psychology (and
>> fulfilling their expectations). We can cover all they need, but they still
>> might want to 'feel' the power of control. Note, this is not just my
>> prejudice, I asked and discussed that with a couple of people - I hope that
>> folks will jump in to confirm.
>
>Certainly - I agree psychology is an important part of this, and it's
>not one we can answer from first principles. It is however also one we
>can't answer by exemplar: we need to know the population occurrence
>rates for each archetype we encounter, and that means getting out and
>recruiting an unbiased sample somehow.
>
>-Rob
>