[openstack-dev] [TripleO][Tuskar] Icehouse Requirements

Jaromir Coufal jcoufal at redhat.com
Wed Dec 11 12:17:57 UTC 2013


On 2013/10/12 23:09, Robert Collins wrote:
> On 11 December 2013 05:42, Jaromir Coufal <jcoufal at redhat.com> wrote:
>> On 2013/09/12 23:38, Tzu-Mainn Chen wrote:
>>> The disagreement comes from whether we need manual node assignment or not.
>>> I would argue that we
>>> need to step back and take a look at the real use case: heterogeneous
>>> nodes.  If there are literally
>>> no characteristics that differentiate nodes A and B, then why do we care
>>> which gets used for what?  Why
>>> do we need to manually assign one?
>>
>>
>> Ideally, we don't. But with this approach we would take away from the user
>> the possibility to change or decide something.
>
> So, I think this is where the confusion is. Using the nova scheduler
> doesn't prevent change or control. It just ensures the change and
> control happen in the right place: the Nova scheduler has had years of
> work, of features and facilities being added to support HPC, HA and
> other such use cases. It should have everything we need [1], without
> going down to manual placement. For clarity: manual placement is when
> any of the user, Tuskar, or Heat query Ironic, select a node, and then
> use a scheduler hint to bypass the scheduler.
This is very well written. I am all for things going to the right places.

>> The 'easiest' way is to support bigger companies with huge deployments,
>> tailored infrastructure, everything connected properly.
>>
>> But there are tons of companies/users who are running on old heterogeneous
>> hardware. Very likely even more than the number of companies having already
>> mentioned large deployments. And giving them only the way of 'setting up
>> rules' in order to get the service on the node - this type of user is not
>> gonna use our deployment system.
>
> That's speculation. We don't know if they will or will not because we
> haven't given them a working system to test.
Some of that is speculation, and some of it is feedback from
people who are doing deployments (of course, it's a very limited
audience). Anyway, it is not just pure theory.

> Let's break the concern into two halves:
> A) Users who could have their needs met, but won't use TripleO because
> meeting their needs in this way is too hard/complex/painful.
>
> B) Users who have a need we cannot meet with the current approach.
>
> For category B users, their needs might be specific HA things - like
> the oft discussed failure domains angle, where we need to split up HA
> clusters across power bars, aircon, switches etc. Clearly long term we
> want to support them, and the undercloud Nova scheduler is entirely
> capable of being informed about this, and we can evolve to a holistic
> statement over time. Let's get a concrete list of the cases we can
> think of today that won't be well supported initially, and we can
> figure out where to do the work to support them properly.
My question is: can't we help them now, so that users can use our app
even while we aren't yet smart enough to help them the 'auto' way?

> For category A users, I think that we should get concrete examples,
> and evolve our design (architecture and UX) to make meeting those
> needs pleasant.
+1... I have tried to pull some operators into this discussion thread and
will try to get more.

> What we shouldn't do is plan complex work without concrete examples
> that people actually need. Jay's example of some shiny new compute
> servers with special parts that need to be carved out was a great one
> - we can put that in category A, and figure out if it's easy enough,
> or obvious enough - and think about whether we document it or make it
> a guided workflow or $whatever.
>
>> Somebody might argue: why do we care? If a user doesn't like the TripleO
>> paradigm, he shouldn't use the UI and should use another tool. But the UI is
>> not only about TripleO. Yes, it is the underlying concept, but we are working on
>> the future *official* OpenStack deployment tool. We should care about enabling
>> people to deploy OpenStack - large/small scale, homo/heterogeneous hardware,
>> typical or more specific use cases.
>
> The difficulty I'm having is that the discussion seems to assume that
> 'heterogeneous implies manual', but I don't agree that that
> implication is necessary!
No, I don't agree with this either. Heterogeneous hardware can be managed
automatically just as well as homogeneous hardware (via classes and node profiles).

>> As an underlying paradigm of how to install a cloud - awesome idea, awesome
>> concept, it works. But the user doesn't care how it is being deployed for
>> him. He cares about getting what he wants/needs. And we shouldn't go so
>> far as to force him to treat his infrastructure as a cloud. I
>> believe that the possibility to change/control - if needed - is very important
>> and we should care about it.
>
> I propose that we make concrete use cases: 'Fred cannot use TripleO
> without manual assignment because XYZ'. Then we can assess how
> important XYZ is to our early adopters and go from there.
+1, yes. I will try to bug more relevant people who could contribute in
this area.

>> And what is key for us is to *enable* users - not to prevent them from using
>> our deployment tool, because it doesn't work for their requirements.
>
> Totally agreed :)
>
>>> If we can agree on that, then I think it would be sufficient to say that
>>> we want a mechanism to allow
>>> UI users to deal with heterogeneous nodes, and that mechanism must use
>>> nova-scheduler.  In my mind,
>>> that's what resource classes and node profiles are intended for.
>>
>>
>> Not arguing this point. However, that mechanism should also support cases
>> where the user assigns a role to a node or removes a node from a role. The
>> rest of the nodes, which I don't care about, should be handled by nova-scheduler.
>
> Why! What is a use case for removing a role from a node while leaving
> that node in service? Let's be specific, always, when we're using
> categories of use case to argue for a specific feature/design point.
As mentioned above, I will try to bring more relevant people into this
discussion.

>> Give it to the Nova guys to fix... What if that user's need is an
>> undercloud-specific requirement? Why should the Nova guys care? What should our
>> unhappy user do until then? Use another tool? Will he be willing to come back
>> to our tool once it is ready?
>
> The undercloud is a Nova Baremetal compute cloud. This is a hugely
> attractive deployment of Nova for HPC use cases, and that's why Nova
> merged the baremetal code in the first place, and why they still care
> today. What should our user do when any part of OpenStack has a bug
> preventing their use of it? Same answer here :)
>
>> I can also see other use cases. It could be distribution based on power
>> sockets, network connections, etc. We can't think of all the ways
>> our users will need.
>
> But we can think about a language for them to describe those things.
> Which is what host aggregates offer today, though it's very manual, and
> I'd love to see us do something massively better.
This is a great point. It's very manual and we can do hugely better.
But we can't do anything about that until we have all the shiny new features
in (and it will take time to figure out how best to do that
properly). Can we help them now? Can we grow our potential user base,
get them in early, and get more feedback on their requirements, needs, and
expectations?

>>> unallocated | available | undeployed
>>
>> +1 unallocated
>
> I think available is most accurate, but undeployed works too. I really
> don't like unallocated, sorry!
>
>>> create a node distribution | size the deployment
>>
>> * Distribute nodes
>>
>>> resource classes | ?
>>
>> Service classes?
>
> Brainstorming: role is something like 'KVM compute', but we may have
> two sets of that role differing only in configuration. In a very
> technical sense it's actually:
> image + configuration -> scaling group in Heat.
> So perhaps:
> Role + Service group ?
> e.g. GPU KVM Hypervisor would be a service group, using the KVM
> Compute role aka disk image.
>
> Or perhaps we should actually surface image all the way up:
>
> Image + Service group ?
> image = what things we build into the image
> service group = what runtime configuration we're giving, including how
> many machines we want in the group
+1 to this

>>>> I just want to get away from talking about something
>>>> ([manual] allocation) that we don't offer.
>>
>> We don't at the moment but we should :)
>
> maybe :0
:)

I just want to add one more important point. The whole time we talk
about satisfying users' needs, but the other aspect is their psychology
(and fulfilling their expectations). We can cover everything they need, but
they still might want to 'feel' the power of control. Note, this is not
just my prejudice: I asked a couple of people and discussed it with them, and I
hope they will jump in to confirm.

-- Jarda
