Open Stack

Tue Dec 10 22:09:09 UTC 2013

On 11 December 2013 05:42, Jaromir Coufal <jcoufal at redhat.com> wrote:
> On 2013/09/12 23:38, Tzu-Mainn Chen wrote:
>> The disagreement comes from whether we need manual node assignment or not.
>> I would argue that we
>> need to step back and take a look at the real use case: heterogeneous
>> nodes.  If there are literally
>> no characteristics that differentiate nodes A and B, then why do we care
>> which gets used for what?  Why
>> do we need to manually assign one?
>
>
> Ideally, we don't. But with this approach we would take out the possibility
> to change something or decide something from the user.

So, I think this is where the confusion is. Using the nova scheduler
doesn't prevent change or control. It just ensures the change and
control happen in the right place: the Nova scheduler has had years of
work, of features and facilities being added to support HPC, HA and
other such use cases. It should have everything we need [1], without
going down to manual placement. For clarity: manual placement is when
any of the user, Tuskar, or Heat query Ironic, select a node, and then
use a scheduler hint to bypass the scheduler.
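
To make the distinction concrete, here is a minimal sketch of the two
approaches. It is a toy model, not Nova code: the Node class, the
schedule() and place_manually() functions, and the "gpu" tag are all
hypothetical names invented for illustration. It only mirrors the
general filter-then-weigh shape of a scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for a registered baremetal node.
    name: str
    cpus: int
    ram_mb: int
    tags: set = field(default_factory=set)


def schedule(nodes, min_cpus, min_ram_mb, required_tags=()):
    """Scheduler-driven placement: filter on requirements, then weigh."""
    required = set(required_tags)
    candidates = [
        n for n in nodes
        if n.cpus >= min_cpus and n.ram_mb >= min_ram_mb and required <= n.tags
    ]
    if not candidates:
        return None
    # Weigh: prefer the smallest node that fits, keeping big nodes free.
    return min(candidates, key=lambda n: (n.cpus, n.ram_mb, n.name)).name


def place_manually(nodes, node_name):
    """Manual placement: the caller names the node; nothing is checked
    beyond the node existing."""
    names = {n.name for n in nodes}
    if node_name not in names:
        raise KeyError(node_name)
    return node_name


nodes = [
    Node("node-a", cpus=8, ram_mb=16384),
    Node("node-b", cpus=32, ram_mb=131072, tags={"gpu"}),
]
```

In this sketch the user still controls the outcome of schedule(), but
does so by stating requirements (sizes, tags) rather than by naming a
node.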

> The 'easiest' way is to support bigger companies with huge deployments,
> tailored infrastructure, everything connected properly.
>
> But there are tons of companies/users who are running on old heterogeneous
> hardware. Very likely even more than the number of companies having already
> mentioned large deployments. And giving them only the way of 'setting up
> rules' in order to get the service on the node - this type of user is not
> gonna use our deployment system.

That's speculation. We don't know if they will or will not because we
haven't given them a working system to test.

Let's break the concern into two halves:
A) Users who could have their needs met, but won't use TripleO because
meeting their needs in this way is too hard/complex/painful.

B) Users who have a need we cannot meet with the current approach.

For category B users, their needs might be specific HA things - like
the oft-discussed failure domains angle, where we need to split up HA
clusters across power bars, aircon, switches etc. Clearly long term we
want to support them, and the undercloud Nova scheduler is entirely
capable of being informed about this, and we can evolve to a holistic
statement over time. Let's get a concrete list of the cases we can
think of today that won't be well supported initially, and we can
figure out where to do the work to support them properly.

For category A users, I think that we should get concrete examples,
and evolve our design (architecture and UX) to make meeting those
needs pleasant.

What we shouldn't do is plan complex work without concrete examples
that people actually need. Jay's example of some shiny new compute
servers with special parts that need to be carved out was a great one
- we can put that in category A, and figure out if it's easy enough,
or obvious enough - and think about whether we document it or make it
a guided workflow or $whatever.

> Somebody might argue - why do we care? If user doesn't like TripleO
> paradigm, he shouldn't use the UI and should use another tool. But the UI is
> not only about TripleO. Yes, it is underlying concept, but we are working on
> future *official* OpenStack deployment tool. We should care to enable people
> to deploy OpenStack - large/small scale, homo/heterogeneous hardware,
> typical or a bit more specific use-cases.

The difficulty I'm having is that the discussion seems to assume that
'heterogeneous implies manual', but I don't agree that that
implication is necessary!

> As an underlying paradigm of how to install cloud - awesome idea, awesome
> concept, it works. But user doesn't care about how it is being deployed for
> him. He cares about getting what he wants/needs. And we shouldn't go that
> far that we violently force him to treat his infrastructure as cloud. I
> believe that possibility to change/control - if needed - is very important
> and we should care.

I propose that we make concrete use cases: 'Fred cannot use TripleO
without manual assignment because XYZ'. Then we can assess how
important XYZ is to our early adopters and go from there.

> And what is key for us is to *enable* users - not to prevent them from using
> our deployment tool, because it doesn't work for their requirements.

Totally agreed :)

>> If we can agree on that, then I think it would be sufficient to say that
>> we want a mechanism to allow
>> UI users to deal with heterogeneous nodes, and that mechanism must use
>> nova-scheduler.  In my mind,
>> that's what resource classes and node profiles are intended for.
>
>
> Not arguing on this point. Though that mechanism should support also cases,
> where user specifies a role for a node / removes node from a role. The rest
> of nodes which I don't care about should be handled by nova-scheduler.

Why? What is a use case for removing a role from a node while leaving
that node in service? Let's be specific, always, when we're using
categories of use case to argue for a specific feature/design point.

> Give it to Nova guys to fix it... What if that user's need would be
> undercloud specific requirement?  Why should Nova guys care? What should our
> unhappy user do until then? Use other tool? Will he be willing to get back
> to use our tool once it is ready?

The undercloud is a Nova Baremetal compute cloud. This is a hugely
attractive deployment of Nova for HPC use cases, and that's why Nova
merged the baremetal code in the first place, and why they still care
today. What should our user do when any part of OpenStack has a bug
preventing their use of it? Same answer here :)

> I can also see other use-cases. It can be distribution based on power
> sockets, networking connections, etc. We can't think about all the ways
> which our user will need.

But we can think about a language for them to describe those things.
Which is what host aggregates offer today, though it's very manual, and
I'd love to see us do something massively better.
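
As a rough illustration of what such a language looks like today: a
host aggregate groups hosts under key/value metadata, and a request
can then ask for hosts whose metadata matches. The sketch below is a
simplified model of that matching, not the actual Nova filter; the
hosts_matching() function, the dict layout and the "power_bar" and
"gpu" keys are assumptions made up for the example.

```python
from collections import defaultdict


def hosts_matching(aggregates, extra_specs):
    """Return hosts whose combined aggregate metadata satisfies every
    requested key/value pair (simplified model, exact match only)."""
    host_meta = defaultdict(lambda: defaultdict(set))
    for agg in aggregates:
        for host in agg["hosts"]:
            meta = host_meta[host]  # register the host even if metadata is empty
            for key, value in agg["metadata"].items():
                meta[key].add(value)
    return sorted(
        host
        for host, meta in host_meta.items()
        if all(value in meta.get(key, set()) for key, value in extra_specs.items())
    )


# Hypothetical aggregates describing a power domain and special hardware.
aggregates = [
    {"name": "power-bar-a", "hosts": ["node1", "node2"],
     "metadata": {"power_bar": "a"}},
    {"name": "gpu-nodes", "hosts": ["node2", "node3"],
     "metadata": {"gpu": "true"}},
]
```

Here a request for {"gpu": "true", "power_bar": "a"} would match only
node2. The manual part is that someone has to create each aggregate
and keep its host list and metadata up to date by hand.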

>> unallocated | available | undeployed
>
> +1 unallocated

I think available is most accurate, but undeployed works too. I really
don't like unallocated, sorry!

>> create a node distribution | size the deployment
>
> * Distribute nodes
>
>> resource classes | ?
>
> Service classes?

Brainstorming: role is something like 'KVM compute', but we may have
two sets of that role that differ only in configuration. In a very
technical sense it's actually:
image + configuration -> scaling group in Heat.
So perhaps:
Role + Service group ?
e.g. GPU KVM Hypervisor would be a service group, using the KVM
Compute role aka disk image.

Or perhaps we should actually surface image all the way up:

Image + Service group ?
image = what things we build into the image
service group = what runtime configuration we're giving, including how
many machines we want in the group
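
A minimal sketch of that second naming, purely to show how the two
terms would relate. The Image and ServiceGroup classes, their field
names, and the example values are hypothetical; they are not Tuskar or
Heat data structures.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Image:
    # What we build into the image.
    name: str
    elements: tuple


@dataclass
class ServiceGroup:
    # Runtime configuration, including how many machines we want.
    name: str
    image: Image
    config: dict = field(default_factory=dict)
    count: int = 1


def total_nodes(groups):
    """Number of machines the deployment asks for across all groups."""
    return sum(g.count for g in groups)


def groups_by_image(groups):
    """Map each image name to the service groups that use it."""
    result = {}
    for g in groups:
        result.setdefault(g.image.name, []).append(g.name)
    return result


kvm = Image("kvm-compute", ("nova-compute", "libvirt"))
groups = [
    ServiceGroup("KVM Hypervisor", kvm, count=10),
    ServiceGroup("GPU KVM Hypervisor", kvm, {"gpu_passthrough": True}, count=2),
]
```

Both groups share one image and differ only in configuration and
count, which is the 'two differing only in configuration' case above.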

>>> I just want to get away from talking about something
>>> ([manual] allocation) that we don't offer.
>
> We don't at the moment but we should :)

maybe :0

-Rob

-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud

[openstack-dev] [TripleO][Tuskar] Icehouse Requirements
