[openstack-dev] [TripleO][Tuskar] Icehouse Requirements

Tzu-Mainn Chen tzumainn at redhat.com
Mon Dec 9 22:38:25 UTC 2013


Thanks for the explanation!

I'm going to claim that the thread revolves around two main areas of disagreement.  Then I'm going
to propose a way through:

a) Manual Node Assignment

I think everyone agrees that automated node assignment through nova-scheduler is by
far the ideal case; there's no disagreement there.

The disagreement comes from whether we need manual node assignment or not.  I would argue that we
need to step back and take a look at the real use case: heterogeneous nodes.  If there are literally
no characteristics that differentiate nodes A and B, then why do we care which gets used for what?  Why
do we need to manually assign one?

If we can agree on that, then I think it would be sufficient to say that we want a mechanism to allow
UI users to deal with heterogeneous nodes, and that mechanism must use nova-scheduler.  In my mind,
that's what resource classes and node profiles are intended for.

One possible objection might be: nova scheduler doesn't have the appropriate filter that we need to
separate out two nodes.  In that case, I would say that needs to be taken up with nova developers.
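
To make the heterogeneous-node case concrete, here is a minimal Python sketch of the kind of matching I have
in mind.  It is only an illustration: Node, NodeProfile and matching_nodes are hypothetical names, not nova or
Tuskar APIs, and a real implementation would express this through nova-scheduler filters rather than UI code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical hardware description of one registered node."""
    name: str
    cpus: int
    memory_mb: int
    disk_gb: int


@dataclass(frozen=True)
class NodeProfile:
    """Hypothetical minimum hardware requirements for a class of work."""
    name: str
    min_cpus: int
    min_memory_mb: int
    min_disk_gb: int


def matching_nodes(profile, nodes):
    """Return the nodes whose hardware satisfies the profile.

    The user never picks a node by hand; the profile expresses the
    difference that matters, and the scheduler chooses among matches.
    """
    return [
        node for node in nodes
        if node.cpus >= profile.min_cpus
        and node.memory_mb >= profile.min_memory_mb
        and node.disk_gb >= profile.min_disk_gb
    ]
```

If two nodes both satisfy a profile, nothing in this sketch distinguishes them, which is the point: at that stage
it should not matter which one gets used.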


b) Terminology

It feels a bit like some of the disagreement comes from people using different words for the same thing.
For example, the wireframes already detail a UI where Robert's roles come first, but I think that message
was confused because I mentioned "node types" in the requirements.

So could we come to some agreement on what the most exact terminology would be?  I've listed some examples below,
but I'm sure there are more.

node type | role
management node | ?
resource node | ?
unallocated | available | undeployed
create a node distribution | size the deployment
resource classes | ?
node profiles | ?

Mainn

----- Original Message -----
> On 10 December 2013 09:55, Tzu-Mainn Chen <tzumainn at redhat.com> wrote:
> >> >        * created as part of undercloud install process
> 
> >> By that note I meant, that Nodes are not resources, Resource instances
> >> run on Nodes. Nodes are the generic pool of hardware we can deploy
> >> things onto.
> >
> > I don't think "resource nodes" is intended to imply that nodes are
> > resources; rather, it's supposed to
> > indicate that it's a node where a resource instance runs.  It's supposed to
> > separate it from "management node"
> > and "unallocated node".
> 
> So the question is are we looking at /nodes/ that have a /current
> role/, or are we looking at /roles/ that have some /current nodes/.
> 
> My contention is that the role is the interesting thing, and the node
> is the incidental thing. That is, as a sysadmin, my hierarchy of
> concerns is something like:
>  A: are all services running
>  B: are any of them in a degraded state where I need to take prompt
> action to prevent a service outage [might mean many things: - software
> update/disk space criticals/a machine failed and we need to scale the
> cluster back up/too much load]
>  C: are there any planned changes I need to make [new software deploy,
> feature request from user, replacing a faulty machine]
>  D: are there long term issues sneaking up on me [capacity planning,
> machine obsolescence]
> 
> If we take /nodes/ as the interesting thing, and what they are doing
> right now as the incidental thing, it's much harder to map that onto
> the sysadmin concerns. If we start with /roles/ then we can answer:
>  A: by showing the list of roles and the summary stats (how many
> machines, service status aggregate), role level alerts (e.g. nova-api
> is not responding)
>  B: by showing the list of roles and more detailed stats (overall
> load, response times of services, tickets against services)
>      and a list of in-trouble instances in each role (instances with
> alerts against them: low disk, overload, failed service,
> early-detection alerts from hardware)
>  C: probably out of our remit for now in the general case, but we need
> to enable some things here like replacing faulty machines
>  D: by looking at trend graphs for roles (not machines), but also by
> looking at the hardware in aggregate - breakdown by age of machines,
> summary data for tickets filed against instances that were deployed to
> a particular machine
> 
> C: and D: are (F) category work, but for all but the very last thing,
> it seems clear how to approach this from a roles perspective.
> 
> I've tried to approach this using /nodes/ as the starting point, and
> after two terrible drafts I've deleted the section. I'd love it if
> someone could show me how it would work:)
> 
> >> >     * Unallocated nodes
> >> >
> >> > This implies an 'allocation' step, that we don't have - how about
> >> > 'Idle nodes' or something.
> >> >
> >> > It can be auto-allocation. I don't see problem with 'unallocated' term.
> >>
> >> Ok, it's not a biggy. I do think it will frame things poorly and lead
> >> to an expectation about how TripleO works that doesn't match how it
> >> does, but we can change it later if I'm right, and if I'm wrong, well
> >> it won't be the first time :).
> >>
> >
> > I'm interested in what the distinction you're making here is.  I'd rather
> > get things
> > defined correctly the first time, and it's very possible that I'm missing a
> > fundamental
> > definition here.
> 
> So we have:
>  - node - a physical general purpose machine capable of running in
> many roles. Some nodes may have hardware layout that is particularly
> useful for a given role.
>  - role - a specific workload we want to map onto one or more nodes.
> Examples include 'undercloud control plane', 'overcloud control
> plane', 'overcloud storage', 'overcloud compute' etc.
>  - instance - A role deployed on a node - this is where work actually
>  happens.
>  - scheduling - the process of deciding which role is deployed on which node.
> 
> The way TripleO works is that we define a Heat template that lays out
> policy: '5 instances of overcloud control plane please', '20
> hypervisors' etc. Heat passes that to Nova, which pulls the image for
> the role out of Glance, picks a node, and deploys the image to the
> node.
> 
> Note in particular the order: Heat -> Nova -> Scheduler -> Node chosen.
> 
> The user action is not 'allocate a Node to "overcloud control plane"',
> it is 'size the control plane through heat'.
> 
> So when we talk about 'unallocated Nodes', the implication is that
> users 'allocate Nodes', but they don't: they size roles, and after
> doing all that there may be some Nodes that are - yes - unallocated,
> or have nothing scheduled to them. So... I'm not debating that we
> should have a list of free hardware - we totally should - I'm debating
> how we frame it. 'Available Nodes' or 'Undeployed machines' or
> whatever. I just want to get away from talking about something
> ([manual] allocation) that we don't offer.
> 
> -Rob
> 
> --
> Robert Collins <rbtcollins at hp.com>
> Distinguished Technologist
> HP Converged Cloud
> 
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 


