[openstack-dev] [TripleO][Tuskar] Icehouse Requirements

Keith Basil kbasil at redhat.com
Thu Dec 12 16:35:06 UTC 2013


On Dec 10, 2013, at 5:09 PM, Robert Collins wrote:

> On 11 December 2013 05:42, Jaromir Coufal <jcoufal at redhat.com> wrote:
>> On 2013/09/12 23:38, Tzu-Mainn Chen wrote:
>>> The disagreement comes from whether we need manual node assignment or not.
>>> I would argue that we
>>> need to step back and take a look at the real use case: heterogeneous
>>> nodes.  If there are literally
>>> no characteristics that differentiate nodes A and B, then why do we care
>>> which gets used for what?  Why
>>> do we need to manually assign one?
>> 
>> 
>> Ideally, we don't. But with this approach we would take away the user's
>> ability to change or decide anything.
> 
> So, I think this is where the confusion is. Using the Nova scheduler
> doesn't prevent change or control. It just ensures the change and
> control happen in the right place: the Nova scheduler has had years of
> work, with features and facilities added to support HPC, HA and
> other such use cases. It should have everything we need [1], without
> going down to manual placement. For clarity: manual placement is when
> any of the user, Tuskar, or Heat queries Ironic, selects a node, and
> then uses a scheduler hint to bypass the scheduler.
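
	To make Robert's distinction concrete, here is a minimal sketch of
	what the two paths look like against the undercloud Nova API, using
	python-novaclient. All names here are hypothetical, and the az:host
	form shown is the admin-only host-forcing syntax, which skips the
	scheduler filters entirely:

	    from novaclient.v1_1 import client

	    # Placeholder credentials for the undercloud Nova endpoint.
	    nova = client.Client(USER, PASSWORD, TENANT, AUTH_URL)

	    # IMAGE and FLAVOR are assumed to be already looked up, e.g.
	    # via nova.images.find() / nova.flavors.find().

	    # Normal path: let the Nova scheduler pick a node.
	    nova.servers.create('controller-0', IMAGE, FLAVOR)

	    # Manual placement: pin the instance to one specific host,
	    # bypassing the scheduler's filters and weights.
	    nova.servers.create('controller-0', IMAGE, FLAVOR,
	                        availability_zone='nova:undercloud-node-01')
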
> 
>> The 'easiest' users to support are bigger companies with huge deployments,
>> tailored infrastructure, everything connected properly.
>> 
>> But there are tons of companies/users who are running on old heterogeneous
>> hardware - very likely even more than the number of companies with the
>> large deployments already mentioned. If the only way we give them to get
>> a service onto a node is 'setting up rules', this type of user is not
>> going to use our deployment system.
> 
> That's speculation. We don't know whether they will or won't, because we
> haven't given them a working system to test.
> 
> Let's break the concern into two halves:
> A) Users who could have their needs met, but won't use TripleO because
> meeting their needs in this way is too hard/complex/painful.
> 
> B) Users who have a need we cannot meet with the current approach.
> 
> For category B users, their needs might be specific HA things - like
> the oft-discussed failure domains angle, where we need to split up HA
> clusters across power bars, aircon, switches etc. Clearly long term we
> want to support them, and the undercloud Nova scheduler is entirely
> capable of being informed about this, and we can evolve to a holistic
> statement over time. Let's get a concrete list of the cases we can
> think of today that won't be well supported initially, and we can
> figure out where to do the work to support them properly.
> 
> For category A users, I think that we should get concrete examples,
> and evolve our design (architecture and UX) to make meeting those
> needs pleasant.
> 
> What we shouldn't do is plan complex work without concrete examples
> that people actually need. Jay's example of some shiny new compute
> servers with special parts that need to be carved out was a great one
> - we can put that in category A, and figure out if it's easy enough,
> or obvious enough - and think about whether we document it or make it
> a guided workflow or $whatever.
> 
>> Somebody might argue - why do we care? If a user doesn't like the TripleO
>> paradigm, he shouldn't use the UI and should use another tool. But the UI is
>> not only about TripleO. Yes, that is the underlying concept, but we are
>> working on the future *official* OpenStack deployment tool. We should care
>> about enabling people to deploy OpenStack - large/small scale,
>> homo/heterogeneous hardware, typical or somewhat more specific use-cases.
> 
> The difficulty I'm having is that the discussion seems to assume that
> 'heterogeneous implies manual', but I don't agree that that
> implication is necessary!
> 
>> As an underlying paradigm for how to install a cloud - awesome idea, awesome
>> concept, it works. But the user doesn't care about how it is being deployed
>> for him. He cares about getting what he wants/needs. And we shouldn't go so
>> far as to force him to treat his infrastructure as a cloud. I believe that
>> the possibility to change/control - if needed - is very important and we
>> should care about it.
> 
> I propose that we make concrete use cases: 'Fred cannot use TripleO
> without manual assignment because XYZ'. Then we can assess how
> important XYZ is to our early adopters and go from there.
> 
>> And what is key for us is to *enable* users - not to prevent them from using
>> our deployment tool, because it doesn't work for their requirements.
> 
> Totally agreed :)
> 
>>> If we can agree on that, then I think it would be sufficient to say that
>>> we want a mechanism to allow
>>> UI users to deal with heterogeneous nodes, and that mechanism must use
>>> nova-scheduler.  In my mind,
>>> that's what resource classes and node profiles are intended for.
>> 
>> 
>> Not arguing on this point. Though that mechanism should also support cases
>> where the user specifies a role for a node / removes a node from a role. The
>> rest of the nodes, which I don't care about, should be handled by
>> nova-scheduler.
> 
> Why? What is a use case for removing a role from a node while leaving
> that node in service? Let's be specific, always, when we're using
> categories of use case to argue for a specific feature/design point.
> 
> 
>> Give it to the Nova guys to fix... What if that user's need is an
>> undercloud-specific requirement? Why should the Nova guys care? What should
>> our unhappy user do until then? Use another tool? Will he be willing to come
>> back to our tool once it is ready?
> 
> The undercloud is a Nova Baremetal compute cloud. This is a hugely
> attractive deployment of Nova for HPC use cases, and that's why Nova
> merged the baremetal code in the first place, and why they still care
> today. What should our user do when any part of OpenStack has a bug
> preventing their use of it? Same answer here :)
> 
>> I can also see other use-cases: distribution based on power
>> sockets, networking connections, etc. We can't anticipate every
>> criterion our users will need.
> 
> But we can think about a language for them to describe those things.
> Which is what host aggregates offer today, though it's very manual, and
> I'd love to see us do something massively better.
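
	For reference, the host-aggregate "language" Robert mentions looks
	roughly like this against the undercloud Nova API (a sketch using
	python-novaclient; the aggregate/host/flavor names are invented, and
	AggregateInstanceExtraSpecsFilter must be enabled in the scheduler
	for the flavor key to be matched):

	    from novaclient.v1_1 import client

	    nova = client.Client(USER, PASSWORD, TENANT, AUTH_URL)

	    # Describe a failure domain (e.g. one rack / power bar) as an
	    # aggregate carrying metadata.
	    agg = nova.aggregates.create('rack-a1', None)
	    nova.aggregates.add_host(agg, 'undercloud-node-01')
	    nova.aggregates.set_metadata(agg, {'rack': 'a1'})

	    # Tie a flavor to that domain; instances booted with this flavor
	    # will only be scheduled onto hosts in matching aggregates.
	    flavor = nova.flavors.find(name='baremetal-control')
	    flavor.set_keys({'aggregate_instance_extra_specs:rack': 'a1'})
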
> 
> 
>>> unallocated | available | undeployed
>> 
>> +1 unallocated
> 
> I think available is most accurate, but undeployed works too. I really
> don't like unallocated, sorry!

	Would "available" introduce/denote that the service is deployed
	and operational?
> 
>>> create a node distribution | size the deployment
>> 
>> * Distribute nodes
>> 
>>> resource classes | ?
>> 
>> Service classes?
> 
> Brainstorming: a role is something like 'KVM compute', but we may have
> two sets of that role differing only in configuration. In a very
> technical sense it's actually:
> image + configuration -> scaling group in Heat.
> So perhaps:
> Role + Service group?
> e.g. 'GPU KVM Hypervisor' would be a service group, using the KVM
> Compute role (aka disk image).
> 
> Or perhaps we should actually surface image all the way up:
> 
> Image + Service group ?
> image = what things we build into the image
> service group = what runtime configuration we're giving, including how
> many machines we want in the group
> 
	How about just leaving it as Resource Class?  The things you've
	brainstormed about are in line with the original thinking around
	the resource class concept:

	role (assumes a role-specific image) +
	service/resource grouping +
	hardware that can provide that service/resource
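
	To Robert's "image + configuration -> scaling group in Heat" point,
	here is roughly how that pairing could be written down for Heat
	today (a sketch only: the template is expressed as a Python dict,
	the resource names/values are invented, and the exact properties of
	OS::Heat::InstanceGroup should be checked against current Heat):

	    # Hypothetical "GPU KVM Hypervisor" service group built from
	    # the "KVM Compute" role image.
	    template = {
	        'HeatTemplateFormatVersion': '2012-12-12',
	        'Resources': {
	            'GpuKvmConfig': {
	                'Type': 'AWS::AutoScaling::LaunchConfiguration',
	                'Properties': {
	                    'ImageId': 'kvm-compute',        # the role's disk image
	                    'InstanceType': 'baremetal-gpu',
	                    'UserData': '... role configuration ...',
	                },
	            },
	            'GpuKvmGroup': {
	                'Type': 'OS::Heat::InstanceGroup',
	                'Properties': {
	                    'LaunchConfigurationName': {'Ref': 'GpuKvmConfig'},
	                    'Size': 4,                       # machines in the group
	                    'AvailabilityZones': ['nova'],
	                },
	            },
	        },
	    }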

	-k




