[openstack-dev] [TripleO][Tuskar] Icehouse Requirements
kbasil at redhat.com
Thu Dec 12 16:35:06 UTC 2013
On Dec 10, 2013, at 5:09 PM, Robert Collins wrote:
> On 11 December 2013 05:42, Jaromir Coufal <jcoufal at redhat.com> wrote:
>> On 2013/09/12 23:38, Tzu-Mainn Chen wrote:
>>> The disagreement comes from whether we need manual node assignment or not.
>>> I would argue that we
>>> need to step back and take a look at the real use case: heterogeneous
>>> nodes. If there are literally
>>> no characteristics that differentiate nodes A and B, then why do we care
>>> which gets used for what? Why
>>> do we need to manually assign one?
>> Ideally, we don't. But with this approach we would take away the user's
>> ability to change or decide anything.
> So, I think this is where the confusion is. Using the nova scheduler
> doesn't prevent change or control. It just ensures the change and
> control happen in the right place: the Nova scheduler has had years of
> work, of features and facilities being added to support HPC, HA and
> other such use cases. It should have everything we need, without
> going down to manual placement. For clarity: manual placement is when
> any of the user, Tuskar, or Heat query Ironic, select a node, and then
> use a scheduler hint to bypass the scheduler.
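To make the distinction concrete, here is a toy sketch (not a real Nova or Ironic API; the node attributes, filter logic, and `force_node` hint name are all made up for illustration) of the difference between letting a scheduler filter select a node and bypassing it with an explicit hint:

```python
# Toy model of the placement decision discussed above; not a real
# Nova/Ironic API. Node capabilities and the hint format are illustrative.

def schedule(nodes, flavor, hint=None):
    """Pick a node for a deployment request.

    With hint=None the 'scheduler' filters on capabilities (the normal
    path); passing hint={'force_node': ...} models manual placement,
    i.e. bypassing the filters entirely.
    """
    if hint and "force_node" in hint:
        # Manual placement: trust the caller, skip all filtering.
        return hint["force_node"]
    # Scheduler path: keep nodes that satisfy the flavor's requirements.
    candidates = [
        name for name, caps in nodes.items()
        if caps["ram_mb"] >= flavor["ram_mb"]
        and caps["disk_gb"] >= flavor["disk_gb"]
    ]
    if not candidates:
        raise LookupError("no node satisfies the flavor")
    # Deterministic choice for the sketch; a real scheduler weighs hosts.
    return sorted(candidates)[0]

nodes = {
    "node-a": {"ram_mb": 8192, "disk_gb": 100},
    "node-b": {"ram_mb": 65536, "disk_gb": 2000},
}
flavor = {"ram_mb": 16384, "disk_gb": 500}

print(schedule(nodes, flavor))                                 # node-b
print(schedule(nodes, flavor, hint={"force_node": "node-a"}))  # node-a
```

The second call is the "manual placement" being argued against: the caller takes over the decision that the filtering path would otherwise make.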
>> The 'easiest' users to support are bigger companies with huge deployments,
>> tailored infrastructure, everything connected properly.
>> But there are tons of companies/users who are running on old, heterogeneous
>> hardware - very likely even more than the number of companies with the
>> large deployments already mentioned. If we give them only a way of 'setting
>> up rules' in order to get a service onto a node, that type of user is not
>> going to use our deployment system.
> That's speculation. We don't know whether they will or won't, because we
> haven't given them a working system to test.
> Let's break the concern into two halves:
> A) Users who could have their needs met, but won't use TripleO because
> meeting their needs in this way is too hard/complex/painful.
> B) Users who have a need we cannot meet with the current approach.
> For category B users, their needs might be specific HA things - like
> the oft discussed failure domains angle, where we need to split up HA
> clusters across power bars, aircon, switches etc. Clearly long term we
> want to support them, and the undercloud Nova scheduler is entirely
> capable of being informed about this, and we can evolve to a holistic
> statement over time. Let's get a concrete list of the cases we can
> think of today that won't be well supported initially, and we can
> figure out where to do the work to support them properly.
> For category A users, I think that we should get concrete examples,
> and evolve our design (architecture and UX) to make meeting those
> needs pleasant.
> What we shouldn't do is plan complex work without concrete examples
> that people actually need. Jay's example of some shiny new compute
> servers with special parts that need to be carved out was a great one
> - we can put that in category A, and figure out if it's easy enough,
> or obvious enough - and think about whether we document it or make it
> a guided workflow or $whatever.
>> Somebody might argue - why do we care? If a user doesn't like the TripleO
>> paradigm, he shouldn't use the UI and should use another tool. But the UI
>> is not only about TripleO. Yes, TripleO is the underlying concept, but we
>> are working on the future *official* OpenStack deployment tool. We should
>> care about enabling people to deploy OpenStack - large/small scale,
>> homo/heterogeneous hardware, typical or more specific use-cases.
> The difficulty I'm having is that the discussion seems to assume that
> 'heterogeneous implies manual', but I don't agree that that
> implication is necessary!
>> As an underlying paradigm of how to install a cloud - awesome idea, awesome
>> concept, it works. But the user doesn't care about how it is deployed for
>> him; he cares about getting what he wants/needs. And we shouldn't go so far
>> that we force him to treat his infrastructure as a cloud. I believe that
>> the possibility to change/control - if needed - is very important, and we
>> should care about it.
> I propose that we make concrete use cases: 'Fred cannot use TripleO
> without manual assignment because XYZ'. Then we can assess how
> important XYZ is to our early adopters and go from there.
>> And what is key for us is to *enable* users - not to prevent them from using
>> our deployment tool, because it doesn't work for their requirements.
> Totally agreed :)
>>> If we can agree on that, then I think it would be sufficient to say that
>>> we want a mechanism to allow
>>> UI users to deal with heterogeneous nodes, and that mechanism must use
>>> nova-scheduler. In my mind,
>>> that's what resource classes and node profiles are intended for.
>> Not arguing on this point. Though that mechanism should also support cases
>> where the user assigns a role to a node or removes a node from a role. The
>> rest of the nodes, which I don't care about, should be handled by
>> nova-scheduler.
> Why? What is the use case for removing a role from a node while leaving
> that node in service? Let's be specific, always, when we're using
> categories of use case to argue for a specific feature/design point.
>> Give it to the Nova guys to fix... What if that user's need is an
>> undercloud-specific requirement? Why should the Nova guys care? What
>> should our unhappy user do until then? Use another tool? Will he be
>> willing to come back to our tool once it is ready?
> The undercloud is a Nova Baremetal compute cloud. This is a hugely
> attractive deployment of Nova for HPC use cases, and that's why Nova
> merged the baremetal code in the first place, and why they still care
> today. What should our user do when any part of OpenStack has a bug
> preventing their use of it? Same answer here :)
>> I can also see other use-cases: distribution based on power sockets,
>> networking connections, etc. We can't anticipate every way our users will
>> need to distribute things.
> But we can think about a language for them to describe those things.
> Which is what host aggregates offer today, though its very manual, and
> I'd love to see us do something massively better.
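As a concrete illustration of that "language", here is a heavily simplified model of how host-aggregate metadata plus flavor extra specs can express a placement constraint (in the spirit of Nova's AggregateInstanceExtraSpecsFilter; the aggregate names, hosts, and the `power_feed` key are invented for the example):

```python
# Simplified model of host aggregates as a declarative placement language.
# Not Nova code; aggregate structure and metadata keys are illustrative.

aggregates = [
    {"name": "rack-1", "hosts": {"node-a", "node-b"},
     "metadata": {"power_feed": "A"}},
    {"name": "rack-2", "hosts": {"node-c"},
     "metadata": {"power_feed": "B"}},
]

def hosts_matching(extra_specs, aggregates):
    """Return all hosts in aggregates whose metadata satisfies every spec."""
    matched = set()
    for agg in aggregates:
        if all(agg["metadata"].get(k) == v for k, v in extra_specs.items()):
            matched |= agg["hosts"]
    return matched

# A flavor requiring power feed B can only land on rack-2's hosts,
# without anyone hand-picking a node.
print(hosts_matching({"power_feed": "B"}, aggregates))  # {'node-c'}
```

The point is that the operator states the failure-domain property once, on the aggregate, and the scheduler enforces it on every boot; today's equivalent in Nova is manual to set up, which is the part worth improving.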
>>> unallocated | available | undeployed
>> +1 unallocated
> I think available is most accurate, but undeployed works too. I really
> don't like unallocated, sorry!
Would "available" denote that the service is already deployed on the node?
>>> create a node distribution | size the deployment
>> * Distribute nodes
>>> resource classes | ?
>> Service classes?
> Brainstorming: a role is something like 'KVM compute', but we may have
> two sets of that role differing only in configuration. In a very
> technical sense it's actually:
> image + configuration -> scaling group in Heat.
> So perhaps:
> Role + Service group ?
> e.g. GPU KVM Hypervisor would be a service group, using the KVM
> Compute role aka disk image.
> Or perhaps we should actually surface image all the way up:
> Image + Service group ?
> image = what things we build into the image
> service group = what runtime configuration we're giving, including how
> many machines we want in the group
How about just leaving it as Resource Class? The things you've
brainstormed about are in line with the original thinking around
the resource class concept.
role (assumes a role-specific image) +
service/resource grouping +
hardware that can provide that service/resource
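The image + configuration -> scaling group mapping described above could be sketched as a Heat resource group. This is only a sketch of the shape, not a working TripleO template: the resource name, image, flavor, and config file are all hypothetical.

```yaml
heat_template_version: 2013-05-23
description: >
  One service group: a role-specific image plus runtime configuration,
  scaled out to a count of machines.
resources:
  kvm_compute_group:
    type: OS::Heat::ResourceGroup
    properties:
      count: 3                        # how many machines in the group
      resource_def:
        type: OS::Nova::Server
        properties:
          image: kvm-compute-image    # the role: what is built into the image
          flavor: baremetal
          user_data:                  # the service group's runtime config
            get_file: compute-config.sh
```

Two groups sharing the image but with different `user_data` (e.g. a GPU-tuned variant) would then be the "two sets of that role differing only in configuration" case.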