[placement][nova][ptg] Resourceless trait filters

Alex Xu soulxu at gmail.com
Wed Apr 17 09:16:59 UTC 2019


Eric Fried <openstack at fried.cc> wrote on Wednesday, April 17, 2019 at 3:28 AM:

> > I'm not sure I understand your proposal. Would you introduce a VM
> > resource and then allocate 1 of that resource for each VM?
>
> This has been proposed before, somewhere: translating
> max_instances_per_host to an inventory of resource class "VM" on the
> compute node RP, and including resources:VM=1 in every GET /a_c request.
>

Actually, I propose to attach the traits to the RP that has the VCPU resource.
In the case where we have NUMA in placement, we would attach the traits to the
NUMA node RP. Let me try to explain why that may make sense: those traits
should be attached to the "compute" resource, and VCPU is that "compute"
resource. (Yes, with two NUMA nodes, both NUMA node RPs would carry those
traits, but that should be fine.) When we request those traits, we must also be
requesting VCPU, right? If so, this makes sense. Or is there a case where we
request only a trait, with no resource request at all? If so, this approach
won't work. But I think the case we started discussing is about the trait and
the resource not being on the same RP. For the Neutron bandwidth case, we
still attach the NIC type trait to the PF, not the agent, for the same reason.
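To make that concrete, here is a minimal sketch of a single numbered request
group that asks for the resources and the trait together, as the placement
GET /allocation_candidates granular syntax (resourcesN/requiredN) allows. The
amounts and the choice of COMPUTE_TRUSTED_CERTS are illustrative only:

```python
from urllib.parse import urlencode

# Sketch: one numbered group asking for VCPU/MEMORY_MB from a NUMA node
# RP that also carries the trait. Amounts here are made up for the example.
params = {
    "resources1": "VCPU:2,MEMORY_MB:1024",
    "required1": "COMPUTE_TRUSTED_CERTS",
}
query = "GET /allocation_candidates?" + urlencode(params)
print(query)
```

Because both keys share the suffix "1", placement must satisfy the resources
and the trait from the same RP, which is exactly why the trait needs to live
on (or flow down to) the RP holding the VCPU inventory.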


>
> This would solve the class of use cases like:
>
> >>  So we have a specific use case: COMPUTE_TRUSTED_CERTS + NUMA
>
> but wouldn't help us for:
>
> > So what we need to solve is. Two (or more) sets of resources where the
> > different sets requires different, contradicting traits, in a setup
> > where the trait is not on the RP where resource inventory is.
> >
> > compute RP
> > |
> > |
> > |____ OVS agent RP
> > |      |     * CUSTOM_VNIC_TYPE_NORMAL
> > |      |
> > |      |___________ br-int dev RP
> > |                       * CUSTOM_PHYSNET_PHYSNET0
> > |                       * NET_BW_EGR_KILOBIT_PER_SEC: 1000
> > |
> > |
> > |____ SRIOV agent RP
> > |      |     * CUSTOM_VNIC_TYPE_DIRECT
> > |      |
> > |      |
> > |      |___________ esn1 dev RP
> > |      |                * CUSTOM_PHYSNET_PHYSNET0
> > |      |                * NET_BW_EGR_KILOBIT_PER_SEC: 10000
> > |      |
> > |      |___________ esn2 dev RP
> > |                       * CUSTOM_PHYSNET_PHYSNET1
> > |                       * NET_BW_EGR_KILOBIT_PER_SEC: 20000
> >
> >
> > Then having two neutron ports in a server create request:
> > * port-normal:
> >     "resource_request": {
> >         "resources": {
> >             orc.NET_BW_EGR_KILOBIT_PER_SEC: 1000},
> >         "required": ["CUSTOM_PHYSNET_PHYSNET0", "CUSTOM_VNIC_TYPE_NORMAL"]}
> >
> > * port-direct:
> >     "resource_request": {
> >         "resources": {
> >             orc.NET_BW_EGR_KILOBIT_PER_SEC: 2000},
> >         "required": ["CUSTOM_PHYSNET_PHYSNET0", "CUSTOM_VNIC_TYPE_DIRECT"]}
>
> ...unless we contrive some inventory unit to put on the agent RPs. What
> would that be? VNIC? How would we know how many to create?
>
> Interestingly, the above is closely approaching the space we're
> exploring for "subtree affinity". I'm wondering if there's a Unified
> Solution...
>

Yeah, if we are going to create a virtual resource 'VM', then we need 'VNIC',
and then we will need more. I don't like that.


>
> > For example, if we said that "traits always flow down [4]" (the
> > phrase that entered my brain and got me to start this email, "down"
> > in this case is "in the direction of children") then some traits
> > could be on the compute node, but expressed in a numbered request
> > group if that happened to be more convenient.
> >
> > This mental model works well for me, because nested often represents
> > a _containing_ hierarchy [2].
> >
> > If the "compute RP has no resources to give [...] but it's still the
> thing
> > exposing traits we want to filter by" [3], if we make it so the children
> > inherit those traits (because they have flowed down and the children
> > are "inside" the thing) things feel a bit saner to me. Would be good
> > if Eric were able to express in more detail why inherit feels
> > "terrible" [3]. It could very well be.
>
> I also said "feels". I can't really explain it any better than I could
> explain why "using group numbers as values" gave me the ooks. And given
> we're coming up ugly with all other proposals, convince me that this one
> is practical and not fraught with peril and I'll quickly get over my
> discomfort. Right now I'm pretty close to that point because it
> elegantly solves both classes of problem described above, and I can't
> think of a way to break it that isn't ridiculously contrived.
>
> It's possible we punted on it before because a) we didn't have the
> concrete use cases we have now; and b) it was going to be pretty tricky
> to implement. More on that below.
>
> > Similarly, aggregate membership would flow down as well, because a
> > child is always in its parent's aggregate too because it is inside its
> > parent.
>
> This one I'm not so convinced about. Can we defer making changes here
> until we have similarly concrete use cases?
>
> > A numeric requiredN or member_ofN span would be capped by the resource
> > provider that satisfied resourcesN.
>
> Eh? I was following you up to this point. Do you just mean that we don't
> have to worry about ascending the tree looking for requiredN because the
> trait is implicitly on the provider with resourceN by virtue of being on
> its ancestor?
>
> > We need to work out a consistent and relatively easy to explain
> > mental model for this, because we need to be able to talk about it
> > with other people without them being required to re-experience all
> > the mental hurdles we are having to overcome.
>
> I think the hurdles are more around "why" and "are you sure you want to"
> - once we've made those decisions, IMO it can be understood fairly
> easily with one or both of "encapsulation" and "traits flow down" as
> you've explained them.
>
> > [4] A corollary could be "classes of inventory always flow up": If
> > you need a SRIOV_NET_VF, this root resource provider can provide it
> > because it has a great grandchild which has it.
>
> This one bakes my noodle pretty good. I have a harder time visualizing
> how the above use cases are satisfied by walking backwards up the tree
> accumulating resources (and you have to accumulate the traits as well,
> right?) until I hit a point where I've gathered everything I need.
>
> So I'll come down in favor of making "traits flow down" happen. Question
> is, how? (And I know we've talked about this before - maybe Queens+Denver?)
>
> (A) In the database.
>   (i) Any time a trait is added to a provider, we create records for
> same trait for all descendants.
>   (ii) Need a data migration to bring existing data into conformance with ^
>   (iii) When a trait is deleted from a provider, I assume we need to
> recursively delete it from all descendants. If you didn't want that,
> you'd have to go back and re-add it to the descendants you wanted it on.
>
> Pros: Easy to do. We don't have to change any of the APIs' algorithms -
> they just work the way we want them to by virtue of the trait data being
> where we want it. Reporting (e.g. GET /rps and therefore CLI output)
> reflects "reality".
> Cons: Irreversible. Not backward compatible. Can't do it in a microversion.
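For illustration, option (A)'s add/delete behavior can be sketched in a few
lines of Python. The tree and trait structures here are toy stand-ins for the
real placement schema, purely to show the recursive propagation:

```python
# Toy model of option (A): traits are physically copied to descendants.
# `children` maps an RP name to its child RPs; `traits` maps an RP name
# to its set of trait strings. Neither matches the real placement DB.

def add_trait(rp, trait, children, traits):
    """(A)(i): adding a trait to an RP also adds it to all descendants."""
    traits.setdefault(rp, set()).add(trait)
    for child in children.get(rp, ()):
        add_trait(child, trait, children, traits)

def delete_trait(rp, trait, children, traits):
    """(A)(iii): deleting a trait recursively deletes it from descendants."""
    traits.setdefault(rp, set()).discard(trait)
    for child in children.get(rp, ()):
        delete_trait(child, trait, children, traits)
```

The irreversibility noted above follows directly: once copied, a descendant's
trait row no longer records whether it was set directly or inherited.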
>
> (B) In the algorithms.
>   (i) GET /rps and GET /a_cs queries need JOINs I can't even begin to
> comprehend.
>   (ii) Do we tweak the outputs (GET /rps response and GET /a_cs
> provider_summaries) to report the "inherited" traits as well?
>
> Pros: Can do it in a microversion.
> Cons: See "can't even begin to comprehend". Maybe I'm a dunce.
>
> Perhaps this suggests a hybrid approach:
>
> (C) Create a "ghost" table of inherited resource provider traits. If
> $old_microversion we ignore it; if $new_microversion we logically
> combine it with the existing rp traits table in all our queries.
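A rough sketch of what (C)'s read path could look like. The two "tables" and
the microversion flag here are assumptions for illustration, not a proposed
schema:

```python
# Toy model of option (C): traits an RP owns directly live in one table,
# inherited ("ghost") traits live in a separate one. Old microversions see
# only the direct traits; new microversions see the logical union.

def effective_traits(rp, direct, ghost, new_microversion):
    own = direct.get(rp, set())
    if not new_microversion:
        return own
    return own | ghost.get(rp, set())
```

Unlike (A), this keeps the distinction between direct and inherited traits,
so the behavior stays reversible and gateable by microversion.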
>
> Thoughts?
>
> efried
> .
>
>
