<div dir="ltr"><div dir="ltr"><div>Sorry, I left the mailing list address out of my previous reply, so some of the discussion is missing from the latest email. I am replying to the mailing list with those replies included, so that other people can catch up on our discussion.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Sean Mooney <<a href="mailto:smooney@redhat.com">smooney@redhat.com</a>> wrote on Mon, Apr 15, 2019 at 9:54 PM:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Mon, 2019-04-15 at 21:04 +0800, Alex Xu wrote:<br>

> I'd like to contribute another idea here. I'm sure I haven't explored<br>

> all the cases, given my limited view.<br>

> <br>

> So I'm thinking we can continue to use the query string and build a tree<br>

> structure from the request group numbers. I know about the request group<br>

> numbering problem for Cyborg and Neutron, but I think there must be some<br>

> way to describe which instance NUMA node a Cyborg device will be attached<br>

> to. So I suspect the numbered request groups are not at fault; maybe we<br>

> are just missing a way to describe that.<br>

> <br>

> For the case in the spec <a href="https://review.openstack.org/#/c/650476" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/650476</a>, an<br>

> instance with one NUMA node and two VFs from different networks, we can<br>

> write:<br>

> <br>

> ?resources=DISK_GB:10&<br>

> resources1=VCPU:2,MEMORY_MB:128&<br>

> resources1.1=VF:1&required1.1=NET_A&<br>

> resources1.2=VF:1&required1.2=NET_B<br>
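A minimal Python sketch of how Nova might render such nested groups into a query string. It is hypothetical: the nested `resources1.1` suffixes are only proposed in this thread and are not accepted by placement today, the `render_groups` helper is invented for illustration, and it assumes each trait is scoped to its own group through a matching suffix (`required1.1`) and uses the full CUSTOM_PHYSNET_ trait names.

```python
from typing import Dict, List, Optional


def render_groups(groups: Dict[str, dict],
                  policies: Optional[Dict[str, str]] = None) -> str:
    """Render request groups into the (proposed) nested query syntax.

    Hypothetical helper. `groups` maps a suffix ("" = un-numbered, "1",
    "1.1", ...) to {"resources": {rc: amount}, "required": [trait, ...]}.
    `policies` maps a suffix to a group_policy value.
    """
    params: List[str] = []
    for suffix, group in groups.items():
        res = ",".join(f"{rc}:{amt}" for rc, amt in group["resources"].items())
        params.append(f"resources{suffix}={res}")
        traits = group.get("required", [])
        if traits:
            # Assumption: a trait is paired with the group carrying the same suffix.
            params.append(f"required{suffix}={','.join(traits)}")
    for suffix, policy in (policies or {}).items():
        params.append(f"group_policy{suffix}={policy}")
    return "?" + "&".join(params)


query = render_groups({
    "": {"resources": {"DISK_GB": 10}},
    "1": {"resources": {"VCPU": 2, "MEMORY_MB": 128}},
    "1.1": {"resources": {"VF": 1}, "required": ["CUSTOM_PHYSNET_NET_A"]},
    "1.2": {"resources": {"VF": 1}, "required": ["CUSTOM_PHYSNET_NET_B"]},
})
print(query)
```

Generating the string from a structure like this, rather than having operators write it by hand, is what makes the deeper numbering manageable.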

I'm not sure what NET_A and NET_B correspond to.<br>

As they are not prefixed with CUSTOM_, that implies they are standard<br>

traits, but how would you map dynamically created Neutron networks to resource providers as<br>

traits?<br>

<br>

I can see, and have argued for, doing something similar for Neutron physnets,<br>

as they are mostly static and can be applied by the Neutron agent to the RPs it creates<br>

using a CUSTOM_PHYSNET_<physnet name> trait, but I don't see how NET_A would work.<br></blockquote><div><br></div><div>Yes, they stand for CUSTOM_PHYSNET_NET_A/CUSTOM_PHYSNET_NET_B; I just used a shortened form. The case I want to show is two VFs from different physical networks.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

> <br>

> Another example: we request an instance with two NUMA nodes, with 2 vCPUs and<br>

> 128 MB of memory in each node. Each node has two VFs that come from different PFs<br>

> for HA.<br>

> <br>

> ?resources=DISK_GB:10&<br>

> resources1=VCPU:2,MEMORY_MB:128&<br>

> resources1.1=VF:1&<br>

> resources1.2=VF:1&<br>

> resources2=VCPU:2,MEMORY_MB:128&<br>

> resources2.1=VF:1&<br>

> resources2.2=VF:1&<br>

> group_policy=isolate&<br>

> group_policy1=isolate&<br>

> group_policy2=isolate<br>

<br>

This gets messy, as there is no way to express that I have a 2 NUMA node guest<br>

and I want a VF from either NUMA node without changing the grouping and group policies.<br>

<br></blockquote><div><br></div><div>It can be done with:</div><div><div>GET /allocation_candidates?</div><div>resources=DISK_GB:10,VF:1&</div><div>resources1=VCPU:2,MEMORY_MB:128&</div><div>resources2=VCPU:2,MEMORY_MB:128&</div><div>group_policy=isolate</div></div><div><br></div><div>DISK_GB and VF are in the un-numbered request group, so they may come from any RP in the tree.</div><div><br></div><div><a href="http://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/granular-resource-requests.html#semantics">http://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/granular-resource-requests.html#semantics</a><br></div><div>"The semantic for the (single) un-numbered grouping is unchanged. That is, it may still return results from different RPs in the same tree (or, when “shared” is fully implemented, the same aggregate)."</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
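For comparison, the request above uses only the granular syntax placement already supports (un-numbered group plus `resources1`/`resources2` and a top-level `group_policy`). A small Python sketch that builds it with the standard library's `urllib.parse.urlencode`; passing `safe=":,"` is just a readability choice so the `:` and `,` separators stay unescaped.

```python
from urllib.parse import urlencode

params = {
    # Un-numbered group: DISK_GB and VF may come from any RPs in the tree.
    "resources": "DISK_GB:10,VF:1",
    # One numbered group per guest NUMA node.
    "resources1": "VCPU:2,MEMORY_MB:128",
    "resources2": "VCPU:2,MEMORY_MB:128",
    # resources1 and resources2 must be satisfied by different providers.
    "group_policy": "isolate",
}

url = "/allocation_candidates?" + urlencode(params, safe=":,")
print(url)
```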

If we went down this road, then we would have to generate this request dynamically (I'm OK with that),<br>

but that would mean the operator should never add resources:... extra specs to the flavor.<br></blockquote><div><br></div><div>Yes, I prefer generating it from the extra specs rather than asking the operator to write such a complex request by hand. The operator can continue to use the 'resources' extra spec; we can merge it into the generated request.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

Personally, I would like to move in the direction of creating the placement queries dynamically<br>

and not requiring or allowing operators to specify resources in the flavor, as it's the only way<br>

I can see to be able to generate a query like the one above. The main gap I see to enabling<br>

that is that we have no NUMA information from Neutron about which NUMA node we should<br>

attach the VF to, so we can't create the request above without changing the Neutron API.<br></blockquote><div><br></div><div>Yes, we are on the same side. For the Neutron problem, see below.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

> <br>

> The `group_policy` ensures that resources1 and resources2 aren't coming from<br>

> the same RP. The `group_policy1` ensures that the `resources1.x` groups aren't coming from<br>

> the same RP. The `group_policy2` ensures that the `resources2.x` groups aren't coming from<br>

> the same RP.<br>
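The nested isolation rules described above can be sketched as a check over a single candidate. This is a hypothetical illustration of the proposal only (placement has no per-parent `group_policy1` today); `children_of`, `satisfies_isolation` and the RP names are invented, and a candidate is modeled as a map from group suffix to the provider that satisfied it. As the proposal reads, the top-level policy covers only the direct children `1` and `2`.

```python
from typing import Dict, Iterable, List


def children_of(parent: str, suffixes: Iterable[str]) -> List[str]:
    """Direct child suffixes: "" -> "1", "2"; "1" -> "1.1", "1.2"."""
    prefix = f"{parent}." if parent else ""
    return [
        s for s in suffixes
        if s and s.startswith(prefix) and "." not in s[len(prefix):]
    ]


def satisfies_isolation(mapping: Dict[str, str], policies: Dict[str, str]) -> bool:
    """Hypothetical check: for every parent with policy "isolate",
    no two of its direct child groups may land on the same provider."""
    for parent, policy in policies.items():
        if policy != "isolate":
            continue
        rps = [mapping[c] for c in children_of(parent, mapping)]
        if len(rps) != len(set(rps)):
            return False
    return True


candidate = {
    "1": "numa0", "1.1": "numa0_pf0", "1.2": "numa0_pf1",
    "2": "numa1", "2.1": "numa1_pf0", "2.2": "numa1_pf1",
}
policies = {"": "isolate", "1": "isolate", "2": "isolate"}
print(satisfies_isolation(candidate, policies))                          # True
print(satisfies_isolation({**candidate, "1.2": "numa0_pf0"}, policies))  # False
```

It deliberately checks only the isolation rule; verifying that the `1.x` providers sit beneath the provider chosen for `1` would also need the provider tree.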

> <br>

> For the Cyborg case, I think we can propose the following flavor extra spec:<br>

> accel:device_profile.[numa node id]=<profile_name><br>

I think this could work short term, but honestly I think we should not do this.<br>

In the long term we would want to allow the device_profile to be passed on the<br>

nova boot command line and to manage quota/billing of devices outside of flavors.<br></blockquote><div><br></div><div>We could also allow specifying the guest NUMA node id in the boot command. But my point is that the problem is</div><div>we are missing a way to specify that information for Neutron and Cyborg. The other proposals in the spec don't resolve this problem.</div><div>And I don't think this problem is the fault of the request group numbering.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

I think we will also want to provide a policy attribute for virtual-to-host NUMA affinity<br>

for devices.<br>

<br>

The other aspect is that we currently do not create a PCI root complex per NUMA node;<br>

until we do that, we can't support requesting a Cyborg device per NUMA node.<br>

The NUMA node id in accel:device_profile.[numa node id]=<profile_name><br>

should be the guest NUMA node, not a host NUMA node.<br></blockquote><div><br></div><div>Yes, the "[numa node id]" in "accel:device_profile.[numa node id]" is the guest NUMA node id. Just as with other extra specs such as "hw:numa_cpus.0=1,2", we use the guest NUMA node id in those extra specs.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

Personally, I would prefer to create the PCI root complex per NUMA node first<br>

and automatically assign the device to the correct root complex before allowing<br>

end users to request a Cyborg device to be attached to a specific guest NUMA node,<br>

as I think accel:device_profile.[numa node id]=<profile_name> might be too constraining,<br>

while also leaking too much host-specific information via our API if it is used to select<br>

placement resource providers and therefore host NUMA nodes.<br>

<br>

> <br>

> Then we will know which instance NUMA node the user wants the Cyborg device<br>

> attached to.<br>

> <br>

> Cyborg only needs to return an un-numbered request group; Nova will then use<br>

> all the 'hw:xxx' extra specs and 'accel:device_profile.[numa node<br>

> id]' to generate a placement request like the one above.<br>

> <br>

> For example, if it is a PCI device under the first NUMA node, the extra spec will<br>

> be 'accel:device_profile.0=<profile_name>'. Cyborg can return a simple<br>

> request 'resources=CYBORG_PCI_XX_DEVICE:1', and we merge this into the<br>

> request group 'resources1=VCPU:2,MEMORY_MB:128,CYBORG_PCI_XX_DEVICE:1'. If<br>

> the PCI device has a special trait, Cyborg should return the request group<br>

> as 'resources=CYBORG_PCI_XX_DEVICE:1&required=SOME_TRAIT', and Nova merges<br>

> this into the placement request as 'resources1.1'.<br>
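A rough Python sketch of the merge step described above, assuming guest NUMA node N maps to request group N+1 (as in the example where node 0 becomes `resources1`) and that Cyborg's un-numbered group arrives as a plain dict. `guest_numa_node` and `merge_device_profile` are hypothetical names, not existing Nova or Cyborg functions, and the `1.1` sub-group relies on the nested syntax proposed in this thread.

```python
import copy
from typing import Dict

PREFIX = "accel:device_profile."


def guest_numa_node(extra_spec_key: str) -> int:
    """'accel:device_profile.0' -> 0 (a guest, not host, NUMA node id)."""
    if not extra_spec_key.startswith(PREFIX):
        raise ValueError(f"not a device profile key: {extra_spec_key}")
    return int(extra_spec_key[len(PREFIX):])


def merge_device_profile(groups: Dict[str, dict], numa_node: int,
                         cyborg_group: dict) -> Dict[str, dict]:
    """Hypothetical merge of Cyborg's un-numbered group into the flavor groups."""
    merged = copy.deepcopy(groups)
    parent = str(numa_node + 1)  # assumption: guest node N -> group N+1
    if not cyborg_group.get("required"):
        # No traits: fold the device resources into the NUMA node's own group.
        res = merged[parent]["resources"]
        for rc, amount in cyborg_group["resources"].items():
            res[rc] = res.get(rc, 0) + amount
    else:
        # Traits: add a new sub-group (e.g. "1.1") so the trait constrains
        # only the device provider, not the NUMA node provider.
        k = 1
        while f"{parent}.{k}" in merged:
            k += 1
        merged[f"{parent}.{k}"] = copy.deepcopy(cyborg_group)
    return merged


flavor_groups = {
    "": {"resources": {"DISK_GB": 10}},
    "1": {"resources": {"VCPU": 2, "MEMORY_MB": 128}},
}
node = guest_numa_node("accel:device_profile.0")
plain = merge_device_profile(
    flavor_groups, node, {"resources": {"CYBORG_PCI_XX_DEVICE": 1}})
print(plain["1"]["resources"])
# {'VCPU': 2, 'MEMORY_MB': 128, 'CYBORG_PCI_XX_DEVICE': 1}
traited = merge_device_profile(
    flavor_groups, node,
    {"resources": {"CYBORG_PCI_XX_DEVICE": 1}, "required": ["SOME_TRAIT"]})
print(traited["1.1"])
```

Splitting the traited case into its own sub-group keeps SOME_TRAIT from being required on the NUMA node provider itself, which is the reason for the `resources1.1` form.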

> <br>

> Chris Dent <<a href="mailto:cdent%2Bos@anticdent.org" target="_blank">cdent+os@anticdent.org</a>> wrote on Tue, Apr 9, 2019 at 8:42 PM:<br>

> <br>

> > <br>

> > Spec: <a href="https://review.openstack.org/650476" rel="noreferrer" target="_blank">https://review.openstack.org/650476</a><br>

> > <br>

> > From the commit message:<br>

> > <br>

> >      To support NUMA and similar concepts, this proposes the ability<br>

> >      to request resources from different providers nested under a<br>

> >      common subtree (below the root provider).<br>

> > <br>

> > There's much in the feature described by the spec and the surrounding<br>

> > context that is frequently a source of contention in the placement<br>

> > group, so working through this spec is probably going to require<br>

> > some robust discussion. Doing most of that before the PTG will help<br>

> > make sure we're not going in circles in person.<br>

> > <br>

> > Some of the areas of potential contention:<br>

> > <br>

> > * Adequate for limited but maybe not all use case solutions<br>

> > * Strict trait constructionism<br>

> > * Evolving the complexity of placement solely for the satisfaction<br>

> >    of hardware representation in Nova<br>

> > * Inventory-less resource providers<br>

> > * Developing new features in placement before existing features are<br>

> >    fully used in client services<br>

> > * Others?<br>

> > <br>

> > I list these not because they are deal breakers or the only things<br>

> > that matter, but because they have presented stumbling blocks in<br>

> > the past and we may as well work to address them (or make an<br>

> > agreement to punt them until later) otherwise there will be<br>

> > lingering dread.<br>

> > <br>

> > And, beyond all that squishy stuff, there is the necessary<br>

> > discussion over the solution described in the spec. There are<br>

> > several alternatives listed in the spec, and a few more in the<br>

> > comments. We'd like to figure out the best solution that can<br>

> > actually be done in a reasonable amount of time, not the best<br>

> > solution in the absolute.<br>

> > <br>

> > Discuss!<br>

> > <br>

> > --<br>

> > Chris Dent                       ٩◔̯◔۶           <a href="https://anticdent.org/" rel="noreferrer" target="_blank">https://anticdent.org/</a><br>

> > freenode: cdent                                         tw: @anticdent<br>

<br>

</blockquote></div></div></div>