<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Sean Mooney <<a href="mailto:smooney@redhat.com">smooney@redhat.com</a>> wrote on Mon, Jun 17, 2019 at 5:19 PM:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Mon, 2019-06-17 at 16:45 +0800, Alex Xu wrote:<br>

> I'm thinking we should have a recommended upgrade flow. If we give the<br>

> operator a lot of flexibility to combine arbitrary values of<br>

> vcpu_pin_set, dedicated_cpu_set and shared_cpu_set, then we run into the trouble<br>

> described in this email and also have to do the many checks it introduces.<br>

we modified the spec intentionally to make upgrading simple.<br>

i don't believe the concerns raised in the initial 2 emails are valid <br>

if we follow what was detailed in the spec. </blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

we did take some steps to restrict what values you can set.<br>

for example dedicated_cpu_set cannot be set if vcpu_pin_set is set.<br>

technically i believe we relaxed that to say we would ignore vcpu_pin_set in that case<br>

but originally i was pushing for it to be a hard error.<br>

<br>

> <br>

> I'm thinking that the pre-request filter (which translates the<br>

> cpu_policy=dedicated to PCPU request) should be enabled after all the node<br>

> upgrades to the Train release. Before that, all the cpu_policy=dedicated<br>

> instance still using the VCPU.<br>

it should be enabled after all nodes are upgraded but not necessarily before<br>

all compute nodes are updated to use dedicated_cpu_set.<br></blockquote><div><br></div><div>If we enable the pre-request filter in the middle of the upgrade, we hit the problem Bhagyashri described. As I understand it, reporting PCPU and VCPU at the same time doesn't resolve that concern.</div><div><br></div><div>For example, say we have 100 nodes used as dedicated hosts in the cluster.</div><div><br></div><div>The operator begins to upgrade the cluster. The control plane is upgraded first, and the pre-request filter is enabled.</div><div>For a rolling upgrade, the operator upgrades 10 nodes first. Then only those 10 nodes report PCPU and VCPU at the same time.</div><div>But every new request with the dedicated CPU policy now requests PCPU, so all of those new instances can only go to those 10 nodes. Also,</div><div>if existing instances are resized, evacuated, or shelved/unshelved, they go to those 10 nodes as well. That puts worrying pressure on capacity during that window.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
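<div>The capacity concern in the 100-node example can be put in rough numbers. This is only a sketch: the figure of 4 dedicated CPUs per node is assumed for illustration, the <code>pcpu_capacity</code> helper is hypothetical, and it assumes that nodes not yet upgraded report no PCPU at all.</div>
<pre>
```python
CPUS_PER_NODE = 4  # assumed for illustration, not a figure from the thread


def pcpu_capacity(total_nodes, upgraded_nodes, cpus_per_node=CPUS_PER_NODE):
    """Return (PCPU units available, fraction of the fleet's CPU capacity).

    Only upgraded nodes are assumed to report PCPU inventory.
    """
    available = upgraded_nodes * cpus_per_node
    fleet = total_nodes * cpus_per_node
    return available, available / fleet


available, fraction = pcpu_capacity(total_nodes=100, upgraded_nodes=10)
print(available, fraction)
```
</pre>
<div>Under these assumptions every PCPU request competes for one tenth of the capacity that dedicated instances could use before the upgrade started.</div>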

<br>

<br>

> <br>

> Trying to imagine the upgrade as below:<br>

> <br>

> 1. Rolling upgrade the compute node.<br>

> 2. The upgraded compute node begins to report both VCPU and PCPU, but<br>

> reshape for the existed inventories.<br>

>      The upgraded node is still using the vcpu_pin_set config, or didn't<br>

> set vcpu_pin_set at all. In both of these cases it reports VCPU and PCPU<br>

> at the same time. And a request with cpu_policy=dedicated still uses VCPU.<br>

> Then it works the same as in the Stein release. And existing instances can be<br>

> shelved/unshelved, migrated and evacuated.<br>

+1<br>

> 3. Disable new requests and instance operations on the hosts for<br>

> dedicated instances. (Does this kind of break our live-upgrade? I thought<br>

> this would be a short interruption for the control plane, if that is acceptable)<br>

im not sure why we need to do this unless you are thinking this will be<br>

done by a cli? e.g. like nova-manage.<br></blockquote><div><br></div><div>Existing instances still consume VCPU. Since PCPU and VCPU are reported at the same time,</div><div>the same resources are effectively counted twice. If we begin to consume PCPU as well, we will end up over-consuming the resource.</div><div><br></div><div>Yes, disabling requests is done by CLI, probably by disabling the service.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

> 4. Reshape the inventories for existing instances on all the hosts.<br>

should this not happen when the agent starts up?<br>

> 5. Re-enable new requests and instance operations, and also enable the<br>

> pre-request filter.<br>

> 6. Operator copies the value of vcpu_pin_set to dedicated_cpu_set.<br>

vcpu_pin_set is not the set of cpu used for pinning.<br>

the operators should set dedicated_cpu_set and shared_cpu_set appropriately<br>

at this point but in general they probably won't just copy it, as hosts that<br>

used vcpu_pin_set but were not used for pinned instances will be copied to<br>

shared_cpu_set.<br></blockquote><div><br></div><div>Yes, I should have said this upgrade flow is for hosts running dedicated instances. Hosts running only floating instances don't have these problems.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

>  For the<br>

> case where vcpu_pin_set isn't set, the value of dedicated_cpu_set should be<br>

> all the cpu ids excluding shared_cpu_set, if set.<br>

> <br>

> Two rules here:<br>

> 1. The operator isn't allowed to set dedicated_cpu_set to a value different<br>

> from vcpu_pin_set while any instance is running on the<br>

> host.<br>

> 2. The operator isn't allowed to change the value of dedicated_cpu_set and<br>

> shared_cpu_set while any instance is running on the host.<br>

neither of these rules can be enforced. one of the requirements that dan smith had<br>

for edge computing is that we need to support upgrades with instances in place.<br>

> <br>

> <br>

> <br>

> Shewale, Bhagyashri <<a href="mailto:Bhagyashri.Shewale@nttdata.com" target="_blank">Bhagyashri.Shewale@nttdata.com</a>> wrote on Fri, Jun 14, 2019 at 4:42 PM:<br>

> <br>

> > > > that is incorrect both a and b will be returned. the spec states that<br>

> > <br>

> > for host A we report an inventory of 4 VCPUs and<br>

> > <br>

> > > > an inventory of 4 PCPUs and host B will have 1 inventory of 4 PCPUs so<br>

> > <br>

> > both hosts will be returned assuming<br>

> > <br>

> > > > $<no. of cpus> <=4<br>
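<div>A minimal sketch of the dual-inventory rule quoted above, assuming a simplified model in which a host is a candidate whenever every requested resource class fits within its reported total. The host names, the <code>candidates</code> helper and the dict layout are hypothetical and are not the placement API; existing allocations are ignored.</div>
<pre>
```python
# Hypothetical, simplified model: inventories are plain dicts of
# resource class to total units; existing allocations are ignored.
INVENTORIES = {
    "host-a": {"VCPU": 4, "PCPU": 4},  # Train, only vcpu_pin_set=0-3 set
    "host-b": {"PCPU": 4},             # Train, cpu_dedicated_set=0-3 set
}


def candidates(inventories, request):
    """Return the sorted host names whose totals cover the request."""
    return sorted(
        host
        for host, inventory in inventories.items()
        if all(inventory.get(rc, 0) >= amount for rc, amount in request.items())
    )


print(candidates(INVENTORIES, {"PCPU": 4}))  # both hosts qualify
print(candidates(INVENTORIES, {"VCPU": 1}))  # only host-a reports VCPU
```
</pre>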

> > <br>

> > <br>

> > This means that if ``vcpu_pin_set`` is set in the previous release then both VCPU<br>

> > and PCPU are reported as inventory (in Train), but this seems contradictory. For example:<br>

> > <br>

> > <br>

> > On Stein,<br>

> > <br>

> > <br>

> > Configuration on compute node A:<br>

> > <br>

> > vcpu_pin_set=0-3 (This will report 4 VCPUs inventory in placement database)<br>

> > <br>

> > <br>

> > On Train:<br>

> > <br>

> > vcpu_pin_set=0-3<br>

> > <br>

> > <br>

> > The inventory will be reported as 4 VCPUs and 4 PCPUs in the placement db<br>

> > <br>

> > <br>

> > Now say user wants to create instances as below:<br>

> > <br>

> >    1. Flavor having extra specs (resources:PCPU=1), instance A<br>

> >    2. Flavor having extra specs (resources:VCPU=1), instance B<br>

> > <br>

> > <br>

> > For both instance requests, placement will return compute Node A.<br>

> > <br>

> > Instance A:  will be pinned to, say, CPU 0<br>

> > <br>

> > Instance B:  will float on 0-3<br>

> > <br>

> > <br>

> > To resolve the above issue, I think it’s possible to detect whether the<br>

> > compute node was configured to be used for pinned instances by checking whether the<br>

> > ``NumaTopology`` ``pinned_cpus`` attribute is not empty. In that case,<br>

> > vcpu_pin_set will be reported as PCPU, otherwise as VCPU.<br>
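<div>The detection idea proposed here could look roughly like the sketch below. The <code>Cell</code> class is a hypothetical stand-in for one NUMA cell of a host's ``NumaTopology``, and <code>resource_class_for</code> and <code>inventory_for</code> are illustrative names, not Nova code.</div>
<pre>
```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """Hypothetical stand-in for one NUMA cell of a host's NumaTopology."""
    cpuset: set
    pinned_cpus: set = field(default_factory=set)


def resource_class_for(cells):
    """Report PCPU if any cell already has pinned CPUs, otherwise VCPU."""
    return "PCPU" if any(cell.pinned_cpus for cell in cells) else "VCPU"


def inventory_for(cells):
    """Build a single-class inventory from the CPUs in all cells."""
    total = sum(len(cell.cpuset) for cell in cells)
    return {resource_class_for(cells): total}


pinned_host = [Cell(cpuset={0, 1, 2, 3}, pinned_cpus={0})]
floating_host = [Cell(cpuset={0, 1, 2, 3})]
print(inventory_for(pinned_host))
print(inventory_for(floating_host))
```
</pre>
<div>One consequence of this heuristic, as sketched, is that a host intended for pinning but currently empty has no pinned CPUs and would therefore be reported as VCPU.</div>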

> > <br>

> > <br>

> > Regards,<br>

> > <br>

> > -Bhagyashri Shewale-<br>

> > <br>

> > ------------------------------<br>

> > *From:* Sean Mooney <<a href="mailto:smooney@redhat.com" target="_blank">smooney@redhat.com</a>><br>

> > *Sent:* Thursday, June 13, 2019 8:32:02 PM<br>

> > *To:* Shewale, Bhagyashri; <a href="mailto:openstack-discuss@lists.openstack.org" target="_blank">openstack-discuss@lists.openstack.org</a>;<br>

> > openstack@fried.cc; <a href="mailto:sfinucan@redhat.com" target="_blank">sfinucan@redhat.com</a>; <a href="mailto:jaypipes@gmail.com" target="_blank">jaypipes@gmail.com</a><br>

> > *Subject:* Re: [nova] Spec: Standardize CPU resource tracking<br>

> > <br>

> > On Wed, 2019-06-12 at 09:10 +0000, Shewale, Bhagyashri wrote:<br>

> > > Hi All,<br>

> > > <br>

> > > <br>

> > > Currently I am working on implementation of cpu pinning upgrade part as<br>

> > <br>

> > mentioned in the spec [1].<br>

> > > <br>

> > > <br>

> > > While implementing the scheduler pre-filter as mentioned in [1], I have<br>

> > <br>

> > encountered one big issue:<br>

> > > <br>

> > > <br>

> > > Proposed change in spec: In scheduler pre-filter we are going to alias<br>

> > <br>

> > request_spec.flavor.extra_spec and<br>

> > > request_spec.image.properties from ``hw:cpu_policy`` to<br>

> > <br>

> > ``resources=(V|P)CPU:${flavor.vcpus}`` of existing instances.<br>

> > > <br>

> > > <br>

> > > So when user will create a new instance  or execute instance actions<br>

> > <br>

> > like shelve, unshelve, resize, evacuate and<br>

> > > migration  post upgrade it will go through scheduler pre-filter which<br>

> > <br>

> > will set alias for `hw:cpu_policy` in<br>

> > > request_spec flavor ``extra specs`` and image metadata properties. In<br>

> > <br>

> > below particular case, it won’t work:-<br>

> > > <br>

> > > <br>

> > > For example:<br>

> > > <br>

> > > <br>

> > > I have two compute nodes say A and B:<br>

> > > <br>

> > > <br>

> > > On Stein:<br>

> > > <br>

> > > <br>

> > > Compute node A configurations:<br>

> > > <br>

> > > vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate<br>

> > <br>

> > which has “pinned” metadata)<br>

> > vcpu_pin_set does not mean that the host was used for pinned instances<br>

> > <a href="https://that.guru/blog/cpu-resources/" rel="noreferrer" target="_blank">https://that.guru/blog/cpu-resources/</a><br>

> > > <br>

> > > <br>

> > > Compute node B Configuration:<br>

> > > <br>

> > > vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate<br>

> > <br>

> > which has “pinned” metadata)<br>

> > > <br>

> > > <br>

> > > On Train, two possible scenarios:<br>

> > > <br>

> > > Compute node A configurations: (Consider the new cpu pinning<br>

> > <br>

> > implementation is merged into Train)<br>

> > > <br>

> > > vcpu_pin_set=0-3  (Keep same settings as in Stein)<br>

> > > <br>

> > > <br>

> > > Compute node B Configuration: (Consider the new cpu pinning<br>

> > <br>

> > implementation is merged into Train)<br>

> > > <br>

> > > cpu_dedicated_set=0-3 (change to the new config option)<br>

> > > <br>

> > >   1.  Consider that one instance say `test ` is created using flavor<br>

> > <br>

> > having old extra specs (hw:cpu_policy=dedicated,<br>

> > > "aggregate_instance_extra_specs:pinned": "true") in Stein release and<br>

> > <br>

> > now upgraded Nova to Train with the above<br>

> > > configuration.<br>

> > >   2.  Now when user will perform  instance action say shelve/unshelve<br>

> > <br>

> > scheduler pre-filter will change the<br>

> > > request_spec flavor extra spec from ``hw:cpu_policy`` to<br>

> > <br>

> > ``resources=PCPU:$<no. of cpus>``<br>

> > it won't remove hw:cpu_policy it will just change the resources=VCPU:$<no.<br>

> > of cpus> ->   resources=PCPU:$<no. of cpus><br>

> > <br>

> > >  which ultimately will return only compute node B from placement service.<br>

> > <br>

> > that is incorrect both a and b will be returned. the spec states that for<br>

> > host A we report an inventory of 4 VCPUs and<br>

> > an inventory of 4 PCPUs and host B will have 1 inventory of 4 PCPUs so<br>

> > both hosts will be returned assuming<br>

> > $<no. of cpus> <=4<br>

> > <br>

> > >  Here, we expect it should have returned both Compute A and Compute B.<br>

> > <br>

> > it will<br>

> > >   3.  If user creates a new instance using old extra specs<br>

> > <br>

> > (hw:cpu_policy=dedicated,<br>

> > > "aggregate_instance_extra_specs:pinned": "true") on Train release  with<br>

> > <br>

> > the above configuration then it will return<br>

> > > only compute node B from placement service where as it should have<br>

> > <br>

> > returned both compute Node A and B.<br>

> > that is what would have happened in the stein version of the spec and we<br>

> > changed the spec specifically to ensure that<br>

> > that won't happen. in the train version of the spec you will get both hosts<br>

> > as candidates to prevent this upgrade impact.<br>
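<div>A sketch of the pre-filter behaviour described in the replies above: ``hw:cpu_policy`` stays in the extra specs, and only the CPU resource request switches from VCPU to PCPU. The <code>translate_cpu_policy</code> helper and its return shape are hypothetical and are not Nova's actual pre-filter code.</div>
<pre>
```python
def translate_cpu_policy(extra_specs, vcpus):
    """Return (extra_specs, resources) for a request.

    Hypothetical helper: the extra specs are returned unchanged, and
    the CPU request is expressed as PCPU for hw:cpu_policy=dedicated
    and as VCPU otherwise.
    """
    dedicated = extra_specs.get("hw:cpu_policy") == "dedicated"
    resources = {"PCPU" if dedicated else "VCPU": vcpus}
    return dict(extra_specs), resources


specs, resources = translate_cpu_policy({"hw:cpu_policy": "dedicated"}, 2)
print(specs, resources)
```
</pre>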

> > > <br>

> > > Problem: As Compute node A is still configured to be used to boot<br>

> > <br>

> > instances with dedicated CPUs same behavior as<br>

> > > Stein, it will not be returned by placement service due to the changes<br>

> > <br>

> > in the scheduler pre-filter logic.<br>

> > > <br>

> > > <br>

> > > Propose changes:<br>

> > > <br>

> > > <br>

> > > Earlier in the spec [2]: The online data migration was proposed to<br>

> > <br>

> > change flavor extra specs and image metadata<br>

> > > properties of request_spec and instance object. Based on the instance<br>

> > <br>

> > host, we can get the NumaTopology of the host<br>

> > > which will contain the new configuration options set on the compute<br>

> > <br>

> > host. Based on the NumaTopology of host, we can<br>

> > > change instance and request_spec flavor extra specs.<br>

> > > <br>

> > >   1.  Remove cpu_policy from extra specs<br>

> > >   2.  Add “resources:PCPU=<count>” in extra specs<br>

> > > <br>

> > > <br>

> > > We can also change the flavor extra specs and image metadata properties<br>

> > <br>

> > of instance and request_spec object using the<br>

> > > reshape functionality.<br>

> > > <br>

> > > <br>

> > > Please give us your feedback on the proposed solution so that we can<br>

> > <br>

> > update specs accordingly.<br>

> > i am fairly strongly opposed to using an online data migration to modify<br>

> > the request spec to reflect the host they<br>

> > landed on. this specific problem is why the spec was changed in the train<br>

> > cycle to report dual inventories of VCPU and<br>

> > PCPU if vcpu_pin_set is the only option set or if no options are set.<br>

> > > <br>

> > > <br>

> > > [1]:<br>

> > <br>

> > <a href="https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst@451" rel="noreferrer" target="_blank">https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst@451</a><br>

> > > <br>

> > > [2]:<br>

> > <br>

> > <a href="https://review.opendev.org/#/c/555081/23..28/specs/train/approved/cpu-resources.rst" rel="noreferrer" target="_blank">https://review.opendev.org/#/c/555081/23..28/specs/train/approved/cpu-resources.rst</a><br>

> > > <br>

> > > <br>

> > > Thanks and Regards,<br>

> > > <br>

> > > -Bhagyashri Shewale-<br>

> > > <br>

> > > Disclaimer: This email and any attachments are sent in strictest<br>

> > <br>

> > confidence for the sole use of the addressee and may<br>

> > > contain legally privileged, confidential, and proprietary data. If you<br>

> > <br>

> > are not the intended recipient, please advise<br>

> > > the sender by replying promptly to this email and then delete and<br>

> > <br>

> > destroy this email and any attachments without any<br>

> > > further use, copying or forwarding.<br>

> > <br>

> > <br>

<br>

</blockquote></div></div>