[nova] Spec: Standardize CPU resource tracking

Alex Xu soulxu at gmail.com
Mon Jun 17 09:47:05 UTC 2019


Sean Mooney <smooney at redhat.com> wrote on Mon, Jun 17, 2019 at 5:19 PM:

> On Mon, 2019-06-17 at 16:45 +0800, Alex Xu wrote:
> > I'm thinking we should have a recommended upgrade flow. If we give the
> > operator a lot of flexibility to combine the values of vcpu_pin_set,
> > dedicated_cpu_set and shared_cpu_set, then we have the trouble described
> > in this email and have to do a lot of the checks this email introduced.
we modified the spec intentionally to make upgrading simple.
I don't believe the concerns raised in the initial two emails are valid
if we follow what was detailed in the spec.


> We did take some steps to restrict what values you can set.
> For example, dedicated_cpu_set cannot be set if vcpu_pin_set is set.
> Technically I believe we relaxed that to say we would ignore vcpu_pin_set
> in that case, but originally I was pushing for it to be a hard error.
>
> >
> > I'm thinking that the pre-request filter (which translates
> > cpu_policy=dedicated to a PCPU request) should be enabled after all the
> > nodes upgrade to the Train release. Before that, all the
> > cpu_policy=dedicated instances still use VCPU.
> It should be enabled after all nodes are upgraded, but not necessarily
> before all compute nodes are updated to use dedicated_cpu_set.
>

If we enable the pre-request filter in the middle of the upgrade, we will
have the problem Bhagyashri described. Reporting PCPU and VCPU at the same
time doesn't resolve that concern, as I understand it.

For example, say we have 100 dedicated-instance hosts in the cluster.

The operator begins to upgrade the cluster. The control plane is upgraded
first, and the pre-request filter is enabled.
For a rolling upgrade, the operator upgrades 10 compute nodes first. Then
only those 10 nodes report PCPU and VCPU at the same time.
But any new request with the dedicated CPU policy now requests PCPU, so all
of those new instances can only go to those 10 nodes. Likewise, if existing
instances are resized, evacuated, or shelved/unshelved, they also land on
those 10 nodes. That makes the capacity situation quite tight at that point.
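
To make the concern concrete, here is a rough sketch of my understanding of
what the pre-request filter translation means for placement (the flavor name
and the numbers below are made up):

    # a flavor used for pinned instances, unchanged from Stein
    $ openstack flavor show pinned.medium -c vcpus -c properties
    vcpus      | 4
    properties | hw:cpu_policy='dedicated'

    # once the pre-request filter is enabled, the scheduler effectively asks
    # placement for PCPU instead of VCPU for this flavor, roughly:
    GET /allocation_candidates?resources=PCPU:4,MEMORY_MB:...,DISK_GB:...

Only the 10 upgraded nodes expose a PCPU inventory, so only they can satisfy
that request; the other 90 dedicated hosts are filtered out even though they
still have free pinned CPUs.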



>
>
> >
> > Trying to imagine the upgrade as below:
> >
> > 1. Rolling upgrade of the compute nodes.
> > 2. The upgraded compute node begins to report both VCPU and PCPU, but
> > without reshaping the existing inventories yet.
> >      The upgraded node is still using the vcpu_pin_set config, or didn't
> > set vcpu_pin_set at all. In both of these cases it reports VCPU and PCPU
> > at the same time, and a request with cpu_policy=dedicated still uses VCPU.
> > Then it works the same as the Stein release, and existing instances can be
> > shelved/unshelved, migrated and evacuated.
> +1
> > 3. Disable new requests and instance operations for the dedicated-instance
> > hosts. (Is this kind of breaking our live upgrade? I thought this would be
> > a short interruption of the control plane, if that is acceptable.)
> I'm not sure why we need to do this, unless you are thinking this will be
> done by a CLI? e.g. like nova-manage.
>

The inventories of existing instances still consume VCPU. Since PCPU and
VCPU are reported at the same time, they are kind of duplicated resources.
If we begin to consume PCPU as well, in the end it will over-consume the
host's resources.

Yes, disabling requests is done by CLI, probably by disabling the service.
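
For example, something along these lines (the hostname is made up, and
whether a dedicated nova-manage command is needed instead is still open):

    # temporarily keep new instances and moves off the host while its
    # inventories are being reshaped
    $ openstack compute service set --disable \
        --disable-reason "PCPU reshape in progress" compute-01 nova-compute

    # re-enable the host once the reshape is done
    $ openstack compute service set --enable compute-01 nova-compute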


> > 4. Reshape the inventories of the existing instances on all the hosts.
> Should this not happen when the agent starts up?
> > 5. Re-enable new requests and instance operations, and also enable the
> > pre-request filter.
> > 6. The operator copies the value of vcpu_pin_set to dedicated_cpu_set.
> vcpu_pin_set is not the set of CPUs used for pinning.
> The operators should set dedicated_cpu_set and shared_cpu_set appropriately
> at this point, but in general they probably won't just copy it, since on
> hosts that used vcpu_pin_set but were not used for pinned instances the
> value will be copied to shared_cpu_set instead.
>

Yes, I should say this upgrade flow is for the dedicated-instance hosts.
Hosts that only run floating instances don't have these problems.
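
For a dedicated-instance host I would expect something like the following
nova.conf change in step 6 (the CPU ranges are just an example, and I'm
using the option names as we've been writing them in this thread; the final
option names and config section may differ in the spec):

    # Stein, dedicated-instance host
    [DEFAULT]
    vcpu_pin_set = 0-3

    # Train, after step 6 on the same dedicated-instance host
    [compute]
    dedicated_cpu_set = 0-3

    # Train, a host that was only ever used for floating instances
    [compute]
    shared_cpu_set = 0-3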


> >  For the
> > case where vcpu_pin_set isn't set, the value of dedicated_cpu_set should
> > be all the CPU IDs excluding shared_cpu_set, if that is set.
> >
> > Two rules here:
> > 1. The operator is not allowed to set dedicated_cpu_set to a value
> > different from vcpu_pin_set while any instance is running on the host.
> > 2. The operator is not allowed to change the value of dedicated_cpu_set
> > and shared_cpu_set while any instance is running on the host.
> Neither of these rules can be enforced. One of the requirements that Dan
> Smith had for edge computing is that we need to support upgrades with
> instances in place.
> >
> >
> >
> > Shewale, Bhagyashri <Bhagyashri.Shewale at nttdata.com> wrote on Fri,
> > Jun 14, 2019 at 4:42 PM:
> >
> > > > > That is incorrect, both A and B will be returned. The spec states
> > > > > that for host A we report an inventory of 4 VCPUs and an inventory
> > > > > of 4 PCPUs, and host B will have one inventory of 4 PCPUs, so both
> > > > > hosts will be returned assuming $<no. of cpus> <= 4
> > >
> > >
> > > Meaning: if ``vcpu_pin_set`` is set in the previous release, then both
> > > VCPU and PCPU are reported as inventory (in Train), but this seems
> > > contradictory. For example:
> > >
> > >
> > > On Stein,
> > >
> > >
> > > Configuration on compute node A:
> > >
> > > vcpu_pin_set=0-3 (this will report an inventory of 4 VCPUs in the
> > > placement database)
> > >
> > >
> > > On Train:
> > >
> > > vcpu_pin_set=0-3
> > >
> > >
> > > The inventory will be reported as 4 VCPUs and 4 PCPUs in the placement
> > > database.
> > >
> > >
> > > Now say the user wants to create instances as below:
> > >
> > >    1. Flavor having extra specs (resources:PCPU=1), instance A
> > >    2. Flavor having extra specs (resources:VCPU=1), instance B
> > >
> > >
> > > For both instance requests, placement will return compute Node A.
> > >
> > > Instance A: will be pinned to, say, CPU 0
> > >
> > > Instance B:  will float on 0-3
> > >
> > >
> > > To resolve the above issue, I think it’s possible to detect whether the
> > > compute node was configured to be used for pinned instances by checking
> > > whether the ``NumaTopology`` ``pinned_cpus`` attribute is not empty. In
> > > that case, vcpu_pin_set will be reported as PCPU, otherwise as VCPU.
> > >
> > >
> > > Regards,
> > >
> > > -Bhagyashri Shewale-
> > >
> > > ------------------------------
> > > *From:* Sean Mooney <smooney at redhat.com>
> > > *Sent:* Thursday, June 13, 2019 8:32:02 PM
> > > *To:* Shewale, Bhagyashri; openstack-discuss at lists.openstack.org;
> > > openstack at fried.cc; sfinucan at redhat.com; jaypipes at gmail.com
> > > *Subject:* Re: [nova] Spec: Standardize CPU resource tracking
> > >
> > > On Wed, 2019-06-12 at 09:10 +0000, Shewale, Bhagyashri wrote:
> > > > Hi All,
> > > >
> > > >
> > > > Currently I am working on the implementation of the CPU pinning
> > > > upgrade part as mentioned in the spec [1].
> > > >
> > > >
> > > > While implementing the scheduler pre-filter as mentioned in [1], I
> > > > have encountered one big issue:
> > > >
> > > >
> > > > Proposed change in spec: in the scheduler pre-filter we are going to
> > > > alias request_spec.flavor.extra_specs and
> > > > request_spec.image.properties from ``hw:cpu_policy`` to
> > > > ``resources=(V|P)CPU:${flavor.vcpus}`` for existing instances.
> > > >
> > > >
> > > > So when a user creates a new instance or executes instance actions
> > > > like shelve, unshelve, resize, evacuate and migration post upgrade, it
> > > > will go through the scheduler pre-filter, which will set the alias for
> > > > `hw:cpu_policy` in the request_spec flavor ``extra specs`` and image
> > > > metadata properties. In the particular case below, it won’t work:
> > > >
> > > >
> > > > For example:
> > > >
> > > >
> > > > I have two compute nodes, say A and B:
> > > >
> > > >
> > > > On Stein:
> > > >
> > > >
> > > > Compute node A configuration:
> > > >
> > > > vcpu_pin_set=0-3 (used as dedicated CPUs; this host is added to an
> > > > aggregate which has “pinned” metadata)
> > > vcpu_pin_set does not mean that the host was used for pinned instances
> > > https://that.guru/blog/cpu-resources/
> > > >
> > > >
> > > > Compute node B configuration:
> > > >
> > > > vcpu_pin_set=0-3 (used as dedicated CPUs; this host is added to an
> > > > aggregate which has “pinned” metadata)
> > > >
> > > >
> > > > On Train, two possible scenarios:
> > > >
> > > > Compute node A configuration: (consider the new CPU pinning
> > > > implementation is merged into Train)
> > > >
> > > > vcpu_pin_set=0-3 (keep the same settings as in Stein)
> > > >
> > > >
> > > > Compute node B configuration: (consider the new CPU pinning
> > > > implementation is merged into Train)
> > > >
> > > > cpu_dedicated_set=0-3 (change to the new config option)
> > > >
> > > >   1.  Consider that one instance, say `test`, is created using a
> > > >   flavor having the old extra specs (hw:cpu_policy=dedicated,
> > > >   "aggregate_instance_extra_specs:pinned": "true") in the Stein
> > > >   release, and Nova is now upgraded to Train with the above
> > > >   configuration.
> > > >   2.  Now when the user performs an instance action, say
> > > >   shelve/unshelve, the scheduler pre-filter will change the
> > > >   request_spec flavor extra spec from ``hw:cpu_policy`` to
> > > >   ``resources=PCPU:$<no. of cpus>``
> > > It won't remove hw:cpu_policy, it will just change
> > > resources=VCPU:$<no. of cpus> -> resources=PCPU:$<no. of cpus>
> > >
> > > >  which ultimately will return only compute node B from the placement
> > > >  service.
> > >
> > > That is incorrect, both A and B will be returned. The spec states that
> > > for host A we report an inventory of 4 VCPUs and an inventory of
> > > 4 PCPUs, and host B will have one inventory of 4 PCPUs, so both hosts
> > > will be returned assuming $<no. of cpus> <= 4
> > >
> > > >  Here, we expect it should have returned both compute node A and compute node B.
> > >
> > > it will
> > > >   3.  If a user creates a new instance using the old extra specs
> > > >   (hw:cpu_policy=dedicated, "aggregate_instance_extra_specs:pinned":
> > > >   "true") on the Train release with the above configuration, then it
> > > >   will return only compute node B from the placement service, whereas
> > > >   it should have returned both compute nodes A and B.
> > > That is what would have happened in the Stein version of the spec, and
> > > we changed the spec specifically to ensure that won't happen. In the
> > > Train version of the spec you will get both hosts as candidates to
> > > prevent this upgrade impact.
> > > >
> > > > Problem: as compute node A is still configured to be used to boot
> > > > instances with dedicated CPUs, the same behavior as in Stein, it will
> > > > not be returned by the placement service due to the changes in the
> > > > scheduler pre-filter logic.
> > > >
> > > >
> > > > Proposed changes:
> > > >
> > > >
> > > > Earlier in the spec [2]: the online data migration was proposed to
> > > > change the flavor extra specs and image metadata properties of the
> > > > request_spec and instance objects. Based on the instance host, we can
> > > > get the NumaTopology of the host, which will contain the new
> > > > configuration options set on the compute host. Based on the
> > > > NumaTopology of the host, we can change the instance and request_spec
> > > > flavor extra specs.
> > > >
> > > >   1.  Remove cpu_policy from extra specs
> > > >   2.  Add “resources:PCPU=<count>” in extra specs
> > > >
> > > >
> > > > We can also change the flavor extra specs and image metadata
> > > > properties of the instance and request_spec objects using the
> > > > reshape functionality.
> > > >
> > > >
> > > > Please give us your feedback on the proposed solution so that we can
> > > > update the spec accordingly.
> > > I am fairly strongly opposed to using an online data migration to
> > > modify the request spec to reflect the host the instances landed on.
> > > This specific problem is why the spec was changed in the Train cycle to
> > > report dual inventories of VCPU and PCPU if vcpu_pin_set is the only
> > > option set or if no options are set.
> > > >
> > > >
> > > > [1]:
> > > > https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst@451
> > > >
> > > > [2]:
> > > > https://review.opendev.org/#/c/555081/23..28/specs/train/approved/cpu-resources.rst
> > > >
> > > >
> > > > Thanks and Regards,
> > > >
> > > > -Bhagyashri Shewale-
> > > >
> > >
>
>

