[nova] Spec: Standardize CPU resource tracking

Alex Xu soulxu at gmail.com
Tue Jun 18 07:57:19 UTC 2019


Stephen Finucane <sfinucan at redhat.com> wrote on Mon, Jun 17, 2019 at 8:47 PM:

> On Mon, 2019-06-17 at 17:47 +0800, Alex Xu wrote:
> > Sean Mooney <smooney at redhat.com> wrote on Mon, Jun 17, 2019 at 5:19 PM:
> > > On Mon, 2019-06-17 at 16:45 +0800, Alex Xu wrote:
> > > > I'm thinking we should have a recommended upgrade flow. If we
> > > > give the operator a lot of flexibility to combine the values of
> > > > vcpu_pin_set, dedicated_cpu_set and shared_cpu_set, then we run
> > > > into the trouble described in this email and also have to do all
> > > > of the checks this email introduced.
> > >
> > > We modified the spec intentionally to make upgrading simple.
> > > I don't believe the concerns raised in the initial two emails are
> > > valid if we follow what was detailed in the spec.
> > > We did take some steps to restrict what values you can set.
> > > For example, dedicated_cpu_set cannot be set if vcpu_pin_set is set.
> > > Technically I believe we relaxed that to say we would ignore
> > > vcpu_pin_set in that case, because originally I was pushing for it
> > > to be a hard error.
> > >
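
For illustration, roughly what that relaxed rule amounts to (a minimal
sketch, assuming the options have already been parsed into sets of host CPU
IDs; this is illustrative logic, not nova's actual config handling):

    def effective_pinned_cpus(vcpu_pin_set, dedicated_cpu_set, strict=False):
        """Resolve which option governs the host's pinned (PCPU) CPUs.

        Relaxed rule from the spec discussion: if dedicated_cpu_set is
        defined, vcpu_pin_set is simply ignored. strict=True models the
        hard-error behaviour originally pushed for instead.
        """
        if dedicated_cpu_set is not None:
            if vcpu_pin_set is not None and strict:
                raise ValueError(
                    "dedicated_cpu_set and vcpu_pin_set cannot both be set")
            return dedicated_cpu_set
        # Legacy behaviour: fall back to vcpu_pin_set (may be None).
        return vcpu_pin_set
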
> > > > I'm thinking that the pre-request filter (which translates
> > > > cpu_policy=dedicated to a PCPU request) should only be enabled
> > > > after all the nodes are upgraded to the Train release. Before
> > > > that, all cpu_policy=dedicated instances still use VCPU.
> > >
> > > It should be enabled after all nodes are upgraded, but not
> > > necessarily before all compute nodes are updated to use
> > > dedicated_cpu_set.
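
As a sketch, roughly what the pre-request filter's translation amounts to
(simplified; the function name and request representation here are made up,
this is not nova's real scheduler code):

    def translate_cpu_policy(flavor_vcpus, extra_specs, prefilter_enabled):
        """Return the resource amounts a boot request would ask placement for."""
        dedicated = extra_specs.get("hw:cpu_policy") == "dedicated"
        if dedicated and prefilter_enabled:
            # Train behaviour once the filter is on: pinned guests consume PCPU.
            return {"PCPU": flavor_vcpus}
        # Stein behaviour (and Train with the filter still off): VCPU only.
        return {"VCPU": flavor_vcpus}

    print(translate_cpu_policy(4, {"hw:cpu_policy": "dedicated"}, True))   # {'PCPU': 4}
    print(translate_cpu_policy(4, {"hw:cpu_policy": "dedicated"}, False))  # {'VCPU': 4}
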
> >
> > If we enable the pre-request filter in the middle of the upgrade, we
> > will have the problem Bhagyashri described. Reporting PCPU and VCPU
> > at the same time doesn't resolve that concern, as I understand it.
> >
> > For example, we have 100 nodes for dedicated hosts in the cluster.
> >
> > The operator begins to upgrade the cluster. The control plane is
> > upgraded first, and the pre-request filter is enabled.
> > For a rolling upgrade, they upgrade 10 compute nodes first. Then only
> > those 10 nodes report PCPU and VCPU at the same time.
> > But any new request with the dedicated cpu policy begins to request
> > PCPU, so all of those new instances can only go to those 10 nodes.
> > Existing instances that resize, evacuate or shelve/unshelve also land
> > on those 10 nodes. That makes the capacity situation at that point
> > rather worrying.
>
> The exact same issue can happen the other way around. As an operator
> slowly starts upgrading, by setting the necessary configuration
> options, the compute nodes will reduce the VCPU inventory they report
> and start reporting PCPU inventory. Using the above example, if we
> upgraded 90 of the 100 compute nodes and didn't enable the prefilter,
> we would only be able to schedule to one of the remaining 10 nodes.
> This doesn't seem any better.
>
> At some point we're going to need to make a clean break from pinned
> instances consuming VCPU resources to them using PCPU resources. When
> that happens is up to us. I figured it was easiest to do this as soon
> as the controllers were updated because I had assumed compute nodes
> would be updated pretty soon after the controllers and therefore there
> would only be a short window where instances would start requesting
> PCPU resources but there wouldn't be any available. Maybe that doesn't
> make sense though. If not, I guess we need to make this configurable.
>
> I propose that as soon as compute nodes are upgraded then they will all
> start reporting PCPU inventory, as noted in the spec. However, the
> prefilter will initially be disabled and we will not reshape existing
> inventories. This means pinned instances will continue consuming VCPU
> resources as before but that is not an issue since this is the behavior
> we currently have. Once the operator is happy that all of the compute
> nodes have been upgraded, or at least enough that they care about, we
> will then need some way for us to switch on the prefilter and reshape
> existing instances. Perhaps this would require manual configuration
> changes, validated by an upgrade check, or perhaps we could add a
> workaround config option?
>
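>
To illustrate the "validated by an upgrade check" idea, a hypothetical
sketch (the data structure is a stand-in for inventories fetched from
placement, not a real client call):

    def safe_to_enable_pcpu_prefilter(provider_inventories):
        """provider_inventories: {provider_name: {resource_class: total}}"""
        missing = [name for name, inv in provider_inventories.items()
                   if "PCPU" not in inv]
        if missing:
            print("Not yet; no PCPU inventory on:", ", ".join(missing))
            return False
        return True

    inventories = {
        "compute-1": {"VCPU": 16, "PCPU": 16},
        "compute-2": {"VCPU": 32},            # not reconfigured yet
    }
    print(safe_to_enable_pcpu_prefilter(inventories))  # False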
> In any case, at some point we need to have a switch from "use VCPUs for
> pinned instances" to "use PCPUs for pinned instances".
>

All agreed, we are talking about the same thing. This is the upgrade flow I
wrote out below.
I didn't see the spec describe those steps clearly, or maybe I missed
something.


> Stephen
>
> > > > Trying to imagine the upgrade as below:
> > > >
> > > > 1. Rolling upgrade of the compute nodes.
> > > > 2. The upgraded compute nodes begin to report both VCPU and PCPU,
> > > > but there is no reshape of the existing inventories yet.
> > > >      The upgraded node is either still using the vcpu_pin_set
> > > > config or doesn't have vcpu_pin_set set at all. In both cases it
> > > > reports VCPU and PCPU at the same time, and requests with
> > > > cpu_policy=dedicated still use VCPU.
> > > > Then it works the same as the Stein release, and existing
> > > > instances can be shelved/unshelved, migrated and evacuated.
> > >
> > > +1
> > >
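
Roughly what step 2 amounts to, as a sketch (this assumes "report both"
means exposing the same usable CPUs under both resource classes until the
cut-over):

    def transitional_inventory(host_cpu_ids, vcpu_pin_set=None):
        """Inventory a just-upgraded, not-yet-reconfigured host would report."""
        usable = set(vcpu_pin_set) if vcpu_pin_set else set(host_cpu_ids)
        # Same CPUs exposed as both VCPU and PCPU during the transition.
        return {"VCPU": len(usable), "PCPU": len(usable)}

    print(transitional_inventory(range(8), vcpu_pin_set={2, 3, 4, 5}))
    # {'VCPU': 4, 'PCPU': 4}
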
> > > > 3. Disable new requests and instance operations targeting the
> > > > dedicated-instance hosts. (Is that kind of breaking our live
> > > > upgrade? I thought this would only be a short interruption of the
> > > > control plane, if that is acceptable.)
> > >
> > > I'm not sure why we need to do this, unless you are thinking this
> > > will be done by a CLI? e.g. like nova-manage.
> >
> > The inventories of existing instances still consume VCPU. Since PCPU
> > and VCPU are reported at the same time, the same resources are
> > effectively counted twice. If we also begin to consume PCPU, we will
> > end up over-consuming the host.
> >
> > Yes, the disable step is done via a CLI, probably by disabling the
> > service.
> >
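
A quick worked example of that double accounting, assuming a host with 4
pinnable cores:

    host_inventory = {"VCPU": 4, "PCPU": 4}   # the same 4 cores, counted twice
    existing_pinned = {"VCPU": 4}             # old-style allocation, pre-reshape
    new_pinned = {"PCPU": 4}                  # new-style request after the filter flips

    pinned_guest_cpus = existing_pinned["VCPU"] + new_pinned["PCPU"]
    print(pinned_guest_cpus)  # 8 guest CPUs pinned onto only 4 physical cores
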
> > > > 4. Reshape the inventories of the existing instances on all the
> > > > hosts.
> > >
> > > should this not happen when the agent starts up?
> > >
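
As a sketch, the reshape in step 4 amounts to moving each pinned instance's
allocation from one resource class to the other (illustrative only; the real
reshape is done through the placement API, not by editing dicts):

    def reshape_allocation(allocation, is_pinned):
        """allocation: {resource_class: amount} for one instance on one provider."""
        if is_pinned and "VCPU" in allocation:
            # Pinned guests stop consuming VCPU and consume PCPU instead,
            # which removes the double accounting noted earlier in the thread.
            allocation["PCPU"] = allocation.pop("VCPU")
        return allocation

    print(reshape_allocation({"VCPU": 4, "MEMORY_MB": 4096}, is_pinned=True))
    # {'MEMORY_MB': 4096, 'PCPU': 4}
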
> > > > 5. Re-enable new requests and instance operations, and also
> > > > enable the pre-request filter.
> > > > 6. The operator copies the value of vcpu_pin_set to
> > > > dedicated_cpu_set.
> > >
> > > vcpu_pin_set is not the set of CPUs used for pinning. The operators
> > > should set dedicated_cpu_set and shared_cpu_set appropriately at
> > > this point, but in general they probably won't just copy it: on
> > > hosts that used vcpu_pin_set but were not used for pinned instances,
> > > the value should be copied to shared_cpu_set instead.
> >
> > Yes, I should say this upgrade flow is for the dedicated-instance
> > hosts. Hosts that only run floating instances don't have these
> > problems.
> >
> > > > For the case where vcpu_pin_set isn't set, the value of
> > > > dedicated_cpu_set should be all the CPU IDs excluding
> > > > shared_cpu_set, if that is set.
> > > >
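A tiny sketch of that default, assuming "all the CPU IDs" means every
online host CPU:

    def default_dedicated_cpu_set(host_cpu_ids, shared_cpu_set=None):
        """If vcpu_pin_set was never set: everything not in shared_cpu_set."""
        return set(host_cpu_ids) - set(shared_cpu_set or ())

    print(sorted(default_dedicated_cpu_set(range(8), shared_cpu_set={0, 1})))
    # [2, 3, 4, 5, 6, 7]
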
> > > > Two rules here:
> > > > 1. The operator is not allowed to set dedicated_cpu_set to a
> > > > value different from vcpu_pin_set while any instance is running
> > > > on the host.
> > > > 2. The operator is not allowed to change the value of
> > > > dedicated_cpu_set and shared_cpu_set while any instance is running
> > > > on the host.
> > >
> > > Neither of these rules can be enforced. One of the requirements
> > > that Dan Smith had for edge computing is that we need to support
> > > upgrades with instances in place.