<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Stephen Finucane <<a href="mailto:sfinucan@redhat.com">sfinucan@redhat.com</a>> wrote on Mon, Jun 17, 2019 at 8:47 PM:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Mon, 2019-06-17 at 17:47 +0800, Alex Xu wrote:<br>

> Sean Mooney <<a href="mailto:smooney@redhat.com" target="_blank">smooney@redhat.com</a>> wrote on Mon, Jun 17, 2019 at 5:19 PM:<br>

> > On Mon, 2019-06-17 at 16:45 +0800, Alex Xu wrote:<br>

> > > I'm thinking we should have a recommended upgrade flow. If we <br>

> > > give the operator the flexibility to use many different <br>

> > > combinations of values for vcpu_pin_set, dedicated_cpu_set and <br>

> > > shared_cpu_set, then we run into the trouble described in this<br>

> > > email and also have to do all the checks it introduced.<br>

> ><br>

> > we modified the spec intentionally to make upgrading simple.<br>

> > i don't believe the concerns raised in the initial 2 emails are<br>

> > valid if we follow what was detailed in the spec. <br>

> > we did take some steps to restrict what values you can set.<br>

> > for example dedicated_cpu_set cannot be set if vcpu_pin_set is set.<br>

> > technically i believe we relaxed that to say we would ignore<br>

> > vcpu_pin_set in that case, but originally i was pushing for it to<br>

> > be a hard error.<br>

> > <br>

> > > I'm thinking that the pre-request filter (which translates<br>

> > > cpu_policy=dedicated into a PCPU request) should be enabled after<br>

> > > all the nodes are upgraded to the Train release. Before that, all <br>

> > > cpu_policy=dedicated instances still use VCPU.<br>

> ><br>

> > it should be enabled after all nodes are upgraded but not<br>

> > necessarily before all compute nodes are updated to use<br>

> > dedicated_cpu_set.<br>

> <br>

> If we enable the pre-request filter in the middle of the upgrade, we<br>

> will hit the problem Bhagyashri described. Reporting PCPU and VCPU at<br>

> the same time doesn't resolve that concern, as I understand it.<br>

> <br>

> For example, say we have 100 nodes for dedicated instances in the cluster.<br>

> <br>

> The operator begins to upgrade the cluster. The control plane is<br>

> upgraded first, and the pre-request filter is enabled.<br>

> For the rolling upgrade, the operator upgrades 10 nodes first. Then<br>

> only those 10 nodes report PCPU and VCPU at the same time.<br>

> But any new request with the dedicated cpu policy now requests PCPU,<br>

> so all of those new instances can only go to those 10 nodes. Also,<br>

> if existing instances are resized, evacuated or<br>

> shelved/unshelved, they go to those 10 nodes as well. That makes me<br>

> nervous about the capacity at that time.<br>

<br>

The exact same issue can happen the other way around. As an operator<br>

slowly starts upgrading, by setting the necessary configuration<br>

options, the compute nodes will reduce the VCPU inventory they report<br>

and start reporting PCPU inventory. Using the above example, if we<br>

upgraded 90 of the 100 compute nodes and didn't enable the prefilter,<br>

we would only be able to schedule to one of the remaining 10 nodes.<br>

This doesn't seem any better.<br>
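<br>
To make the arithmetic in the two examples concrete, here is a minimal sketch. It is not Nova code, and it assumes a deliberately simplified model: upgraded hosts offer their pinned CPUs only as PCPU, non-upgraded hosts offer them only as VCPU, and the prefilter decides which resource class a pinned request asks for. The function name and its parameters are hypothetical.<br>
<pre>
```python
def schedulable_nodes(total_nodes, upgraded_nodes, prefilter_enabled):
    """Count hosts that can accept a new pinned instance.

    Assumed model (a simplification of the thread, not Nova behaviour):
    - upgraded hosts expose their pinned CPUs as PCPU only
    - non-upgraded hosts expose them as VCPU only
    - the prefilter makes pinned requests ask for PCPU instead of VCPU
    """
    if prefilter_enabled:
        return upgraded_nodes
    return total_nodes - upgraded_nodes


# Prefilter on early in the rolling upgrade: only the 10 upgraded hosts qualify.
print(schedulable_nodes(100, 10, prefilter_enabled=True))   # 10
# Prefilter off late in the rolling upgrade: only the 10 old hosts qualify.
print(schedulable_nodes(100, 90, prefilter_enabled=False))  # 10
```
</pre>
Under that model both calls print 10: whichever group of hosts is in the minority ends up taking every new pinned instance.<br>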

<br>

At some point we're going to need to make a clean break from pinned<br>

instances consuming VCPU resources to them using PCPU resources. When<br>

that happens is up to us. I figured it was easiest to do this as soon<br>

as the controllers were updated because I had assumed compute nodes<br>

would be updated pretty soon after the controllers and therefore there<br>

would only be a short window where instances would start requesting<br>

PCPU resources but there wouldn't be any available. Maybe that doesn't<br>

make sense though. If not, I guess we need to make this configurable.<br>

<br>

I propose that as soon as compute nodes are upgraded then they will all<br>

start reporting PCPU inventory, as noted in the spec. However, the<br>

prefilter will initially be disabled and we will not reshape existing<br>

inventories. This means pinned instances will continue consuming VCPU<br>

resources as before but that is not an issue since this is the behavior<br>

we currently have. Once the operator is happy that all of the compute<br>

nodes have been upgraded, or at least enough that they care about, we<br>

will then need some way for us to switch on the prefilter and reshape<br>

existing instances. Perhaps this would require manual configuration<br>

changes, validated by an upgrade check, or perhaps we could add a<br>

workaround config option?<br>
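<br>
The switch described above could look roughly like the following sketch. Everything in it is hypothetical: the pcpu_prefilter_enabled flag, the translate_request helper and the dict-based request are stand-ins for illustration, not existing Nova options or APIs. It only shows the idea that a pinned request keeps asking for VCPU until the operator flips the switch.<br>
<pre>
```python
def translate_request(resources, cpu_policy, pcpu_prefilter_enabled):
    """Return the resources a request should ask placement for.

    Hypothetical helper: 'resources' maps resource class names to amounts,
    e.g. {"VCPU": 4, "MEMORY_MB": 2048}.
    """
    translated = dict(resources)  # do not mutate the caller's dict
    if pcpu_prefilter_enabled and cpu_policy == "dedicated":
        # Pinned instances switch from VCPU to PCPU once the flag is on.
        vcpus = translated.pop("VCPU", 0)
        if vcpus:
            translated["PCPU"] = translated.get("PCPU", 0) + vcpus
    return translated


request = {"VCPU": 4, "MEMORY_MB": 2048}
# Flag off: the pinned request is left untouched and still consumes VCPU.
print(translate_request(request, "dedicated", pcpu_prefilter_enabled=False))
# Flag on: the same pinned request now asks for PCPU.
print(translate_request(request, "dedicated", pcpu_prefilter_enabled=True))
# Shared (floating) instances keep asking for VCPU either way.
print(translate_request(request, "shared", pcpu_prefilter_enabled=True))
```
</pre>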

<br>

In any case, at some point we need to have a switch from "use VCPUs for<br>

pinned instances" to "use PCPUs for pinned instances".<br></blockquote><div><br></div><div>Agreed, we are talking about the same thing. These are the upgrade steps I wrote below. </div><div>Either the spec doesn't describe those steps clearly, or I missed something.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

Stephen<br>

<br>

> > > Trying to imagine the upgrade as below:<br>

> > > <br>

> > > 1. Rolling upgrade of the compute nodes.<br>

> > > 2. An upgraded compute node begins to report both VCPU and PCPU,<br>

> > > but does not reshape the existing inventories.<br>

> > >      The upgraded node either still uses the vcpu_pin_set config or<br>

> > > doesn't set vcpu_pin_set at all. In both of these cases it <br>

> > > reports VCPU and PCPU at the same time, and a request with <br>

> > > cpu_policy=dedicated still uses VCPU.<br>

> > > So it works the same as in the Stein release, and existing instances<br>

> > > can be shelved/unshelved, migrated and evacuated.<br>

> ><br>

> > +1<br>

> ><br>

> > > 3. Disable new requests and instance operations on the hosts <br>

> > > used for dedicated instances. (Does this kind of break our live <br>

> > > upgrade? I think this would be a short interruption for the control<br>

> > > plane, if that is available.)<br>

> ><br>

> > i'm not sure why we need to do this unless you are thinking this<br>

> > will be done by a cli? e.g. like nova-manage.<br>

> <br>

> The inventories of existing instances still consume VCPU. Since PCPU<br>

> and VCPU are reported at the same time, the same resources are in<br>

> effect counted twice. If we begin to consume PCPU as well, we will end<br>

> up over-consuming the resources.<br>

> <br>

> Yes, disabling requests is done by CLI, probably by disabling the<br>

> service.<br>

>  <br>

> > > 4. Reshape the inventories for existing instances on all the <br>

> > > hosts.<br>

> ><br>

> > should this not happen when the agent starts up?<br>

> ><br>

> > > 5. Re-enable new requests and instance operations, and also enable <br>

> > > the pre-request filter.<br>

> > > 6. The operator copies the value of vcpu_pin_set to <br>

> > > dedicated_cpu_set.<br>

> ><br>

> > vcpu_pin_set is not the set of cpus used for pinning. the operators<br>

> > should set dedicated_cpu_set and shared_cpu_set appropriately at this<br>

> > point, but in general they probably won't just copy it, as on hosts that<br>

> > used vcpu_pin_set but were not used for pinned instances it will be<br>

> > copied to shared_cpu_set.<br>

> <br>

> Yes, I should say this upgrade flow is for hosts running dedicated<br>

> instances. Hosts that only run floating instances don't have<br>

> these problems.<br>

>  <br>

> > > If vcpu_pin_set isn't set, the value of <br>

> > > dedicated_cpu_set should be all the cpu ids, excluding those in <br>

> > > shared_cpu_set if it is set.<br>

> > > <br>

> > > Two rules here:<br>

> > > 1. The operator is not allowed to set dedicated_cpu_set to a value <br>

> > > different from vcpu_pin_set while any instance is running <br>

> > > on the host.<br>

> > > 2. The operator is not allowed to change the value of <br>

> > > dedicated_cpu_set and shared_cpu_set while any instance is running<br>

> > > on the host.<br>

> > <br>

> > neither of these rules can be enforced. one of the requirements that dan smith had<br>

> > for edge computing is that we need to support upgrades with instances in place.<br>

<br>

<br>

</blockquote></div></div>