The steps I'm thinking of are:
1. Upgrade the control plane with PCPU requests disabled, so requests still ask for VCPU.
2. Rolling-upgrade the compute nodes. Compute nodes begin to report both PCPU and VCPU, but requests are still made against VCPU.
3. Enable PCPU requests, so new requests ask for PCPU. At this point some instances are using VCPU and some are using PCPU on the same node, and the combined VCPU + PCPU inventory is double the CPU resources actually available. The NUMATopology filter is responsible for preventing over-consumption of the total number of CPUs.
4. Rolling-update the compute nodes' configuration to use cpu_dedicated_set, which triggers the reshape of existing VCPU consumption to PCPU consumption. New requests have been going to PCPU since step 3, so there are no more VCPU requests at this point. Roll through the nodes to get rid of the existing VCPU consumption.
5. Done.
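To make the double counting in step 3 concrete, here is a rough sketch using plain dicts. The host size, instance names, and the helpers placement_accepts and fits_on_host are all hypothetical; this is not the real placement or NUMATopology filter logic, only an illustration of why a per-class capacity check alone is not enough while both classes are reported.

```python
# Hypothetical numbers: a host with 8 usable cores during steps 2-3.
HOST_CPUS = 8

# The upgraded compute node reports the same cores under both classes.
inventory = {"VCPU": HOST_CPUS, "PCPU": HOST_CPUS}

# Pinned instances booted before step 3 hold VCPU; newer ones hold PCPU.
allocations = {
    "old-instance-1": {"VCPU": 4},
    "old-instance-2": {"VCPU": 2},
    "new-instance-1": {"PCPU": 2},
}


def used_by_class(allocs):
    """Sum allocations per resource class."""
    used = {}
    for resources in allocs.values():
        for rc, amount in resources.items():
            used[rc] = used.get(rc, 0) + amount
    return used


def total_used(allocs):
    """Sum allocations across all resource classes."""
    return sum(used_by_class(allocs).values())


def placement_accepts(inv, allocs, request):
    """Per-class capacity check (assumed, simplified view of placement)."""
    used = used_by_class(allocs)
    return all(
        used.get(rc, 0) + amount <= inv.get(rc, 0)
        for rc, amount in request.items()
    )


def fits_on_host(allocs, request, host_cpus=HOST_CPUS):
    """Stand-in for the host-level check: cap total CPUs at the real count."""
    return total_used(allocs) + sum(request.values()) <= host_cpus


advertised = sum(inventory.values())  # twice the real core count
request = {"PCPU": 2}
print(advertised, placement_accepts(inventory, allocations, request),
      fits_on_host(allocations, request))
```

With these numbers the per-class check would accept another 2 PCPU even though all 8 real cores are already spoken for, which is the gap the host-level filter has to close.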
This had been my initial plan. The issue is that by reporting both PCPU and VCPU in (2), our compute node's resource provider will now have PCPU inventory available (though it won't be used). This is problematic since "does this resource provider have PCPU inventory" is one of the questions I need to ask to determine if I should do a reshape. If I can't rely on this heuristic, I need to start querying for allocation information (so I can ask "does this resource provider have PCPU *allocations*") every time I start a compute node. I'm guessing this is expensive, since we don't do it by default.
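A small sketch of the two questions being contrasted here, again with plain dicts rather than real placement calls. The function names and sample data are assumptions for illustration only.

```python
def has_pcpu_inventory(inventory):
    """Cheap heuristic: does the provider report any PCPU inventory?"""
    return inventory.get("PCPU", 0) > 0


def has_pcpu_allocations(allocations):
    """Costlier heuristic: does any consumer actually hold PCPU?"""
    return any(res.get("PCPU", 0) > 0 for res in allocations.values())


# Hypothetical state after step 2: both classes reported, nothing on PCPU yet.
step2_inventory = {"VCPU": 8, "PCPU": 8}
step2_allocations = {"inst-a": {"VCPU": 4}}

print(has_pcpu_inventory(step2_inventory))      # inventory says "already PCPU"
print(has_pcpu_allocations(step2_allocations))  # allocations say otherwise
```

In this state the inventory check can no longer tell an unreshaped node from a reshaped one, while the allocation check still can.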
We already do it as part of update_available_resource via _remove_deleted_instances_allocations (there we're only checking the compute node RP, but in the future we'll have to do it for the whole tree anyway). We restricted it to the reshape path in _update_to_placement because it's not free and it was possible to make the flow work in the general case without it. We can still avoid it in the general case by only doing it when startup is True. So if you can solve the problem (which I'm still wrapping my brain around) by looking at the allocations, let's do that. Because...
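Roughly, gating the allocation lookup on startup could look like the sketch below. The names needs_reshape, fetch_allocations, and pinned_instance_ids are hypothetical, and the reshape condition used here (a pinned instance still holding VCPU) is an assumption drawn from step 4 above, not the actual _update_to_placement logic.

```python
def needs_reshape(startup, fetch_allocations, pinned_instance_ids):
    """Only pay for the allocation lookup when the service is starting up."""
    if not startup:
        # Periodic runs skip the lookup entirely.
        return False
    allocations = fetch_allocations()
    # Assumed condition: some pinned instance still consumes VCPU.
    return any(
        allocations.get(instance_id, {}).get("VCPU", 0) > 0
        for instance_id in pinned_instance_ids
    )


# Hypothetical fetcher that records how often it is called.
calls = []


def fake_fetch():
    calls.append(1)
    return {"inst-a": {"VCPU": 4}, "inst-b": {"PCPU": 2}}


periodic = needs_reshape(False, fake_fetch, {"inst-a", "inst-b"})
calls_after_periodic = len(calls)
on_startup = needs_reshape(True, fake_fetch, {"inst-a", "inst-b"})
calls_after_startup = len(calls)
print(periodic, calls_after_periodic, on_startup, calls_after_startup)
```

Passing the fetcher in as a callable keeps the expensive call lazy, so the general (non-startup) path never touches allocations.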
I'm not quite sure I understand the problem. How about making the question "Is the current total of VCPU and PCPU double the actual available CPU resources?" If the answer is yes, then do a reshape.
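As a sketch, the "is the total doubled?" question could be written as below. The function name and the sample inventories are hypothetical, and host_cpus stands for whatever the node considers its actual usable CPU count.

```python
def looks_double_reported(inventory, host_cpus):
    """True when VCPU + PCPU inventory is exactly twice the real CPU count."""
    total = inventory.get("VCPU", 0) + inventory.get("PCPU", 0)
    return total == 2 * host_cpus


# Hypothetical 8-core host in three states.
both_reported = looks_double_reported({"VCPU": 8, "PCPU": 8}, 8)
legacy_only = looks_double_reported({"VCPU": 8}, 8)
split_sets = looks_double_reported({"VCPU": 2, "PCPU": 6}, 8)
print(both_reported, legacy_only, split_sets)
```

The check relies on an exact equality, so it only holds while the reported inventories and the host CPU count stay in the relationship assumed above.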
Alex's suggestion makes sense to me, but it's a bit of a hack, and the math might break down if you e.g. stop compute, twiddle your cpu_*_setZ, and restart. efried