Hi All,
After revisiting the spec [1] several times, I have noted a few points. Please check them and let me know whether my understanding is correct:
Understanding: If ``vcpu_pin_set`` is set on compute node A in the Stein release, then we can say that this node is used to host dedicated instances. If the user upgrades from Stein to Train and the operator does not define ``[compute] cpu_dedicated_set``, we simply fall back to ``vcpu_pin_set`` and report it as PCPU inventory.
On Thu, 2019-06-13 at 04:42 +0000, Shewale, Bhagyashri wrote:

That is incorrect. If ``vcpu_pin_set`` is defined, it may be used for instances with or without hw:cpu_policy=dedicated. In Train, if ``vcpu_pin_set`` is defined and ``cpu_dedicated_set`` is not defined, then we use ``vcpu_pin_set`` to define the inventory of both PCPUs and VCPUs.
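A rough sketch of that fallback, with hypothetical helper and parameter names (the real logic lives in Nova's libvirt driver and resource tracker, not in a standalone function like this):

```python
def upgrade_inventories(vcpu_pin_set, cpu_dedicated_set):
    """Sketch of the Train fallback described above: with vcpu_pin_set
    defined and cpu_dedicated_set not, the same CPUs back BOTH the VCPU
    and PCPU inventories, because config alone cannot tell us whether
    the host served pinned or floating instances in Stein."""
    if cpu_dedicated_set:
        # New-style config: these CPUs are dedicated (PCPU) only.
        return {"PCPU": len(cpu_dedicated_set)}
    if vcpu_pin_set:
        # Legacy config: report the pinned set as both resource classes.
        return {"VCPU": len(vcpu_pin_set), "PCPU": len(vcpu_pin_set)}
    # Neither option set: other rules apply, not covered by this sketch.
    return {}

# vcpu_pin_set=0-3 with no cpu_dedicated_set:
print(upgrade_inventories({0, 1, 2, 3}, None))
```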
Considering the multiple combinations of the various configuration options, I think we will need to implement the business rules below so that the scheduler pre-filter issue highlighted in my previous email can be solved.
Rule 1:
If the operator sets ``[compute] cpu_shared_set`` in Train.
1. If pinned instances are found, then we can simply say that this compute node was used as dedicated in the previous release, so raise an error telling the operator to set the ``[compute] cpu_dedicated_set`` config option; otherwise, report it as VCPU inventory.
``cpu_shared_set`` in Stein was used for VM emulator threads and required the instance to be pinned for it to take effect, i.e. the hw:emulator_thread_policy extra spec currently only works if you had hw:cpu_policy=dedicated. So we should not error if ``vcpu_pin_set`` and ``cpu_shared_set`` are both defined; that was valid. What we can do is ignore ``cpu_shared_set`` for scheduling, report 0 VCPUs for this host, and use ``vcpu_pin_set`` as PCPUs.
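The Stein-era semantics being described might be sketched as follows (illustrative function and parameter names, not Nova's actual internals):

```python
def emulator_thread_cpus(cpu_policy, emulator_thread_policy,
                         cpu_shared_set, instance_cpus):
    """Sketch of Stein behaviour: cpu_shared_set only took effect for a
    pinned instance whose flavor requested
    hw:emulator_thread_policy=share."""
    if (cpu_policy == "dedicated"
            and emulator_thread_policy == "share"
            and cpu_shared_set):
        # Emulator threads float over the shared set.
        return cpu_shared_set
    # Otherwise they run alongside the instance's own CPUs.
    return instance_cpus
```

This is why a host with both ``vcpu_pin_set`` and ``cpu_shared_set`` was a perfectly valid Stein layout.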
Rule 2:
If the operator sets ``[compute] cpu_dedicated_set`` in Train.
1. Report inventory as PCPU
Yes, if ``cpu_dedicated_set`` is set, we will report its value as PCPUs.
2. If instances are found, check the host NUMA topology's pinned_cpus. If pinned_cpus is not empty, that means this compute node was used as dedicated in the previous release; if it is empty, raise an error saying that this compute node was used as a shared compute node in the previous release.
This was not part of the spec. We could do this, but I think it's not needed and operators should check this themselves. If we decide to do this check on startup, it should only happen if ``vcpu_pin_set`` is defined. Additionally, we can log an error, but we should not prevent the compute node from working and continuing to spawn VMs.
Rule 3:
If the operator sets none of the options (``[compute] cpu_dedicated_set``, ``[compute] cpu_shared_set``, ``vcpu_pin_set``) in Train.
1. If instances are found, check the host NUMA topology's pinned_cpus. If pinned_cpus is not empty, raise an error saying that this compute node was used as a dedicated compute node in the previous release, so the operator should set ``[compute] cpu_dedicated_set``; otherwise, report inventory as VCPU.
Again, this is not in the spec and I don't think we should do this. If none of the values are set, we should report all CPUs as both VCPUs and PCPUs. The ``vcpu_pin_set`` option was never intended to signal that a host was used for CPU pinning; it was introduced for CPU pinning and NUMA affinity, but it was originally meant to apply to floating instances, and it currently controls the number of VCPUs reported to the resource tracker, which is used to set the capacity of the VCPU inventory. You should read https://that.guru/blog/cpu-resources/ for a walkthrough of this.
2. If no instances are found, report inventory as VCPU.
We could do this, but I think it will be confusing as to what will happen after we spawn an instance on the host in Train. I don't think this logic should be conditional on the presence of VMs.
Rule 4:
If the operator sets the ``vcpu_pin_set`` config option in Train.
1. If instances are found, check the host NUMA topology's pinned_cpus. If pinned_cpus is empty, that means this compute node was used for non-pinned instances in the previous release, so raise an error; otherwise, report it as PCPU inventory.
Again, this is not in the spec. What the spec says is that if ``vcpu_pin_set`` is defined, we will report inventories of both VCPUs and PCPUs for all CPUs in ``vcpu_pin_set``.
2. If no instances are found, report inventory as PCPU.
Again, this should not be conditional on the presence of VMs.
Rule 5:
If the operator sets ``vcpu_pin_set`` and either ``[compute] cpu_dedicated_set`` or ``[compute] cpu_shared_set`` in Train.
1. Simply raise an error
This is the only case where we "raise" an error and refuse to start the compute node.
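A minimal sketch of such a startup check (hypothetical names; note that per the earlier reply, ``vcpu_pin_set`` alongside ``cpu_shared_set`` was a valid Stein layout, so this sketch only hard-fails on the ``cpu_dedicated_set`` combination, which is one possible reading of the spec):

```python
def validate_cpu_config(vcpu_pin_set, cpu_dedicated_set, cpu_shared_set):
    """Sketch of a startup validation: refuse to start the compute node
    only when the legacy option conflicts with its new replacement.
    vcpu_pin_set + cpu_shared_set is tolerated, with cpu_shared_set
    simply ignored for scheduling, as discussed above."""
    if vcpu_pin_set and cpu_dedicated_set:
        raise ValueError(
            "vcpu_pin_set and [compute] cpu_dedicated_set are mutually "
            "exclusive; unset vcpu_pin_set to migrate to the new option")
```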
Business rules 3 and 4 above are very important in order to solve the scheduler pre-filter issue highlighted in my previous email.
We explicitly do not want the behavior in 3 and 4, specifically the logic of checking the instances.
As of today, whether or not ``vcpu_pin_set`` is set on the compute node, the node can be used for both pinned and non-pinned instances, depending on whether the host belongs to an aggregate with "pinned" metadata. But as per business rule #3, if ``vcpu_pin_set`` is not set, we are considering the node to be used for non-pinned instances only. Do you think this could cause an issue in providing backward compatibility?
Yes, the rules you have listed above will cause issues for upgrades, and we rejected similar rules in the spec. I have not read your previous email, which I'll look at next, but we spent a long time debating how this should work during the spec design, and I would prefer to stick to what the spec currently states.
Please provide your suggestions on the above business rules.
[1]: https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources....
Thanks and Regards,
-Bhagyashri Shewale-
________________________________
From: Shewale, Bhagyashri
Sent: Wednesday, June 12, 2019 6:10:04 PM
To: openstack-discuss@lists.openstack.org; openstack@fried.cc; smooney@redhat.com; sfinucan@redhat.com; jaypipes@gmail.com
Subject: [nova] Spec: Standardize CPU resource tracking
Hi All,
Currently I am working on the implementation of the CPU pinning upgrade part as mentioned in the spec [1].
While implementing the scheduler pre-filter as mentioned in [1], I have encountered one big issue:
Proposed change in spec: In the scheduler pre-filter, we are going to alias request_spec.flavor.extra_specs and request_spec.image.properties from ``hw:cpu_policy`` to ``resources=(V|P)CPU:${flavor.vcpus}`` for existing instances.
So when a user creates a new instance or executes instance actions like shelve, unshelve, resize, evacuate, and migration post-upgrade, the request will go through the scheduler pre-filter, which will set an alias for ``hw:cpu_policy`` in the request_spec flavor ``extra specs`` and image metadata properties. In the particular case below, it won't work:
For example:
I have two compute nodes say A and B:
On Stein:
Compute node A configurations:
vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate which has “pinned” metadata)
Compute node B Configuration:
vcpu_pin_set=0-3 (used as dedicated CPU, This host is added in aggregate which has “pinned” metadata)
On Train, two possible scenarios:
Compute node A configurations: (Consider the new cpu pinning implementation is merged into Train)
vcpu_pin_set=0-3 (Keep same settings as in Stein)
Compute node B Configuration: (Consider the new cpu pinning implementation is merged into Train)
cpu_dedicated_set=0-3 (change to the new config option)
1. Consider that one instance, say ``test``, was created using a flavor having the old extra specs (hw:cpu_policy=dedicated, "aggregate_instance_extra_specs:pinned": "true") in the Stein release, and Nova is now upgraded to Train with the above configuration.
2. Now, when the user performs an instance action, say shelve/unshelve, the scheduler pre-filter will change the request_spec flavor extra spec from ``hw:cpu_policy`` to ``resources=PCPU:<no. of cpus>``, which will ultimately return only compute node B from the placement service. Here, we expect it to return both compute node A and compute node B.
3. If a user creates a new instance using the old extra specs (hw:cpu_policy=dedicated, "aggregate_instance_extra_specs:pinned": "true") on the Train release with the above configuration, then it will return only compute node B from the placement service, whereas it should have returned both compute nodes A and B.
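The pre-filter aliasing at the heart of this scenario might be sketched like so (illustrative function name and simplified extra-spec handling, not the actual Nova pre-filter code):

```python
def translate_extra_specs(extra_specs, vcpus):
    """Sketch of the proposed pre-filter aliasing: map the legacy
    hw:cpu_policy extra spec onto an explicit placement resource
    request of either PCPU (dedicated) or VCPU (shared)."""
    specs = dict(extra_specs)
    policy = specs.pop("hw:cpu_policy", "shared")
    resource = "PCPU" if policy == "dedicated" else "VCPU"
    specs["resources:%s" % resource] = str(vcpus)
    return specs

# A dedicated flavor with 4 vCPUs now asks placement for 4 PCPUs,
# so only hosts reporting PCPU inventory (node B) can match.
print(translate_extra_specs({"hw:cpu_policy": "dedicated"}, 4))
```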
Problem: As compute node A is still configured to boot instances with dedicated CPUs (the same behavior as in Stein), it will not be returned by the placement service due to the changes in the scheduler pre-filter logic.
Proposed changes:
Earlier in the spec [2], an online data migration was proposed to change the flavor extra specs and image metadata properties of the request_spec and instance objects. Based on the instance host, we can get the NUMA topology of the host, which will contain the new configuration options set on the compute host. Based on the host NUMA topology, we can change the instance and request_spec flavor extra specs:
1. Remove cpu_policy from extra specs.
2. Add "resources:PCPU=<count>" to extra specs.
We can also change the flavor extra specs and image metadata properties of the instance and request_spec objects using the reshape functionality.
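The proposed migration steps could be sketched as follows (hypothetical function name and a boolean/set stand-in for the host NUMA topology check, purely to illustrate the two steps above):

```python
def migrate_instance_extra_specs(extra_specs, vcpus, host_pinned_cpus):
    """Sketch of the proposed online data migration: if the instance's
    host NUMA topology shows pinned CPUs, rewrite the legacy extra spec
    into an explicit PCPU resource request."""
    specs = dict(extra_specs)
    if specs.get("hw:cpu_policy") == "dedicated" and host_pinned_cpus:
        del specs["hw:cpu_policy"]            # 1. remove cpu_policy
        specs["resources:PCPU"] = str(vcpus)  # 2. add resources:PCPU=<count>
    return specs
```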
Please give us your feedback on the proposed solution so that we can update the spec accordingly.
[1]: https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources....
[2]: https://review.opendev.org/#/c/555081/23..28/specs/train/approved/cpu-resour...
Thanks and Regards,
-Bhagyashri Shewale-