[nova][ptg] pinned and unpinned CPUs in one instance
spec: https://review.opendev.org/668656

Agreements from the PTG:

How we will test it:
* do functional tests with the libvirt driver, like the pinned CPU tests we have today
* donyd's CI supports nested virt so we can do pinned CPU testing but not realtime. As this CI is still a work in progress, we should not block on it.
* coverage in https://opendev.org/x/whitebox-tempest-plugin is a nice to have

Naming: use the 'shared' and 'dedicated' terminology.

Support both the hw:pinvcpus=3 and the resources:PCPU=2 flavor extra spec syntaxes, but not in the same flavor. The resources:PCPU=2 syntax will have less expressive power until nova models NUMA in placement. So nova will try to evenly distribute PCPUs between NUMA nodes. If that is not possible, we reject the request and ask the user to use the hw:pinvcpus=3 syntax.

Realtime mask is an exclusion mask: any vCPU not listed there has to be in the dedicated set of the instance.

TODO: Investigate whether we want to enable NUMA by default.
* Pros: Simpler, everything is NUMA by default
* Cons: We'll either have to break/make configurable the 1:1 guest:host NUMA mapping, or else we won't be able to boot e.g. a 40 core shared instance on a 40 core, 2 NUMA node host.

Cheers,
gibi
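[For illustration only, the two flavor syntaxes being compared might look roughly like this. The flavor names and sizes are made up; hw:cpu_policy=mixed and hw:pinvcpus are the thread's working names rather than a finalized syntax, while the resources:PCPU / resources:VCPU overrides and the hw:cpu_realtime* extra specs are existing mechanisms.]

    # pinning-style syntax (working names from the thread, not final)
    openstack flavor create mixed.example --vcpus 4 --ram 4096 --disk 10 \
        --property hw:cpu_policy=mixed \
        --property hw:pinvcpus=3

    # the same request expressed directly as placement resources
    openstack flavor create mixed.example.alt --vcpus 4 --ram 4096 --disk 10 \
        --property resources:PCPU=3 \
        --property resources:VCPU=1

    # realtime as an exclusion mask: vCPUs 0-1 are excluded from realtime
    # (and so may be shared); the remaining vCPUs must come from the
    # dedicated set of the instance
    openstack flavor set mixed.example \
        --property hw:cpu_realtime=yes \
        --property hw:cpu_realtime_mask='^0-1'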
On Fri, 2019-11-08 at 07:09 +0000, Balázs Gibizer wrote:
spec: https://review.opendev.org/668656
Agreements from the PTG:
How we will test it:
* do functional tests with the libvirt driver, like the pinned CPU tests we have today
* donyd's CI supports nested virt so we can do pinned CPU testing but not realtime. As this CI is still a work in progress, we should not block on it.

We can do realtime testing in that CI; I already did. Also, there is a new label that is available across 3 providers, so we won't just be relying on donyd's good work.
* coverage in https://opendev.org/x/whitebox-tempest-plugin is a nice to have
Naming: use the 'shared' and 'dedicated' terminology.

Didn't we want to have a hw:cpu_policy=mixed specifically for this case?
Support both the hw:pinvcpus=3 and the resources:PCPU=2 flavor extra spec syntaxes, but not in the same flavor. The resources:PCPU=2 syntax will have less expressive power until nova models NUMA in placement. So nova will try to evenly distribute PCPUs between NUMA nodes. If that is not possible, we reject the request and ask the user to use the hw:pinvcpus=3 syntax.
Realtime mask is an exclusion mask: any vCPU not listed there has to be in the dedicated set of the instance.
TODO: Investigate whether we want to enable NUMA by default.
* Pros: Simpler, everything is NUMA by default
* Cons: We'll either have to break/make configurable the 1:1 guest:host NUMA mapping, or else we won't be able to boot e.g. a 40 core shared instance on a 40 core, 2 NUMA node host.

In the context of mixed instances, if we don't enable NUMA affinity by default we should remove that behavior from all cases where we do it today.

If this is the larger question of whether we should have all instances be NUMA by default, I have argued yes for quite a while, as I think having one code path has many advantages. That said, I'm aware of this limitation. One way to solve it was the proposed can_split placement parameter: if you did not specify a NUMA topology, we would add can_split=vCPUs and then create a single or multi NUMA node topology based on the allocations. If we combine that with an allocation weigher, we could sort the allocation candidates by the smallest number of NUMA nodes, so we would prefer landing on hosts that can fit the instance on one NUMA node. It's a big change, but long overdue.
That said, I have also argued the other point, in response to pushback on "all VMs have a NUMA topology of 1 node unless you say otherwise", i.e. that the 1:1 mapping between virtual and host NUMA nodes should be configurable and is not required by the API today. The backwards-compatible way to do that is that it is not required by default if you are using shared cores and is required if you are using pinned cores, but that is a little confusing. I don't really know what the right answer to this is, but I think it's a separate question from the topic of this thread. We don't need to solve it to enable pinned and unpinned CPUs in one instance, but we do need to address it before we can model NUMA in placement.
Cheers, gibi
-----Original Message----- From: Sean Mooney <smooney@redhat.com> Sent: Friday, November 8, 2019 8:21 PM To: Balázs Gibizer <balazs.gibizer@est.tech>; openstack-discuss <openstack-discuss@lists.openstack.org> Subject: Re: [nova][ptg] pinned and unpinned CPUs in one instance
On Fri, 2019-11-08 at 07:09 +0000, Balázs Gibizer wrote:
spec: https://review.opendev.org/668656
Agreements from the PTG:
How we will test it:
* do functional tests with the libvirt driver, like the pinned CPU tests we have today
* donyd's CI supports nested virt so we can do pinned CPU testing but not realtime. As this CI is still a work in progress, we should not block on it.

We can do realtime testing in that CI; I already did. Also, there is a new label that is available across 3 providers, so we won't just be relying on donyd's good work.
* coverage in https://opendev.org/x/whitebox-tempest-plugin is a nice to have
Naming: use the 'shared' and 'dedicated' terminology.

Didn't we want to have a hw:cpu_policy=mixed specifically for this case?
Support both the hw:pinvcpus=3 and the resources:PCPU=2 flavor extra spec syntaxes, but not in the same flavor. The resources:PCPU=2 syntax will have less expressive power until nova models NUMA in placement. So nova will try to evenly distribute PCPUs between NUMA nodes. If that is not possible, we reject the request and ask the user to use the hw:pinvcpus=3 syntax.
Realtime mask is an exclusion mask: any vCPU not listed there has to be in the dedicated set of the instance.
TODO: Investigate whether we want to enable NUMA by default.
* Pros: Simpler, everything is NUMA by default
* Cons: We'll either have to break/make configurable the 1:1 guest:host NUMA mapping, or else we won't be able to boot e.g. a 40 core shared instance on a 40 core, 2 NUMA node host.

In the context of mixed instances, if we don't enable NUMA affinity by default we should remove that behavior from all cases where we do it today.
Hi gibi or Sean,

To help me understand the issue under discussion, if I change the instance requirement a little bit to:
* an instance demanding 1 dedicated core and 39 shared cores
* instance vCPU allocation ratio is 1
* host has 2 NUMA nodes and 40 cores in total
* 39 of the 40 cores are registered as VCPU resource and 1 core is registered as PCPU

it will raise the same problem, right? Because the expectation is that the instance can be scheduled on that host. (See the host configuration sketch after this message.)
If this is the larger question of whether we should have all instances be NUMA by default, I have argued yes for quite a while, as I think having one code path has many advantages. That said, I'm aware of this limitation. One way to solve it was the proposed can_split placement parameter: if you did not specify a NUMA topology, we would add can_split=vCPUs and then create a single or multi NUMA node topology based on the allocations. If we combine that with an allocation weigher, we could sort the allocation candidates by the smallest number of NUMA nodes, so we would prefer landing on hosts that can fit the instance on one NUMA node. It's a big change, but long overdue.
I have read the 'can_split' spec; it will help if I understand the issue correctly. Then I agree with Sean that this is another issue that does not belong to spec 668656.
That said, I have also argued the other point, in response to pushback on "all VMs have a NUMA topology of 1 node unless you say otherwise", i.e. that the 1:1 mapping between virtual and host NUMA nodes should be configurable and is not required by the API today. The backwards-compatible way to do that is that it is not required by default if you are using shared cores and is required if you are using pinned cores, but that is a little confusing.
I don't really know what the right answer to this is, but I think it's a separate question from the topic of this thread. We don't need to solve it to enable pinned and unpinned CPUs in one instance, but we do need to address it before we can model NUMA in placement.
Cheers, gibi
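[A minimal host configuration sketch for the case described above, assuming the Train-style [compute] options and made-up core numbering: 39 cores are reported as VCPU inventory and 1 core as PCPU, with an allocation ratio of 1.]

    [compute]
    # cores offered as floating/shared CPUs (reported as VCPU inventory)
    cpu_shared_set = 0-38
    # cores offered for pinning (reported as PCPU inventory)
    cpu_dedicated_set = 39

    [DEFAULT]
    # with a ratio of 1.0 the host exposes exactly 39 VCPUs
    cpu_allocation_ratio = 1.0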
On Fri, 2019-11-08 at 12:20 +0000, Sean Mooney wrote:
Naming: use the 'shared' and 'dedicated' terminology.

Didn't we want to have a hw:cpu_policy=mixed specifically for this case?
It wasn't clear, but gibi was referring to how we'd distinguish the "types" of CPU and instances using those CPUs. The alternative was 'pinned' and 'unpinned'.

Stephen
-----Original Message----- From: Balázs Gibizer <balazs.gibizer@est.tech> Sent: Friday, November 8, 2019 3:10 PM To: openstack-discuss <openstack-discuss@lists.openstack.org> Subject: [nova][ptg] pinned and unpinned CPUs in one instance
spec: https://review.opendev.org/668656
Agreements from the PTG:
How we will test it:
* do functional tests with the libvirt driver, like the pinned CPU tests we have today
* donyd's CI supports nested virt so we can do pinned CPU testing but not realtime. As this CI is still a work in progress, we should not block on it.
* coverage in https://opendev.org/x/whitebox-tempest-plugin is a nice to have
Naming: use the 'shared' and 'dedicated' terminology
Support both the hw:pinvcpus=3 and the resources:PCPU=2 flavor extra spec syntaxes, but not in the same flavor. The resources:PCPU=2 syntax will have less expressive power until nova models NUMA in placement. So nova will try to evenly distribute PCPUs between NUMA nodes. If that is not possible, we reject the request and ask the user to use the hw:pinvcpus=3 syntax.
Realtime mask is an exclusion mask: any vCPU not listed there has to be in the dedicated set of the instance.
TODO: Investigate whether we want to enable NUMA by default.
* Pros: Simpler, everything is NUMA by default
* Cons: We'll either have to break/make configurable the 1:1 guest:host NUMA mapping, or else we won't be able to boot e.g. a 40 core shared instance on a 40 core, 2 NUMA node host.
The case of 'booting a 40 core shared instance on a 40 core, 2 NUMA node host' will not be covered by the new 'mixed' policy. It is just a legacy 'shared' instance with no assumption about the instance NUMA topology. By the way, if you want a 'shared' instance with 40 cores to be scheduled on a host with 40 cores and 2 NUMA nodes, you also need to register all host cores as 'shared' CPUs through 'conf.compute.cpu_shared_set' (a minimal configuration sketch follows this message).

For an instance with the 'mixed' policy, what I want to propose is that the instance should demand at least one 'dedicated' (or PCPU) core. Thus, any 'mixed' or 'dedicated' instance will not be scheduled on this host, because no PCPU is available on it. A 'mixed' instance should also demand at least one 'shared' (or VCPU) core; a 'mixed' instance demanding all of its cores from the PCPU resource should be considered invalid. An instance demanding all cores from the PCPU resource is just a legacy 'dedicated' instance, whose CPU allocation policy is 'dedicated'.

In conclusion, an instance with the 'mixed' policy:
* demands at least one 'dedicated' CPU and at least one 'shared' CPU
* has a NUMA topology by default, due to requesting pinned CPUs

In my understanding, the cons do not exist if we make the above rules.

Br,
Huaqiang
Cheers, gibi
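[A minimal sketch of the configuration referred to above, where all 40 host cores are registered as 'shared'; core numbering is assumed.]

    [compute]
    # every host core is offered as shared (VCPU inventory); no PCPU
    # inventory is reported, so 'dedicated' and 'mixed' instances
    # cannot be scheduled to this host
    cpu_shared_set = 0-39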
On Mon, 2019-11-11 at 11:58 +0000, Wang, Huaqiang wrote:
-----Original Message----- From: Balázs Gibizer <balazs.gibizer@est.tech> Sent: Friday, November 8, 2019 3:10 PM To: openstack-discuss <openstack-discuss@lists.openstack.org> Subject: [nova][ptg] pinned and unpinned CPUs in one instance
spec: https://review.opendev.org/668656
Agreements from the PTG:
How we will test it:
* do functional tests with the libvirt driver, like the pinned CPU tests we have today
* donyd's CI supports nested virt so we can do pinned CPU testing but not realtime. As this CI is still a work in progress, we should not block on it.
* coverage in https://opendev.org/x/whitebox-tempest-plugin is a nice to have
Naming: use the 'shared' and 'dedicated' terminology
Support both the hw:pinvcpus=3 and the resources:PCPU=2 flavor extra spec syntaxes, but not in the same flavor. The resources:PCPU=2 syntax will have less expressive power until nova models NUMA in placement. So nova will try to evenly distribute PCPUs between NUMA nodes. If that is not possible, we reject the request and ask the user to use the hw:pinvcpus=3 syntax.
Realtime mask is an exclusion mask: any vCPU not listed there has to be in the dedicated set of the instance.
TODO: Investigate whether we want to enable NUMA by default.
* Pros: Simpler, everything is NUMA by default
* Cons: We'll either have to break/make configurable the 1:1 guest:host NUMA mapping, or else we won't be able to boot e.g. a 40 core shared instance on a 40 core, 2 NUMA node host.
The case of 'booting a 40 core shared instance on a 40 core, 2 NUMA node host' will not be covered by the new 'mixed' policy. It is just a legacy 'shared' instance with no assumption about the instance NUMA topology.
Correct. However, this investigation refers to *all* instances, not just those using the 'mixed' policy. For the 'mixed' policy, I assume we'll need to apply a virtual NUMA topology since we currently apply one for instances using the 'dedicated' policy.
By the way, if you want a 'shared' instance with 40 cores to be scheduled on a host with 40 cores and 2 NUMA nodes, you also need to register all host cores as 'shared' CPUs through 'conf.compute.cpu_shared_set'.
For an instance with the 'mixed' policy, what I want to propose is that the instance should demand at least one 'dedicated' (or PCPU) core. Thus, any 'mixed' or 'dedicated' instance will not be scheduled on this host, because no PCPU is available on it.
A 'mixed' instance should also demand at least one 'shared' (or VCPU) core; a 'mixed' instance demanding all of its cores from the PCPU resource should be considered invalid. An instance demanding all cores from the PCPU resource is just a legacy 'dedicated' instance, whose CPU allocation policy is 'dedicated'.
In conclusion, an instance with the 'mixed' policy:
* demands at least one 'dedicated' CPU and at least one 'shared' CPU
* has a NUMA topology by default, due to requesting pinned CPUs
In my understanding, the cons do not exist if we make the above rules.
Br Huaqiang
Cheers, gibi
On Thu, 2019-11-14 at 09:08 +0000, Stephen Finucane wrote:
On Mon, 2019-11-11 at 11:58 +0000, Wang, Huaqiang wrote:
-----Original Message----- From: Balázs Gibizer <balazs.gibizer@est.tech> Sent: Friday, November 8, 2019 3:10 PM To: openstack-discuss <openstack-discuss@lists.openstack.org> Subject: [nova][ptg] pinned and unpinned CPUs in one instance
spec: https://review.opendev.org/668656
Agreements from the PTG:
How we will test it:
* do functional tests with the libvirt driver, like the pinned CPU tests we have today
* donyd's CI supports nested virt so we can do pinned CPU testing but not realtime. As this CI is still a work in progress, we should not block on it.
* coverage in https://opendev.org/x/whitebox-tempest-plugin is a nice to have
Naming: use the 'shared' and 'dedicated' terminology
Support both the hw:pinvcpus=3 and the resources:PCPU=2 flavor extra spec syntaxes, but not in the same flavor. The resources:PCPU=2 syntax will have less expressive power until nova models NUMA in placement. So nova will try to evenly distribute PCPUs between NUMA nodes. If that is not possible, we reject the request and ask the user to use the hw:pinvcpus=3 syntax.
Realtime mask is an exclusion mask: any vCPU not listed there has to be in the dedicated set of the instance.
TODO: Investigate whether we want to enable NUMA by default.
* Pros: Simpler, everything is NUMA by default
* Cons: We'll either have to break/make configurable the 1:1 guest:host NUMA mapping, or else we won't be able to boot e.g. a 40 core shared instance on a 40 core, 2 NUMA node host.
The case of 'booting a 40 core shared instance on a 40 core, 2 NUMA node host' will not be covered by the new 'mixed' policy. It is just a legacy 'shared' instance with no assumption about the instance NUMA topology.
Correct. However, this investigation refers to *all* instances, not just those using the 'mixed' policy. For the 'mixed' policy, I assume we'll need to apply a virtual NUMA topology since we currently apply one for instances using the 'dedicated' policy.

Yes, for consistency I think that would be the correct approach too (see the rough libvirt sketch after this message).
By the way, if you want a 'shared' instance with 40 cores to be scheduled on a host with 40 cores and 2 NUMA nodes, you also need to register all host cores as 'shared' CPUs through 'conf.compute.cpu_shared_set'.
For an instance with the 'mixed' policy, what I want to propose is that the instance should demand at least one 'dedicated' (or PCPU) core. Thus, any 'mixed' or 'dedicated' instance will not be scheduled on this host, because no PCPU is available on it.
A 'mixed' instance should also demand at least one 'shared' (or VCPU) core; a 'mixed' instance demanding all of its cores from the PCPU resource should be considered invalid. An instance demanding all cores from the PCPU resource is just a legacy 'dedicated' instance, whose CPU allocation policy is 'dedicated'.
In conclusion, an instance with the 'mixed' policy:
* demands at least one 'dedicated' CPU and at least one 'shared' CPU
* has a NUMA topology by default, due to requesting pinned CPUs
In my understanding, the cons do not exist if we make the above rules.
Br Huaqiang
Cheers, gibi
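[As a rough illustration of what "applying a virtual NUMA topology" means at the libvirt level, and not the spec's proposed implementation, a one-cell guest topology for a 4 vCPU mixed instance might look something like this; the cell size, memory, and host core numbers are made up.]

    <cpu>
      <numa>
        <!-- a single guest NUMA cell spanning all four vCPUs -->
        <cell id='0' cpus='0-3' memory='4194304' unit='KiB'/>
      </numa>
    </cpu>
    <cputune>
      <!-- dedicated vCPUs pinned 1:1 to host cores -->
      <vcpupin vcpu='2' cpuset='38'/>
      <vcpupin vcpu='3' cpuset='39'/>
      <!-- shared vCPUs float across the host's shared core range -->
      <vcpupin vcpu='0' cpuset='0-37'/>
      <vcpupin vcpu='1' cpuset='0-37'/>
    </cputune>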
participants (4):
* Balázs Gibizer
* Sean Mooney
* Stephen Finucane
* Wang, Huaqiang