I'm following this document to setup CPU pinning on Rocky:
https://www.redhat.com/en/blog/driving-fast-lane-cpu-pinning-and-numa-topolo...
I followed all of the steps except for modifying non-pinned flavors and I have one aggregate containing a single NUMA-capable host:
root@us01odc-dev1-ctrl1:/var/log/nova# os aggregate list +----+-------+-------------------+ | ID | Name | Availability Zone | +----+-------+-------------------+ | 4 | perf3 | None | +----+-------+-------------------+ root@us01odc-dev1-ctrl1:/var/log/nova# os aggregate show 4 +-------------------+----------------------------+ | Field | Value | +-------------------+----------------------------+ | availability_zone | None | | created_at | 2019-10-30T23:05:41.000000 | | deleted | False | | deleted_at | None | | hosts | [u'us01odc-dev1-hv003'] | | id | 4 | | name | perf3 | | properties | pinned='true' | | updated_at | None | +-------------------+----------------------------+
I have a flavor with the NUMA properties:
root@us01odc-dev1-ctrl1:/var/log/nova# os flavor show s1.perf3 +----------------------------+-------------------------------------------------------------------------+ | Field | Value | +----------------------------+-------------------------------------------------------------------------+ | OS-FLV-DISABLED:disabled | False | | OS-FLV-EXT-DATA:ephemeral | 0 | | access_project_ids | None | | disk | 35 | | id | be3d21c4-7e91-42a2-b832-47f42fdd3907 | | name | s1.perf3 | | os-flavor-access:is_public | True | | properties | aggregate_instance_extra_specs:pinned='true', hw:cpu_policy='dedicated' | | ram | 30720 | | rxtx_factor | 1.0 | | swap | 7168 | | vcpus | 4 | +----------------------------+-------------------------------------------------------------------------+
I create a VM with that flavor:
openstack server create --flavor s1.perf3 --image NOT-QSC-CentOS6.10-19P1-v4 --network it-network alberttest4
but it goes to error status, and I see this in the logs:
2019-10-30 16:17:55.590 3248800 INFO nova.virt.hardware [req-d0c2de13-db23-41bd-8da3-34c68ff1d998 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Computed NUMA topology CPU pinning: usable pCPUs: [[4], [5], [6], [7]], vCPUs mapping: [(0, 4), (1, 5), (2, 6), (3, 7)] 2019-10-30 16:17:55.595 3248800 INFO nova.virt.hardware [req-d0c2de13-db23-41bd-8da3-34c68ff1d998 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Computed NUMA topology CPU pinning: usable pCPUs: [[0], [1], [2], [3], [4], [5], [6], [7]], vCPUs mapping: [(0, 0), (1, 1), (2, 2), (3, 3)] 2019-10-30 16:17:55.595 3248800 INFO nova.filters [req-d0c2de13-db23-41bd-8da3-34c68ff1d998 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Filter AggregateInstanceExtraSpecsFilter returned 0 hosts 2019-10-30 16:17:55.596 3248800 INFO nova.filters [req-d0c2de13-db23-41bd-8da3-34c68ff1d998 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Filtering removed all hosts for the request with instance ID '73b1e584-0ce4-478c-a706-c5892609dc3f'. Filter results: ['RetryFilter: (start: 3, end: 3)', 'AvailabilityZoneFilter: (start: 3, end: 3)', 'CoreFilter: (start: 3, end: 2)', 'RamFilter: (start: 2, end: 2)', 'ComputeFilter: (start: 2, end: 2)', 'ComputeCapabilitiesFilter: (start: 2, end: 2)', 'ImagePropertiesFilter: (start: 2, end: 2)', 'ServerGroupAntiAffinityFilter: (start: 2, end: 2)', 'ServerGroupAffinityFilter: (start: 2, end: 2)', 'DifferentHostFilter: (start: 2, end: 2)', 'SameHostFilter: (start: 2, end: 2)', 'NUMATopologyFilter: (start: 2, end: 2)', 'AggregateInstanceExtraSpecsFilter: (start: 2, end: 0)']
It looks like my hypervisor is passing the hw:cpu_policy='dedicated' requirement but it is failing on "pinned=true"
The interesting part of the problem is that if I add a second apparently identical hypervisor to the aggregate it starts working. I create s1.perf3 VMs and they land on us01odc-dev1-hv002 and the XML shows that they are correctly pinned. When us01odc-dev1-hv002 is full then they start failing again.
What should I be looking for here? What could cause one apparently identical hypervisor to fail AggregateInstanceExtraSpecsFilter while another one passes?
In the nova-compute log of the failing hypervisor I see this:
2019-10-31 10:43:01.147 1103 INFO nova.compute.resource_tracker [req-dda65a9c-9d0a-4888-b4cb-0bf4423dd2f3 - - - - -] Instance 4856d505-c220-4873-b881-836b5b75f7bb has allocations against this compute host but is not found in the database. 2019-10-31 10:43:01.148 1103 INFO nova.compute.resource_tracker [req-dda65a9c-9d0a-4888-b4cb-0bf4423dd2f3 - - - - -] Final resource view: name=us01odc-dev1-hv003.internal.synopsys.com phys_ram=128888MB used_ram=38912MB phys_disk=1208GB used_disk=297GB total_vcpus=8 used_vcpus=6 pci_stats=[]
Openstack can't find a VM with UUID 4856d505-c220-4873-b881-836b5b75f7bb. There are no VMs on hv003 but I can create a non-pinned VM there and it works. Do I have a "phantom" VM that is consuming resources on hv003? How can I fix that?