I'm following this document to set up CPU pinning on Rocky:
https://www.redhat.com/en/blog/driving-fast-lane-cpu-pinning-and-numa-topolo...
I followed all of the steps except for modifying non-pinned flavors, and I have one aggregate containing a single NUMA-capable host:
root@us01odc-dev1-ctrl1:/var/log/nova# os aggregate list
+----+-------+-------------------+
| ID | Name  | Availability Zone |
+----+-------+-------------------+
|  4 | perf3 | None              |
+----+-------+-------------------+
root@us01odc-dev1-ctrl1:/var/log/nova# os aggregate show 4
+-------------------+----------------------------+
| Field             | Value                      |
+-------------------+----------------------------+
| availability_zone | None                       |
| created_at        | 2019-10-30T23:05:41.000000 |
| deleted           | False                      |
| deleted_at        | None                       |
| hosts             | [u'us01odc-dev1-hv003']    |
| id                | 4                          |
| name              | perf3                      |
| properties        | pinned='true'              |
| updated_at        | None                       |
+-------------------+----------------------------+
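For anyone reproducing this, the aggregate above can be built with the standard OSC commands (a sketch of what I ran; the names match my setup):

```shell
# Create the host aggregate, tag it for pinned workloads,
# and add the NUMA-capable hypervisor to it
openstack aggregate create perf3
openstack aggregate set --property pinned=true perf3
openstack aggregate add host perf3 us01odc-dev1-hv003
```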
I have a flavor with the NUMA properties:
root@us01odc-dev1-ctrl1:/var/log/nova# os flavor show s1.perf3
+----------------------------+-------------------------------------------------------------------------+
| Field                      | Value                                                                   |
+----------------------------+-------------------------------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                                                   |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                                       |
| access_project_ids         | None                                                                    |
| disk                       | 35                                                                      |
| id                         | be3d21c4-7e91-42a2-b832-47f42fdd3907                                    |
| name                       | s1.perf3                                                                |
| os-flavor-access:is_public | True                                                                    |
| properties                 | aggregate_instance_extra_specs:pinned='true', hw:cpu_policy='dedicated' |
| ram                        | 30720                                                                   |
| rxtx_factor                | 1.0                                                                     |
| swap                       | 7168                                                                    |
| vcpus                      | 4                                                                       |
+----------------------------+-------------------------------------------------------------------------+
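The flavor was set up roughly like this (a sketch; the create arguments are inferred from the flavor show output above):

```shell
# Create the flavor, then attach the pinning extra specs:
# hw:cpu_policy=dedicated asks for dedicated pCPUs, and
# aggregate_instance_extra_specs:pinned=true steers it to the pinned aggregate
openstack flavor create --vcpus 4 --ram 30720 --disk 35 --swap 7168 s1.perf3
openstack flavor set s1.perf3 \
  --property hw:cpu_policy=dedicated \
  --property aggregate_instance_extra_specs:pinned=true
```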
I create a VM with that flavor:
openstack server create --flavor s1.perf3 --image NOT-QSC-CentOS6.10-19P1-v4 --network it-network alberttest4
but it goes to error status, and I see this in the logs:
2019-10-30 16:17:55.590 3248800 INFO nova.virt.hardware [req-d0c2de13-db23-41bd-8da3-34c68ff1d998 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Computed NUMA topology CPU pinning: usable pCPUs: [[4], [5], [6], [7]], vCPUs mapping: [(0, 4), (1, 5), (2, 6), (3, 7)]
2019-10-30 16:17:55.595 3248800 INFO nova.virt.hardware [req-d0c2de13-db23-41bd-8da3-34c68ff1d998 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Computed NUMA topology CPU pinning: usable pCPUs: [[0], [1], [2], [3], [4], [5], [6], [7]], vCPUs mapping: [(0, 0), (1, 1), (2, 2), (3, 3)]
2019-10-30 16:17:55.595 3248800 INFO nova.filters [req-d0c2de13-db23-41bd-8da3-34c68ff1d998 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Filter AggregateInstanceExtraSpecsFilter returned 0 hosts
2019-10-30 16:17:55.596 3248800 INFO nova.filters [req-d0c2de13-db23-41bd-8da3-34c68ff1d998 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Filtering removed all hosts for the request with instance ID '73b1e584-0ce4-478c-a706-c5892609dc3f'. Filter results: ['RetryFilter: (start: 3, end: 3)', 'AvailabilityZoneFilter: (start: 3, end: 3)', 'CoreFilter: (start: 3, end: 2)', 'RamFilter: (start: 2, end: 2)', 'ComputeFilter: (start: 2, end: 2)', 'ComputeCapabilitiesFilter: (start: 2, end: 2)', 'ImagePropertiesFilter: (start: 2, end: 2)', 'ServerGroupAntiAffinityFilter: (start: 2, end: 2)', 'ServerGroupAffinityFilter: (start: 2, end: 2)', 'DifferentHostFilter: (start: 2, end: 2)', 'SameHostFilter: (start: 2, end: 2)', 'NUMATopologyFilter: (start: 2, end: 2)', 'AggregateInstanceExtraSpecsFilter: (start: 2, end: 0)']
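A quick way to spot which filter eliminated the last candidate hosts is to grep the scheduler log for the "removed all hosts" entry and pull out the filter whose count went to zero. The log path below is an assumption; adjust it for your deployment.

```shell
# Find the most recent scheduling failure and extract the filter
# that took the host count to zero (end: 0)
grep "Filtering removed all hosts" /var/log/nova/nova-scheduler.log \
  | grep -oE "[A-Za-z]+Filter: \(start: [0-9]+, end: 0\)" \
  | tail -n 1
```

On the log above this prints `AggregateInstanceExtraSpecsFilter: (start: 2, end: 0)`, pointing straight at the aggregate metadata check.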
It looks like my hypervisor is passing the hw:cpu_policy='dedicated' requirement but failing on "pinned=true".
The interesting part of the problem is that if I add a second, apparently identical hypervisor to the aggregate, it starts working. I create s1.perf3 VMs, they land on us01odc-dev1-hv002, and the XML shows that they are correctly pinned. When us01odc-dev1-hv002 is full, they start failing again.
What should I be looking for here? What could cause one apparently identical hypervisor to fail AggregateInstanceExtraSpecsFilter while another one passes?
In the nova-compute log of the failing hypervisor I see this:
2019-10-31 10:43:01.147 1103 INFO nova.compute.resource_tracker [req-dda65a9c-9d0a-4888-b4cb-0bf4423dd2f3 - - - - -] Instance 4856d505-c220-4873-b881-836b5b75f7bb has allocations against this compute host but is not found in the database.
2019-10-31 10:43:01.148 1103 INFO nova.compute.resource_tracker [req-dda65a9c-9d0a-4888-b4cb-0bf4423dd2f3 - - - - -] Final resource view: name=us01odc-dev1-hv003.internal.synopsys.com phys_ram=128888MB used_ram=38912MB phys_disk=1208GB used_disk=297GB total_vcpus=8 used_vcpus=6 pci_stats=[]
OpenStack can't find a VM with UUID 4856d505-c220-4873-b881-836b5b75f7bb. There are no VMs on hv003, but I can create a non-pinned VM there and it works. Do I have a "phantom" VM that is consuming resources on hv003? How can I fix that?
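To see what placement thinks is consuming resources on hv003, something like this should work (assuming the osc-placement CLI plugin is installed; the hostname is the resource provider name in my deployment):

```shell
# Find the resource provider record for the suspect host
openstack resource provider list --name us01odc-dev1-hv003.internal.synopsys.com
# Show the allocations held by the suspect consumer (instance) UUID
openstack resource provider allocation show 4856d505-c220-4873-b881-836b5b75f7bb
```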
I found the offending UUID in the nova_api and placement databases. Do I need to delete these entries from the DB or is there a safer way to get rid of the "phantom" VM?
MariaDB [(none)]> select * from nova_api.instance_mappings where instance_uuid = '4856d505-c220-4873-b881-836b5b75f7bb';
| created_at          | updated_at | id  | instance_uuid                        | cell_id | project_id                       | queued_for_delete |
| 2019-10-08 21:26:03 | NULL       | 589 | 4856d505-c220-4873-b881-836b5b75f7bb | NULL    | 474ae347d8ad426f8118e55eee47dcfd | 0                 |
MariaDB [(none)]> select * from nova_api.request_specs where instance_uuid = '4856d505-c220-4873-b881-836b5b75f7bb';
| created_at | updated_at | id | instance_uuid | spec |
| 2019-10-08 21:26:03 | NULL | 589 | 4856d505-c220-4873-b881-836b5b75f7bb | {"nova_object.version": "1.11", "nova_object.changes": ["requested_destination", "instance_uuid", "retry", "num_instances", "pci_requests", "limits", "availability_zone", "force_nodes", "image", "instance_group", "force_hosts", "ignore_hosts", "numa_topology", "is_bfv", "user_id", "flavor", "project_id", "security_groups", "scheduler_hints"], "nova_object.name": "RequestSpec", "nova_object.data": {"requested_destination": null, "instance_uuid": "4856d505-c220-4873-b881-836b5b75f7bb", "retry": null, "num_instances": 1, "pci_requests": {"nova_object.version": "1.1", "nova_object.changes": ["requests"], "nova_object.name": "InstancePCIRequests", "nova_object.data": {"requests": []}, "nova_object.namespace": "nova"}, "limits": {"nova_object.version": "1.0", "nova_object.changes": ["vcpu", "memory_mb", "disk_gb", "numa_topology"], "nova_object.name": "SchedulerLimits", "nova_object.data": {"vcpu": null, "memory_mb": null, "disk_gb": null, "numa_topology": null}, "nova_object.namespace": "nova"}, "availability_zone": null, "force_nodes": null, "image": {"nova_object.version": "1.8", "nova_object.changes": ["status", "name", "container_format", "created_at", "disk_format", "updated_at", "id", "min_disk", "min_ram", "checksum", "owner", "properties", "size"], "nova_object.name": "ImageMeta", "nova_object.data": {"status": "active", "created_at": "2019-10-02T01:10:04Z", "name": "QSC-P-CentOS6.6-19P1-v4", "container_format": "bare", "min_ram": 0, "disk_format": "qcow2", "updated_at": "2019-10-02T01:10:44Z", "id": "200cb134-2716-4662-8183-33642078547f", "min_disk": 0, "checksum": "94d33caafd85b45519fca331ee7ea03e", "owner": "474ae347d8ad426f8118e55eee47dcfd", "properties": {"nova_object.version": "1.20", "nova_object.name": "ImageMetaProps", "nova_object.data": {}, "nova_object.namespace": "nova"}, "size": 4935843840}, "nova_object.namespace": "nova"}, "instance_group": null, "force_hosts": null, "ignore_hosts": null, "numa_topology": null, "is_bfv": false, "user_id": "2cb6757679d54a69803a5b6e317b3a93", "flavor": {"nova_object.version": "1.2", "nova_object.name": "Flavor", "nova_object.data": {"disabled": false, "root_gb": 35, "description": null, "flavorid": "e8b42da7-d352-441e-b494-77d6a6cd7366", "deleted": false, "created_at": "2019-09-23T21:19:50Z", "ephemeral_gb": 10, "updated_at": null, "memory_mb": 4096, "vcpus": 1, "extra_specs": {}, "swap": 3072, "rxtx_factor": 1.0, "is_public": true, "deleted_at": null, "vcpu_weight": 0, "id": 2, "name": "s1.1cx4g"}, "nova_object.namespace": "nova"}, "project_id": "474ae347d8ad426f8118e55eee47dcfd", "security_groups": {"nova_object.version": "1.1", "nova_object.changes": ["objects"], "nova_object.name": "SecurityGroupList", "nova_object.data": {"objects": [{"nova_object.version": "1.2", "nova_object.changes": ["name"], "nova_object.name": "SecurityGroup", "nova_object.data": {"name": "default"}, "nova_object.namespace": "nova"}]}, "nova_object.namespace": "nova"}, "scheduler_hints": {}}, "nova_object.namespace": "nova"} |
1 row in set (0.001 sec)
MariaDB [(none)]> SELECT * FROM placement.allocations WHERE consumer_id = '4856d505-c220-4873-b881-836b5b75f7bb';
| created_at          | updated_at | id   | resource_provider_id | consumer_id                          | resource_class_id | used |
| 2019-10-08 22:03:33 | NULL       | 3073 | 1024                 | 4856d505-c220-4873-b881-836b5b75f7bb | 0                 | 1    |
| 2019-10-08 22:03:33 | NULL       | 3074 | 1024                 | 4856d505-c220-4873-b881-836b5b75f7bb | 1                 | 4096 |
| 2019-10-08 22:03:33 | NULL       | 3075 | 1024                 | 4856d505-c220-4873-b881-836b5b75f7bb | 2                 | 48   |
3 rows in set (0.001 sec)
MariaDB [(none)]> SELECT * FROM placement.consumers WHERE uuid = '4856d505-c220-4873-b881-836b5b75f7bb';
| created_at          | updated_at          | id  | uuid                                 | project_id | user_id | generation |
| 2019-10-08 22:03:33 | 2019-10-08 22:03:33 | 734 | 4856d505-c220-4873-b881-836b5b75f7bb | 1          | 1       | 1          |
1 row in set (0.000 sec)
From: Albert Braden <Albert.Braden@synopsys.com>
Sent: Thursday, October 31, 2019 10:50 AM
To: openstack-discuss@lists.openstack.org
Subject: CPU pinning blues
Post with logs got moderated so they are here:
https://paste.fedoraproject.org/paste/3bza6CJstXFPy8LatRJruA
On 11/5/2019 2:11 PM, Albert Braden wrote:
I found the offending UUID in the nova_api and placement databases. Do I need to delete these entries from the DB or is there a safer way to get rid of the “phantom” VM?
MariaDB [(none)]> select * from nova_api.instance_mappings where instance_uuid = '4856d505-c220-4873-b881-836b5b75f7bb';
| created_at | updated_at | id | instance_uuid | cell_id | project_id | queued_for_delete |
| 2019-10-08 21:26:03 | NULL | 589 | 4856d505-c220-4873-b881-836b5b75f7bb | NULL | 474ae347d8ad426f8118e55eee47dcfd | 0 |
Interesting. So there is an instance mapping but it's not pointing at any cell. I'm assuming there is no entry for this instance in the nova_api.build_requests table either?
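You could confirm that with a quick query against the API database (a hypothetical invocation; point mysql at wherever your nova_api database lives):

```shell
# An empty result confirms the build request is gone
mysql -e "SELECT id, instance_uuid FROM nova_api.build_requests \
  WHERE instance_uuid = '4856d505-c220-4873-b881-836b5b75f7bb';"
```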
A couple of related patches for that instance mapping thing:
1. I have a patch that adds a nova-manage command to clean up busted instance mappings [1]. In this case you'd just --purge that broken instance mapping.
2. mnaser has reported similar weird issues where an instance mapping exists but doesn't point at a cell and the build request is gone and the instance isn't in cell0. For that we have a sanity check patch [2] which might be helpful to you if you hit this again.
If either of those patches are helpful to you, please vote on the changes so we can draw some more eyes to the reviews.
As for the allocations, you can remove those from placement using the osc-placement CLI plugin [3]:
openstack resource provider allocation delete 4856d505-c220-4873-b881-836b5b75f7bb
[1] https://review.opendev.org/#/c/655908/
[2] https://review.opendev.org/#/c/683730/
[3] https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-prov...
Thanks Matt! I saw your "any interest" email earlier and tried that procedure, and it fixed the problem.
Will these patches work on Rocky?
participants (2)
- Albert Braden
- Matt Riedemann