CPU pinning blues
Albert Braden
Albert.Braden at synopsys.com
Thu Oct 31 17:49:54 UTC 2019
I'm following this document to set up CPU pinning on Rocky:
https://www.redhat.com/en/blog/driving-fast-lane-cpu-pinning-and-numa-topology-awareness-openstack-compute
I followed all of the steps except for modifying non-pinned flavors, and I have one aggregate containing a single NUMA-capable host:
root@us01odc-dev1-ctrl1:/var/log/nova# os aggregate list
+----+-------+-------------------+
| ID | Name | Availability Zone |
+----+-------+-------------------+
| 4 | perf3 | None |
+----+-------+-------------------+
root@us01odc-dev1-ctrl1:/var/log/nova# os aggregate show 4
+-------------------+----------------------------+
| Field | Value |
+-------------------+----------------------------+
| availability_zone | None |
| created_at | 2019-10-30T23:05:41.000000 |
| deleted | False |
| deleted_at | None |
| hosts | [u'us01odc-dev1-hv003'] |
| id | 4 |
| name | perf3 |
| properties | pinned='true' |
| updated_at | None |
+-------------------+----------------------------+
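For reference, the aggregate was set up along these lines (a sketch reconstructed from the output above, not the exact commands I ran):

openstack aggregate create perf3
openstack aggregate set --property pinned=true perf3
openstack aggregate add host perf3 us01odc-dev1-hv003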
I have a flavor with the NUMA properties:
root@us01odc-dev1-ctrl1:/var/log/nova# os flavor show s1.perf3
+----------------------------+-------------------------------------------------------------------------+
| Field | Value |
+----------------------------+-------------------------------------------------------------------------+
| OS-FLV-DISABLED:disabled | False |
| OS-FLV-EXT-DATA:ephemeral | 0 |
| access_project_ids | None |
| disk | 35 |
| id | be3d21c4-7e91-42a2-b832-47f42fdd3907 |
| name | s1.perf3 |
| os-flavor-access:is_public | True |
| properties | aggregate_instance_extra_specs:pinned='true', hw:cpu_policy='dedicated' |
| ram | 30720 |
| rxtx_factor | 1.0 |
| swap | 7168 |
| vcpus | 4 |
+----------------------------+-------------------------------------------------------------------------+
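The extra specs were added to the flavor with something like this (a sketch; the property values match the output above):

openstack flavor set s1.perf3 \
  --property hw:cpu_policy=dedicated \
  --property aggregate_instance_extra_specs:pinned=true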
I create a VM with that flavor:
openstack server create --flavor s1.perf3 --image NOT-QSC-CentOS6.10-19P1-v4 --network it-network alberttest4
but it goes into ERROR status, and I see this in the scheduler logs:
2019-10-30 16:17:55.590 3248800 INFO nova.virt.hardware [req-d0c2de13-db23-41bd-8da3-34c68ff1d998 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Computed NUMA topology CPU pinning: usable pCPUs: [[4], [5], [6], [7]], vCPUs mapping: [(0, 4), (1, 5), (2, 6), (3, 7)]
2019-10-30 16:17:55.595 3248800 INFO nova.virt.hardware [req-d0c2de13-db23-41bd-8da3-34c68ff1d998 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Computed NUMA topology CPU pinning: usable pCPUs: [[0], [1], [2], [3], [4], [5], [6], [7]], vCPUs mapping: [(0, 0), (1, 1), (2, 2), (3, 3)]
2019-10-30 16:17:55.595 3248800 INFO nova.filters [req-d0c2de13-db23-41bd-8da3-34c68ff1d998 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Filter AggregateInstanceExtraSpecsFilter returned 0 hosts
2019-10-30 16:17:55.596 3248800 INFO nova.filters [req-d0c2de13-db23-41bd-8da3-34c68ff1d998 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Filtering removed all hosts for the request with instance ID '73b1e584-0ce4-478c-a706-c5892609dc3f'. Filter results: ['RetryFilter: (start: 3, end: 3)', 'AvailabilityZoneFilter: (start: 3, end: 3)', 'CoreFilter: (start: 3, end: 2)', 'RamFilter: (start: 2, end: 2)', 'ComputeFilter: (start: 2, end: 2)', 'ComputeCapabilitiesFilter: (start: 2, end: 2)', 'ImagePropertiesFilter: (start: 2, end: 2)', 'ServerGroupAntiAffinityFilter: (start: 2, end: 2)', 'ServerGroupAffinityFilter: (start: 2, end: 2)', 'DifferentHostFilter: (start: 2, end: 2)', 'SameHostFilter: (start: 2, end: 2)', 'NUMATopologyFilter: (start: 2, end: 2)', 'AggregateInstanceExtraSpecsFilter: (start: 2, end: 0)']
It looks like my hypervisor passes the hw:cpu_policy='dedicated' requirement (NUMATopologyFilter still returns it), but it fails on the aggregate_instance_extra_specs:pinned='true' check in AggregateInstanceExtraSpecsFilter.
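For reference, the scheduler filter list is the one you can see in the log above; in nova.conf on the controller it should look something like this (a sketch of the [filter_scheduler] section):

[filter_scheduler]
enabled_filters = RetryFilter,AvailabilityZoneFilter,CoreFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,DifferentHostFilter,SameHostFilter,NUMATopologyFilter,AggregateInstanceExtraSpecsFilter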
The interesting part of the problem is that if I add a second, apparently identical hypervisor to the aggregate, it starts working. I create s1.perf3 VMs, they land on us01odc-dev1-hv002, and the libvirt XML shows that they are correctly pinned. Once us01odc-dev1-hv002 is full, they start failing again.
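On us01odc-dev1-hv002 I check the pinning with something like this (the instance name is just an example):

virsh vcpupin instance-00000abc
virsh dumpxml instance-00000abc | grep -A 8 '<cputune>'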
What should I be looking for here? What could cause one apparently identical hypervisor to fail AggregateInstanceExtraSpecsFilter while another one passes?
In the nova-compute log of the failing hypervisor I see this:
2019-10-31 10:43:01.147 1103 INFO nova.compute.resource_tracker [req-dda65a9c-9d0a-4888-b4cb-0bf4423dd2f3 - - - - -] Instance 4856d505-c220-4873-b881-836b5b75f7bb has allocations against this compute host but is not found in the database.
2019-10-31 10:43:01.148 1103 INFO nova.compute.resource_tracker [req-dda65a9c-9d0a-4888-b4cb-0bf4423dd2f3 - - - - -] Final resource view: name=us01odc-dev1-hv003.internal.synopsys.com phys_ram=128888MB used_ram=38912MB phys_disk=1208GB used_disk=297GB total_vcpus=8 used_vcpus=6 pci_stats=[]
OpenStack can't find a VM with UUID 4856d505-c220-4873-b881-836b5b75f7bb. There are no VMs on hv003, but I can create a non-pinned VM there and it works. Do I have a "phantom" VM that is still consuming resources on hv003? How can I fix that?
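If this is a leaked allocation in Placement, I would expect to be able to see it and clean it up with the osc-placement plugin, along these lines (assuming the plugin is installed; the consumer UUID is the one from the log above, and <provider-uuid> is a placeholder):

# show what the orphaned consumer still holds
openstack resource provider allocation show 4856d505-c220-4873-b881-836b5b75f7bb
# find the resource provider UUID for hv003
openstack resource provider list --name us01odc-dev1-hv003.internal.synopsys.com
# see the current usage on that provider
openstack resource provider usage show <provider-uuid>
# if the allocation really is orphaned, remove it
openstack resource provider allocation delete 4856d505-c220-4873-b881-836b5b75f7bb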