On Wed, Jun 30, 2021 at 3:40 PM Laurent Dumont <laurentfdumont@gmail.com> wrote:

In some cases, the DEBUG messages are a bit verbose, but they can really walk you through the allocation/scheduling process. You could raise the log level to DEBUG for nova and restart the API and scheduler on the controllers. I wonder if a DB desync could be the cause? Are you running an HA deployment for the MySQL backend?
On Wed, Jun 30, 2021 at 1:44 PM Jeffrey Mazzone <jmazzone@uchicago.edu> wrote:
Any other logs with Unable to create allocation for 'VCPU' on resource provider?

No, the 3 logs listed are the only logs where it is showing this message and VCPU is the only thing it fails for. No memory or disk allocation failures, always VCPU.

At this point if you list the resource provider usage on 3f9d0deb-936c-474a-bdee-d3df049f073d again then do you still see 4 VCPU used, or 8206 used?

The usage shows everything correctly:
~# openstack resource provider usage show 3f9d0deb-936c-474a-bdee-d3df049f073d
+----------------+-------+
| resource_class | usage |
+----------------+-------+
| VCPU           |     4 |
| MEMORY_MB      |  8192 |
| DISK_GB        |    10 |
+----------------+-------+
The allocations show the same:
~# openstack resource provider show 3f9d0deb-936c-474a-bdee-d3df049f073d --allocations
+-------------+--------------------------------------------------------------------------------------------------------+
| Field       | Value                                                                                                  |
+-------------+--------------------------------------------------------------------------------------------------------+
| uuid        | 3f9d0deb-936c-474a-bdee-d3df049f073d                                                                   |
| name        | kh09-50                                                                                                |
| generation  | 244                                                                                                    |
| allocations | {'4a6fe4c2-ece4-45c2-b7a2-fdfd41308988': {'resources': {'VCPU': 4, 'MEMORY_MB': 8192, 'DISK_GB': 10}}} |
+-------------+--------------------------------------------------------------------------------------------------------+
Allocation candidate list shows all 228 servers in the cluster available:
~# openstack allocation candidate list --resource VCPU=4 -c "resource provider" -f value | wc -l
228
Starting a new VM on that host shows the following in the logs:

Placement-api.log
2021-06-30 12:27:21.335 4382 WARNING placement.objects.allocation [req-f4d74abc-7b18-407a-85e7-f1c268bd5e53 a770bde56c9d49e68facb792cf69088c 6da06417e0004cbb87c1e64fe1978de5 - default default] Over capacity for VCPU on resource provider 0e0d8ec8-bb31-4da5-a813-bd73560ff7d6. Needed: 4, Used: 8206, Capacity: 1024.0
nova-scheduler.log
2021-06-30 12:27:21.429 6895 WARNING nova.scheduler.client.report [req-3106f4da-1df9-4370-b56b-8ba6b62980dc aacc7911abf349b783eed20ad176c034 23920ecfbf294e71ad558aa49cb17de8 - default default] Failed to save allocation for a9296e22-4b50-45b7-a442-1fce0a844bcd. Got HTTP 409: {"errors": [{"status": 409, "title": "Conflict", "detail": "There was a conflict when trying to complete your request.\n\n Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider '3f9d0deb-936c-474a-bdee-d3df049f073d'. The requested amount would exceed the capacity.  ", "code": "placement.undefined_code", "request_id": "req-e9f12a3a-3136-4501-8bd6-4add31f0eb82"}]}
I really can't figure out where this seemingly last-minute calculation of used resources comes from.
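
For what it's worth, the numbers in the placement warning can be replayed by hand. The sketch below assumes placement rejects an allocation when used + needed is greater than capacity, which is what the "Needed / Used / Capacity" wording suggests. exceeds_capacity is a hypothetical helper written for illustration, not a placement or nova command, and the second call assumes the same capacity of 1024.0 applies.

```shell
# Hypothetical helper: succeeds (exit 0) when used + needed would exceed capacity.
exceeds_capacity() {
    needed=$1
    used=$2
    capacity=$3
    # awk is used because capacity is logged as a float (1024.0)
    awk -v n="$needed" -v u="$used" -v c="$capacity" 'BEGIN { exit !(u + n > c) }'
}

# Values from the placement-api warning: Needed 4, Used 8206, Capacity 1024.0
if exceeds_capacity 4 8206 1024.0; then echo "8206 used: rejected"; else echo "8206 used: accepted"; fi

# Values from "resource provider usage show": 4 VCPU used (same capacity assumed)
if exceeds_capacity 4 4 1024.0; then echo "4 used: rejected"; else echo "4 used: accepted"; fi
```

Under that assumption the first case is rejected and the second is accepted, so the 409 depends entirely on where the 8206 figure comes from.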

Given you also have an Ussuri deployment, you could run the nova-manage placement audit command to check whether you have orphaned allocations:

nova-manage placement audit [--verbose] [--delete] [--resource_provider <uuid>]

When running this command, it says the UUID does not exist.
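
One detail that may be worth checking: the placement warning above names resource provider 0e0d8ec8-bb31-4da5-a813-bd73560ff7d6, while the scheduler error and the usage queries name 3f9d0deb-936c-474a-bdee-d3df049f073d. A rough sketch for comparing the two, using the usage command already shown plus `openstack resource provider inventory list` from the same placement CLI plugin. inspect_provider is a hypothetical wrapper, and whether the second provider is related to this host at all is an assumption to verify.

```shell
# Hypothetical wrapper: prints usage and inventory for one resource provider.
inspect_provider() {
    uuid=$1
    openstack resource provider usage show "$uuid"
    openstack resource provider inventory list "$uuid"
}

# Left commented out so nothing runs until the UUIDs are confirmed:
# inspect_provider 3f9d0deb-936c-474a-bdee-d3df049f073d
# inspect_provider 0e0d8ec8-bb31-4da5-a813-bd73560ff7d6
```

If the second provider reports 8206 VCPU used, that would at least locate the figure that triggers the conflict.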

Thank you! I truly appreciate everyone's help.

-Jeff M