That is a bit strange!

When you say that you only see this when using an AZ, are there any issues when you don't specify the AZ and simply pick the default one?

Are there any other log entries containing "Unable to create allocation for 'VCPU' on resource provider"?

On Tue, Jun 29, 2021 at 4:47 PM Jeffrey Mazzone <jmazzone@uchicago.edu> wrote:
Hello, 

I am installing OpenStack Ussuri and am running into an issue when using availability zones. I initially thought it was a quota issue, but that no longer seems to be the case. I started a thread on Server Fault and was advised to post these questions here as well. Here is the original link:

https://serverfault.com/questions/1064579/openstack-only-building-one-vm-per-machine-in-cluster-then-runs-out-of-resource 

The issue remains: I can successfully build VMs on every host, but only one VM per host. The size of the initial VM does not matter. Since I posted the thread above, I have redeployed the entire cluster by hand, using the docs on openstack.org. Everything worked as it should: I created 3 test aggregates and 3 test availability zones, with no issues for about a month.

All of a sudden, the system reverted to no longer allowing more than one VM to be placed per host. There have been no changes to the controller. I have now enabled placement logging so I can see more information, but I don’t understand why it’s happening.

Example: start with a host that has no VMs on it:

~# openstack resource provider usage show 3f9d0deb-936c-474a-bdee-d3df049f073d
+----------------+-------+
| resource_class | usage |
+----------------+-------+
| VCPU           |     0 |
| MEMORY_MB      |     0 |
| DISK_GB        |     0 |
+----------------+-------+

Create one VM with 4 cores:

~# openstack resource provider usage show 3f9d0deb-936c-474a-bdee-d3df049f073d
+----------------+-------+
| resource_class | usage |
+----------------+-------+
| VCPU           |     4 |
| MEMORY_MB      |     0 |
| DISK_GB        |     0 |
+----------------+-------+

The inventory list for that provider is: 
~# openstack resource provider inventory list 3f9d0deb-936c-474a-bdee-d3df049f073d
+----------------+------------------+----------+----------+----------+-----------+--------+
| resource_class | allocation_ratio | min_unit | max_unit | reserved | step_size |  total |
+----------------+------------------+----------+----------+----------+-----------+--------+
| VCPU           |             16.0 |        1 |       64 |        0 |         1 |     64 |
| MEMORY_MB      |              1.5 |        1 |   515655 |      512 |         1 | 515655 |
| DISK_GB        |              1.0 |        1 |     7096 |        0 |         1 |   7096 |
+----------------+------------------+----------+----------+----------+-----------+--------+

Trying to start another VM on that host fails with the following log entries:

scheduler.log

"status": 409, "title": "Conflict", "detail": "There was a conflict when trying to complete your request.\n\n Unable to allocate inventory: Unable to create allocation for 'VCPU' on resource provider

conductor.log

Failed to schedule instances: nova.exception_Remote.NoValidHost_Remote: No valid host was found. There are not enough hosts available.

placement.log

Over capacity for VCPU on resource provider 3f9d0deb-936c-474a-bdee-d3df049f073d. Needed: 4, Used: 8206, Capacity: 1024.0

As you can see, the used value is suddenly 8206 after a single 4-core VM is placed on the host. I don’t understand what I’m missing or doing wrong, and I’m really unsure where this value is being calculated from. All the entries in the database and the output of the openstack commands show the correct values; only this log entry differs. Has anyone experienced the same or similar behavior? I would appreciate any insight as to what the issue could be.
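
For reference, the Capacity figure in the placement log line is consistent with the inventory above if capacity is taken as (total - reserved) * allocation_ratio, i.e. (64 - 0) * 16.0 = 1024.0. The following is only a minimal sketch of that arithmetic, not placement's actual code; the function names are made up for illustration. It suggests the capacity side is fine and that the unexpected input is the Used value of 8206.

```python
# Illustrative sketch only: these helpers are hypothetical and are not
# part of placement or any OpenStack library.

def capacity(total, reserved, allocation_ratio):
    """Capacity of one resource class on a provider."""
    return (total - reserved) * allocation_ratio


def would_exceed(used, needed, total, reserved, allocation_ratio):
    """True if allocating `needed` more units would go over capacity."""
    return used + needed > capacity(total, reserved, allocation_ratio)


# VCPU inventory row from the provider shown above.
vcpu = dict(total=64, reserved=0, allocation_ratio=16.0)

print(capacity(**vcpu))                           # 1024.0
print(would_exceed(used=4, needed=4, **vcpu))     # False: usage the CLI reports
print(would_exceed(used=8206, needed=4, **vcpu))  # True: usage in placement.log
```

With the usage that the CLI reports (4), a second 4-core VM would fit easily, so the rejection appears to hinge entirely on where the 8206 comes from.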

Thanks in advance!

-Jeff M