The expected result (which I was seeing last week) is that if my cluster has capacity for 4 VMs and I use --max 5, 4 go active and 1 goes to error. This week all 5 go to error. I can still build 4 VMs of that flavor one at a time, or with --max 4, but if I use --max 5 then all 5 fail. With smaller VMs the --max numbers get bigger, but I still see the same symptom. The --max option is very useful and we use it a lot; it lets us fill the cluster without knowing exactly how much space we have.
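
In case the flag isn't familiar, the sort of command I mean is roughly this (the image, flavor, network, and server names are placeholders, not our real ones):

    openstack server create --image bionic --flavor m1.large \
        --network private --min 1 --max 5 batch-vm

Both --min and --max are existing options on "openstack server create"; the point is that we ask for more instances than we know will fit.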

-----Original Message-----
From: Matt Riedemann <mriedemos@gmail.com>
Sent: Wednesday, November 20, 2019 2:00 PM
To: openstack-discuss@lists.openstack.org
Subject: Re: All VMs fail when --max exceeds available resources

On 11/20/2019 3:21 PM, Albert Braden wrote:
I think the document is saying that we need to set the allocation ratios in nova.conf on each HV. I tried that and it seems to fix the allocation failure:
root@us01odc-dev1-ctrl1:~# os resource provider inventory list f20fa03d-18f4-486b-9b40-ceaaf52dabf8
+----------------+------------------+----------+----------+-----------+----------+--------+
| resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit | total  |
+----------------+------------------+----------+----------+-----------+----------+--------+
| VCPU           | 1.0              | 16       | 2        | 1         | 1        | 16     |
| MEMORY_MB      | 1.0              | 128888   | 8192     | 1         | 1        | 128888 |
| DISK_GB        | 1.0              | 1208     | 246      | 1         | 1        | 1208   |
+----------------+------------------+----------+----------+-----------+----------+--------+
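
For reference, these are the sort of [DEFAULT] options I mean in nova.conf on each compute. The values below are only an illustration that would line up with the inventory above (246 GB of reserved disk is 251904 MB), not necessarily exactly what we set:

    [DEFAULT]
    # ratios the compute reports to placement for its inventory
    cpu_allocation_ratio = 1.0
    ram_allocation_ratio = 1.0
    disk_allocation_ratio = 1.0
    # resources held back from placement on this host
    reserved_host_cpus = 2
    reserved_host_memory_mb = 8192
    reserved_host_disk_mb = 251904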
Yup, the config on the controller doesn't apply to the computes or to placement, because the computes are what report the inventory to placement. So you have to configure the allocation ratios on the computes, or, starting in Stein, via (resource provider) aggregate.
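
If you want to push a ratio straight into placement for a provider, I believe the osc-placement syntax is roughly the following (this is from memory, so double-check the client docs; also note that on releases before the Stein-era changes the compute's periodic update can overwrite it from its own config):

    # illustrative only; the UUID is the provider shown earlier in the thread
    openstack resource provider inventory set f20fa03d-18f4-486b-9b40-ceaaf52dabf8 \
        --amend --resource VCPU:allocation_ratio=1.0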
This fixed the "allocation ratio" issue but I still see the --max issue. What could be causing that?
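
In case it's useful, this is roughly how I check what placement thinks is already in use on a host before a test run (assuming the osc-placement plugin is installed; the UUID comes from 'openstack resource provider list'):

    openstack resource provider usage show f20fa03d-18f4-486b-9b40-ceaaf52dabf8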
That's something else, yeah? I didn't dig into that part of the email; the allocation ratio thing jumped out at me because it has been a long-standing, well-known painful issue/behavior change since Ocata.

One question, though. I read your original email as essentially "(1) I did x and got some failures, then (2) I changed something and now everything fails." Are you running from a clean environment in both test scenarios? If you have VMs on the computes when you're doing (2), that's going to change the scheduling results in (2), i.e. the computes will have less capacity since there are resources allocated on them in placement.

--

Thanks,

Matt