All VMs fail when --max exceeds available resources

Albert Braden Albert.Braden at synopsys.com
Wed Nov 20 21:21:25 UTC 2019


Yes, we are on Rocky. If I'm reading it correctly, the document says that setting allocation ratios by aggregate may not work after Ocata, but we are setting them in nova.conf on the controller. That setting does not appear to have taken effect. Both ratios are set to 1:

root@us01odc-dev1-ctrl1:~# grep allocation_ /etc/nova/nova.conf
cpu_allocation_ratio = 1
ram_allocation_ratio = 1.0

But the inventory shows different values:

root@us01odc-dev1-ctrl1:~# os resource provider inventory list f20fa03d-18f4-486b-9b40-ceaaf52dabf8
+----------------+------------------+----------+----------+-----------+----------+--------+
| resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit |  total |
+----------------+------------------+----------+----------+-----------+----------+--------+
| VCPU           |             16.0 |       16 |        2 |         1 |        1 |     16 |
| MEMORY_MB      |              1.5 |   128888 |     8192 |         1 |        1 | 128888 |
| DISK_GB        |              1.0 |     1208 |      246 |         1 |        1 |   1208 |
+----------------+------------------+----------+----------+-----------+----------+--------+
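To make the mismatch explicit, here is a rough Python sketch that parses the table text above and compares the reported ratios with the nova.conf values. The helper names (parse_inventory, ratio_mismatches) are made up for illustration and are not part of any OpenStack tool; the table and the configured values are copied from this message.

```python
# Sketch: compare the allocation ratios placement reports with the ones
# configured in nova.conf. The table text is copied from the output above.
INVENTORY_TABLE = """\
+----------------+------------------+----------+----------+-----------+----------+--------+
| resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit |  total |
+----------------+------------------+----------+----------+-----------+----------+--------+
| VCPU           |             16.0 |       16 |        2 |         1 |        1 |     16 |
| MEMORY_MB      |              1.5 |   128888 |     8192 |         1 |        1 | 128888 |
| DISK_GB        |              1.0 |     1208 |      246 |         1 |        1 |   1208 |
+----------------+------------------+----------+----------+-----------+----------+--------+
"""

# Values from the grep of /etc/nova/nova.conf on the controller.
CONFIGURED = {"VCPU": 1.0, "MEMORY_MB": 1.0}


def parse_inventory(table_text):
    """Parse the CLI's ASCII table into {resource_class: {column: value}}."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in table_text.splitlines()
        if line.startswith("|")  # skip the +----+ border lines
    ]
    header, data = rows[0], rows[1:]
    return {
        row[0]: {col: float(val) for col, val in zip(header[1:], row[1:])}
        for row in data
    }


def ratio_mismatches(inventory, configured):
    """Return {resource_class: (configured, reported)} where the two differ."""
    return {
        rc: (want, inventory[rc]["allocation_ratio"])
        for rc, want in configured.items()
        if rc in inventory and inventory[rc]["allocation_ratio"] != want
    }


if __name__ == "__main__":
    inventory = parse_inventory(INVENTORY_TABLE)
    for rc, (want, got) in ratio_mismatches(inventory, CONFIGURED).items():
        print(f"{rc}: nova.conf says {want}, placement reports {got}")
```

With the values above, both VCPU and MEMORY_MB are flagged, which matches what the table shows.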

I think the document is saying that we need to set them in nova.conf on each hypervisor (HV). I tried that, and it seems to fix the allocation ratios:

root@us01odc-dev1-ctrl1:~# os resource provider inventory list f20fa03d-18f4-486b-9b40-ceaaf52dabf8
+----------------+------------------+----------+----------+-----------+----------+--------+
| resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit |  total |
+----------------+------------------+----------+----------+-----------+----------+--------+
| VCPU           |              1.0 |       16 |        2 |         1 |        1 |     16 |
| MEMORY_MB      |              1.0 |   128888 |     8192 |         1 |        1 | 128888 |
| DISK_GB        |              1.0 |     1208 |      246 |         1 |        1 |   1208 |
+----------------+------------------+----------+----------+-----------+----------+--------+
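For context on why the ratios matter: as far as I understand it, placement treats the usable capacity of each resource class as (total - reserved) * allocation_ratio. The Python sketch below applies that formula to the two inventory listings in this message. The capacity() helper is only illustrative, and truncating to an integer is done here for display, not something taken from the placement code.

```python
def capacity(total, reserved, allocation_ratio):
    """Usable capacity: (total - reserved) * allocation_ratio.

    Truncated to an int for display only (an assumption of this sketch).
    """
    return int((total - reserved) * allocation_ratio)


# (total, reserved) per resource class, copied from the inventory output.
INVENTORY = {
    "VCPU": (16, 2),
    "MEMORY_MB": (128888, 8192),
    "DISK_GB": (1208, 246),
}

# Ratios reported before and after setting them in nova.conf on the HV.
RATIOS_BEFORE = {"VCPU": 16.0, "MEMORY_MB": 1.5, "DISK_GB": 1.0}
RATIOS_AFTER = {"VCPU": 1.0, "MEMORY_MB": 1.0, "DISK_GB": 1.0}


def capacities(ratios):
    """Return {resource_class: capacity} for the given set of ratios."""
    return {
        rc: capacity(total, reserved, ratios[rc])
        for rc, (total, reserved) in INVENTORY.items()
    }


if __name__ == "__main__":
    before, after = capacities(RATIOS_BEFORE), capacities(RATIOS_AFTER)
    for rc in INVENTORY:
        print(f"{rc}: {before[rc]} before, {after[rc]} after")
```

With the old ratios the scheduler could place up to 224 VCPUs and 181044 MB of RAM on this host; with ratios of 1.0 that drops to 14 VCPUs and 120696 MB.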

This fixed the "allocation ratio" issue, but I still see the --max issue. What could be causing that?

-----Original Message-----
From: Matt Riedemann <mriedemos at gmail.com> 
Sent: Wednesday, November 20, 2019 12:10 PM
To: openstack-discuss at lists.openstack.org
Subject: Re: All VMs fail when --max exceeds available resources

On 11/20/2019 1:02 PM, Albert Braden wrote:
> The other symptom is that the scheduler will send single VMs to a full 
> hypervisor and overload it even though we have cpu_allocation_ratio and 
> ram_allocation_ratio set to 1:

You're on Rocky, correct? If allocation ratios are acting funky, you 
should read through this:

https://docs.openstack.org/nova/rocky/admin/configuration/schedulers.html#bug-1804125

There were some changes in Stein to help with configuring nova to deal 
with allocation ratios per compute or via aggregate:

https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#allocation-ratios

But what you'll likely need to do is manage the allocation ratios in 
aggregate on the resource providers in placement. Fortunately there is a 
CLI for doing that:

https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-inventory-set

e.g. openstack resource provider inventory set <aggregate_uuid> 
--resource VCPU:allocation_ratio=1.0 --aggregate --amend
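If several resource classes need the same treatment, the command above could be assembled with a small script. This is only a sketch: inventory_set_command is a hypothetical helper, it assumes the installed osc-placement plugin supports the --aggregate and --amend flags shown above and accepts --resource more than once, and it only builds the argument list without running anything.

```python
import shlex


def inventory_set_command(aggregate_uuid, ratios):
    """Build the argv for 'openstack resource provider inventory set'.

    ratios maps a resource class name to the allocation ratio to apply.
    Nothing is executed; the caller decides whether to run the result.
    """
    argv = ["openstack", "resource", "provider", "inventory", "set",
            aggregate_uuid]
    for resource_class, ratio in ratios.items():
        argv += ["--resource", f"{resource_class}:allocation_ratio={ratio}"]
    # Flags taken from the example command above.
    argv += ["--aggregate", "--amend"]
    return argv


if __name__ == "__main__":
    # "<aggregate_uuid>" is a placeholder; substitute a real aggregate UUID.
    cmd = inventory_set_command("<aggregate_uuid>",
                                {"VCPU": 1.0, "MEMORY_MB": 1.0})
    print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run() once the printed command has been checked by hand.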

Anyway, see if that documented bug with allocation ratios is your issue 
first and then go through the workarounds.

-- 

Thanks,

Matt



