[nova][placement] Openstack only building one VM per machine in cluster, then runs out of resources

Jeffrey Mazzone jmazzone at uchicago.edu
Thu Jul 1 01:13:09 UTC 2021


On Jun 30, 2021, at 5:06 PM, melanie witt <melwittt at gmail.com> wrote:

I suggest you run the 'openstack resource provider show <RP UUID> --allocations' command as Balazs mentioned earlier to show all of the allocations (used resources) on the compute node. I also suggest you run the 'nova-manage placement audit' tool [1] as Sylvain mentioned earlier to show whether there are any orphaned allocations, i.e. allocations that are for instances that no longer exist. The consumer UUID is the instance UUID.

I did both of those suggestions. "openstack resource provider show <RP UUID> --allocations" shows what is expected: no orphaned VMs, and the resources used are correct. Here is an example from a different set of hosts and zones. This host had two 16-core VMs on it before the cluster went into this state; you can see them both below. The nova-manage placement audit command does not show any orphans either.


~# openstack resource provider show 41ecee2a-ec24-48e5-8b9d-24065d67238a --allocations
+----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field                | Value                                                                                                                                                                                                                                                                |
+----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| uuid                 | 41ecee2a-ec24-48e5-8b9d-24065d67238a                                                                                                                                                                                                                                 |
| name                 | kh09-56                                                                                                                                                                                                                                                              |
| generation           | 55                                                                                                                                                                                                                                                                   |
| root_provider_uuid   | 41ecee2a-ec24-48e5-8b9d-24065d67238a                                                                                                                                                                                                                                 |
| parent_provider_uuid | None                                                                                                                                                                                                                                                                 |
| allocations          | {'d6b9d19c-1ba9-44c2-97ab-90098509b872': {'resources': {'DISK_GB': 50, 'MEMORY_MB': 16384, 'VCPU': 16}, 'consumer_generation': 1}, 'e0a8401a-0bb6-4612-a496-6a794ebe6cd0': {'resources': {'DISK_GB': 50, 'MEMORY_MB': 16384, 'VCPU': 16}, 'consumer_generation': 1}} |
+----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Usage on the resource provider:

~# openstack resource provider usage show 41ecee2a-ec24-48e5-8b9d-24065d67238a
+----------------+-------+
| resource_class | usage |
+----------------+-------+
| VCPU           |    32 |
| MEMORY_MB      | 32768 |
| DISK_GB        |   100 |
+----------------+-------+

All of that looks correct. Asking placement for allocation candidates for a 4-VCPU VM also lists this host as a candidate:

~# openstack allocation candidate list --resource VCPU=4 | grep 41ecee2a-ec24-48e5-8b9d-24065d67238a
|  41 | VCPU=4     | 41ecee2a-ec24-48e5-8b9d-24065d67238a | VCPU=32/1024,MEMORY_MB=32768/772714,DISK_GB=100/7096

In the placement database, the 'used' column also shows the correct values for the two VMs described above:

+---------------------+------------+-------+----------------------+--------------------------------------+-------------------+-------+
| created_at          | updated_at | id    | resource_provider_id | consumer_id                          | resource_class_id | used  |
+---------------------+------------+-------+----------------------+--------------------------------------+-------------------+-------+
| 2021-06-02 18:45:05 | NULL       |  4060 |                  125 | e0a8401a-0bb6-4612-a496-6a794ebe6cd0 |                 2 |    50 |
| 2021-06-02 18:45:05 | NULL       |  4061 |                  125 | e0a8401a-0bb6-4612-a496-6a794ebe6cd0 |                 1 | 16384 |
| 2021-06-02 18:45:05 | NULL       |  4062 |                  125 | e0a8401a-0bb6-4612-a496-6a794ebe6cd0 |                 0 |    16 |
| 2021-06-04 18:39:13 | NULL       |  7654 |                  125 | d6b9d19c-1ba9-44c2-97ab-90098509b872 |                 2 |    50 |
| 2021-06-04 18:39:13 | NULL       |  7655 |                  125 | d6b9d19c-1ba9-44c2-97ab-90098509b872 |                 1 | 16384 |
| 2021-06-04 18:39:13 | NULL       |  7656 |                  125 | d6b9d19c-1ba9-44c2-97ab-90098509b872 |                 0 |    16 |
+---------------------+------------+-------+----------------------+--------------------------------------+-------------------+-------+

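For what it's worth, the per-class usage that placement reports is just the sum of those 'used' rows grouped by resource class. A minimal sketch of that arithmetic in Python, assuming the resource_class_id mapping implied by the rows above (0 = VCPU, 1 = MEMORY_MB, 2 = DISK_GB, which is an assumption about this deployment):

```python
from collections import defaultdict

# Assumed id -> name mapping, inferred from the rows and usage output above.
CLASS_NAMES = {0: "VCPU", 1: "MEMORY_MB", 2: "DISK_GB"}

# (consumer_id prefix, resource_class_id, used) taken from the table above.
rows = [
    ("e0a8401a", 2, 50), ("e0a8401a", 1, 16384), ("e0a8401a", 0, 16),
    ("d6b9d19c", 2, 50), ("d6b9d19c", 1, 16384), ("d6b9d19c", 0, 16),
]

# Sum 'used' per resource class, as placement does per resource provider.
usage = defaultdict(int)
for _consumer, rc_id, used in rows:
    usage[CLASS_NAMES[rc_id]] += used

# These sums match the 'resource provider usage show' output above:
# VCPU=32, MEMORY_MB=32768, DISK_GB=100.
print(dict(usage))
```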

When I try to build a VM, though, I get the placement error with the improperly calculated "Used" value:


2021-06-30 19:51:39.732 43832 WARNING placement.objects.allocation [req-de225c66-8297-4b34-9380-26cf9385d658 a770bde56c9d49e68facb792cf69088c 6da06417e0004cbb87c1e64fe1978de5 - default default] Over capacity for VCPU on resource provider b749130c-a368-4332-8a1f-8411851b4b2a. Needed: 4, Used: 18509, Capacity: 1024.0

Outside of changing the allocation ratio, I'm completely lost. I'm confident it has to do with that improper calculation of the used value, but how is it being calculated if it isn't being summed from fixed values in the database, as has been suggested?

Thanks in advance!
-Jeff M





The tl;dr on how the value is calculated: there's a table called 'allocations' in the placement database that holds all the values for resource providers and resource classes, and it has a 'used' column. If you add up all of the 'used' values for a given resource class (e.g. VCPU) and resource provider (compute node), that sum is the total used of that resource on that resource provider. You can also see this data via 'openstack resource provider show <RP UUID> --allocations'.

The allocation ratio will not affect the value of 'used', but it does inflate the working value of 'total' so that placement treats it as higher than it actually is, in order to allow oversubscription. If a compute node has 64 cores and cpu_allocation_ratio is 16, then 64 * 16 = 1024 cores will be allowed for placement on that compute node.
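As a sketch (hypothetical names, not placement's actual code path), the over-capacity check behind that warning boils down to: a request fits only if current usage plus the requested amount stays within total * allocation_ratio.

```python
# Sketch of the over-capacity test, assuming this simplified model:
# capacity is the physical total inflated by the allocation ratio.
def fits(total: int, allocation_ratio: float, used: int, requested: int) -> bool:
    capacity = total * allocation_ratio
    return used + requested <= capacity

# 64 physical cores with cpu_allocation_ratio=16 -> capacity 1024.
# With the correct used value (32), a 4-VCPU request fits.
print(fits(total=64, allocation_ratio=16.0, used=32, requested=4))

# With the bogus used value from the warning (18509), nothing fits,
# which is why every build after the first fails on this provider.
print(fits(total=64, allocation_ratio=16.0, used=18509, requested=4))
```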

You likely have "orphaned" allocations on the compute node/resource provider that are no longer mapped to instances; you can use 'nova-manage placement audit' to find those and optionally delete them. Doing that will clean up your resource provider. First, run it without --delete just to see what it reports without modifying anything.
