[nova][placement] Openstack only building one VM per machine in cluster, then runs out of resources

Laurent Dumont laurentfdumont at gmail.com
Thu Jul 1 03:51:10 UTC 2021


I'm curious to see if I can reproduce the issue in my test env. I've never
tried puppet-openstack, so I might as well see how it goes!

The ServerFault issue mentions that the puppet-openstack integration was
used to deploy Ussuri, specifically with the puppet modules at version
17.4?

But looking at
https://docs.openstack.org/puppet-openstack-guide/latest/install/releases.html
the modules for Ussuri should be at 16.x. Could it be some kind of
mismatched setup of the deployment modules for Ussuri/placement that didn't
go as planned?
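
If it helps narrow that down, a quick way to confirm which module versions
actually ended up on the deployment host (a sketch, assuming the modules
were installed somewhere puppet can see them):

# list installed puppet modules and their versions
puppet module list | grep -iE 'nova|placement|keystone'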

On Wed, Jun 30, 2021 at 9:13 PM Jeffrey Mazzone <jmazzone at uchicago.edu>
wrote:

> On Jun 30, 2021, at 5:06 PM, melanie witt <melwittt at gmail.com> wrote:
>
> I suggest you run the 'openstack resource provider show <RP UUID>
> --allocations' command as Balazs mentioned earlier to show all of the
> allocations (used resources) on the compute node. I also suggest you run
> the 'nova-manage placement audit' tool [1] as Sylvain mentioned earlier to
> show whether there are any orphaned allocations, i.e. allocations that are
> for instances that no longer exist. The consumer UUID is the instance UUID.
>
> I did both of those suggestions. "openstack resource provider show <RP
> UUID> --allocations" shows what is expected: no additional orphaned VMs, and
> the resource usage is correct. Here is an example from a different set of
> hosts and zones. This host had 2x 16-core VMs on it before the cluster went
> into this state; you can see them both below. The nova-manage audit
> commands do not show any orphans either.
>
> ~# openstack resource provider show 41ecee2a-ec24-48e5-8b9d-24065d67238a --allocations
> +----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> | Field                | Value                                                                                                                                                                                                                                                                |
> +----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> | uuid                 | 41ecee2a-ec24-48e5-8b9d-24065d67238a                                                                                                                                                                                                                                 |
> | name                 | kh09-56                                                                                                                                                                                                                                                              |
> | generation           | 55                                                                                                                                                                                                                                                                   |
> | root_provider_uuid   | 41ecee2a-ec24-48e5-8b9d-24065d67238a                                                                                                                                                                                                                                 |
> | parent_provider_uuid | None                                                                                                                                                                                                                                                                 |
> | allocations          | {'d6b9d19c-1ba9-44c2-97ab-90098509b872': {'resources': {'DISK_GB': 50, 'MEMORY_MB': 16384, 'VCPU': 16}, 'consumer_generation': 1}, 'e0a8401a-0bb6-4612-a496-6a794ebe6cd0': {'resources': {'DISK_GB': 50, 'MEMORY_MB': 16384, 'VCPU': 16}, 'consumer_generation': 1}} |
> +----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
>
>
> Usage on the resource provider:
>
> ~# openstack resource provider usage show 41ecee2a-ec24-48e5-8b9d-24065d67238a
> +----------------+-------+
> | resource_class | usage |
> +----------------+-------+
> | VCPU           |    32 |
> | MEMORY_MB      | 32768 |
> | DISK_GB        |   100 |
> +----------------+-------+
>
>
> All of that looks correct. Requesting allocation candidates for a 4-VCPU
> VM also shows this host as a candidate:
>
> ~# openstack allocation candidate list --resource VCPU=4 | grep 41ecee2a-ec24-48e5-8b9d-24065d67238a
> |  41 | VCPU=4     | 41ecee2a-ec24-48e5-8b9d-24065d67238a | VCPU=32/1024,MEMORY_MB=32768/772714,DISK_GB=100/7096
>
>
> The used column in the placement database also shows the correct values
> for the information provided above, with those 2 VMs on it:
>
> +---------------------+------------+-------+----------------------+--------------------------------------+-------------------+-------+
> | created_at          | updated_at | id    | resource_provider_id | consumer_id                          | resource_class_id | used  |
> +---------------------+------------+-------+----------------------+--------------------------------------+-------------------+-------+
> | 2021-06-02 18:45:05 | NULL       |  4060 |                  125 | e0a8401a-0bb6-4612-a496-6a794ebe6cd0 |                 2 |    50 |
> | 2021-06-02 18:45:05 | NULL       |  4061 |                  125 | e0a8401a-0bb6-4612-a496-6a794ebe6cd0 |                 1 | 16384 |
> | 2021-06-02 18:45:05 | NULL       |  4062 |                  125 | e0a8401a-0bb6-4612-a496-6a794ebe6cd0 |                 0 |    16 |
> | 2021-06-04 18:39:13 | NULL       |  7654 |                  125 | d6b9d19c-1ba9-44c2-97ab-90098509b872 |                 2 |    50 |
> | 2021-06-04 18:39:13 | NULL       |  7655 |                  125 | d6b9d19c-1ba9-44c2-97ab-90098509b872 |                 1 | 16384 |
> | 2021-06-04 18:39:13 | NULL       |  7656 |                  125 | d6b9d19c-1ba9-44c2-97ab-90098509b872 |                 0 |    16 |
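>
> A query along these lines returns those rows (standard placement schema;
> 125 is this host's internal provider id, and the database name may differ
> per deployment):
>
> ~# mysql placement -e "
>     SELECT created_at, updated_at, id, resource_provider_id,
>            consumer_id, resource_class_id, used
>     FROM allocations
>     WHERE resource_provider_id = 125;"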
>
>
>
> Trying to build a VM, though, I get the placement error with the
> improperly calculated "Used" value:
>
> 2021-06-30 19:51:39.732 43832 WARNING placement.objects.allocation [req-de225c66-8297-4b34-9380-26cf9385d658 a770bde56c9d49e68facb792cf69088c 6da06417e0004cbb87c1e64fe1978de5 - default default] Over capacity for VCPU on resource provider b749130c-a368-4332-8a1f-8411851b4b2a. Needed: 4, Used: 18509, Capacity: 1024.0
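>
> For comparison, the sum placement should be computing can be cross-checked
> straight from the database (a sketch, assuming the standard placement
> schema; resource_class_id 0 is VCPU, per the rows above):
>
> ~# mysql placement -e "
>     SELECT SUM(a.used)
>     FROM allocations a
>     JOIN resource_providers rp ON rp.id = a.resource_provider_id
>     WHERE rp.uuid = 'b749130c-a368-4332-8a1f-8411851b4b2a'
>       AND a.resource_class_id = 0;"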
>
>
> Outside of changing the allocation ratio, I'm completely lost. I'm
> confident it has to do with that improper calculation of the used value,
> but how is it being calculated if it isn't being added up from fixed
> values in the database, as has been suggested?
>
> Thanks in advance!
> -Jeff M
>
> The tl;dr on how the value is calculated: there is a table called
> 'allocations' in the placement database that holds all the values for
> resource providers and resource classes, and it has a 'used' column. If
> you add up all of the 'used' values for a resource class (VCPU) and
> resource provider (compute node), that will be the total used of that
> resource on that resource provider. You can see this data with 'openstack
> resource provider show <RP UUID> --allocations' as well.
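>
> In SQL terms that lookup is essentially (a sketch; the database name may
> vary per deployment):
>
> mysql placement -e "
>     SELECT a.resource_class_id, SUM(a.used) AS used
>     FROM allocations a
>     JOIN resource_providers rp ON rp.id = a.resource_provider_id
>     WHERE rp.uuid = '<RP UUID>'
>     GROUP BY a.resource_class_id;"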
>
> The allocation ratio will not affect the value of 'used', but it raises
> the working value of 'total' above the physical total in order to allow
> oversubscription. If a compute node has 64 cores and cpu_allocation_ratio
> is 16, then 64 * 16 = 1024 VCPUs will be allowed for placement on that
> compute node.
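>
> To see the ratios and totals placement is actually working with, you can
> inspect the provider's inventory (osc-placement command; output columns
> vary a bit by release):
>
> ~# openstack resource provider inventory list <RP UUID>
> # effective capacity per class = (total - reserved) * allocation_ratio,
> # e.g. 64 cores * 16.0 = 1024 VCPU, the Capacity shown in the error above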
>
> You likely have "orphaned" allocations for the compute node/resource
> provider that are no longer mapped to instances, and you can use
> 'nova-manage placement audit' to find those and optionally delete them.
> Doing that will clean up your resource provider. First, I would run it
> without specifying --delete just to see what it shows without modifying
> anything.
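>
> Roughly (option names per the nova-manage docs; double-check against your
> release):
>
> # dry run: report allocations whose consumer no longer maps to an
> # instance or migration, without changing anything
> nova-manage placement audit --verbose
>
> # once reviewed, optionally scope to one provider and delete the orphans
> nova-manage placement audit --resource_provider <RP UUID> --delete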
>