[nova][placement] Openstack only building one VM per machine in cluster, then runs out of resources

Balazs Gibizer balazs.gibizer at est.tech
Thu Jul 1 07:41:59 UTC 2021



On Thu, Jul 1, 2021 at 01:13, Jeffrey Mazzone <jmazzone at uchicago.edu> 
wrote:
>> On Jun 30, 2021, at 5:06 PM, melanie witt <melwittt at gmail.com> wrote:
>> 
>> I suggest you run the 'openstack resource provider show <RP UUID> 
>> --allocations' command as Balazs mentioned earlier to show all of the 
>> allocations (used resources) on the compute node. I also suggest you 
>> run the 'nova-manage placement audit' tool [1] as Sylvain mentioned 
>> earlier to show whether there are any orphaned allocations, i.e. 
>> allocations that are for instances that no longer exist. The consumer 
>> UUID is the instance UUID.
> 
> I followed both of those suggestions. "openstack resource provider show 
> <RP UUID> --allocations" shows what is expected: no additional 
> orphaned VMs, and the resources used are correct. Here is an example 
> from a different set of hosts and zones. This host had two 16-core VMs 
> on it before the cluster went into this state. You can see them both 
> below. The nova-manage audit commands do not show any orphans either.
> 
> ~# openstack resource provider show 41ecee2a-ec24-48e5-8b9d-24065d67238a --allocations
> +----------------------+----------------------------------------------------------------+
> | Field                | Value                                                          |
> +----------------------+----------------------------------------------------------------+
> | uuid                 | 41ecee2a-ec24-48e5-8b9d-24065d67238a                           |
> | name                 | kh09-56                                                        |
> | generation           | 55                                                             |
> | root_provider_uuid   | 41ecee2a-ec24-48e5-8b9d-24065d67238a                           |
> | parent_provider_uuid | None                                                           |
> | allocations          | {'d6b9d19c-1ba9-44c2-97ab-90098509b872':                       |
> |                      | {'resources': {'DISK_GB': 50, 'MEMORY_MB': 16384, 'VCPU': 16}, |
> |                      | 'consumer_generation': 1},                                     |
> |                      | 'e0a8401a-0bb6-4612-a496-6a794ebe6cd0':                        |
> |                      | {'resources': {'DISK_GB': 50, 'MEMORY_MB': 16384, 'VCPU': 16}, |
> |                      | 'consumer_generation': 1}}                                     |
> +----------------------+----------------------------------------------------------------+
> 
> Usage on the resource provider:
> ~# openstack resource provider usage show 41ecee2a-ec24-48e5-8b9d-24065d67238a
> +----------------+-------+
> | resource_class | usage |
> +----------------+-------+
> | VCPU           |    32 |
> | MEMORY_MB      | 32768 |
> | DISK_GB        |   100 |
> +----------------+-------+
> 
> All of that looks correct. Listing allocation candidates for a 
> 4-VCPU VM also shows this host as a candidate:
> ~# openstack allocation candidate list --resource VCPU=4 | grep 41ecee2a-ec24-48e5-8b9d-24065d67238a
> |  41 | VCPU=4     | 41ecee2a-ec24-48e5-8b9d-24065d67238a | VCPU=32/1024,MEMORY_MB=32768/772714,DISK_GB=100/7096
> 
> In the placement database, the used column also shows the correct 
> values for the two VMs described above:
> +---------------------+------------+-------+----------------------+--------------------------------------+-------------------+-------+
> | created_at          | updated_at | id    | resource_provider_id | consumer_id                          | resource_class_id | used  |
> +---------------------+------------+-------+----------------------+--------------------------------------+-------------------+-------+
> | 2021-06-02 18:45:05 | NULL       |  4060 |                  125 | e0a8401a-0bb6-4612-a496-6a794ebe6cd0 |                 2 |    50 |
> | 2021-06-02 18:45:05 | NULL       |  4061 |                  125 | e0a8401a-0bb6-4612-a496-6a794ebe6cd0 |                 1 | 16384 |
> | 2021-06-02 18:45:05 | NULL       |  4062 |                  125 | e0a8401a-0bb6-4612-a496-6a794ebe6cd0 |                 0 |    16 |
> | 2021-06-04 18:39:13 | NULL       |  7654 |                  125 | d6b9d19c-1ba9-44c2-97ab-90098509b872 |                 2 |    50 |
> | 2021-06-04 18:39:13 | NULL       |  7655 |                  125 | d6b9d19c-1ba9-44c2-97ab-90098509b872 |                 1 | 16384 |
> | 2021-06-04 18:39:13 | NULL       |  7656 |                  125 | d6b9d19c-1ba9-44c2-97ab-90098509b872 |                 0 |    16 |
> 
> 
> When I try to build a VM, though, I get the placement error with the 
> improperly calculated "Used" value.
> 
> 2021-06-30 19:51:39.732 43832 WARNING placement.objects.allocation [req-de225c66-8297-4b34-9380-26cf9385d658 a770bde56c9d49e68facb792cf69088c 6da06417e0004cbb87c1e64fe1978de5 - default default] Over capacity for VCPU on resource provider b749130c-a368-4332-8a1f-8411851b4b2a. Needed: 4, Used: 18509, Capacity: 1024.0
> 

Again, you confirmed that the compute RP 
41ecee2a-ec24-48e5-8b9d-24065d67238a has a consistent resource view, but 
placement warns about a different compute RP, 
b749130c-a368-4332-8a1f-8411851b4b2a.

Could you try to trace through one single situation?

Try to boot a VM that results in the error with the placement over 
capacity warning. Then collect the resource view of the compute RP the 
placement warning points at.
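
To make sure you inspect the right provider, you can pull the RP UUID 
straight out of the warning. A minimal sketch in Python, assuming the 
message format shown in the log line quoted above; the regular 
expression and the function name parse_over_capacity are my own 
illustration, not something taken from the placement code:

```python
import re

# Warning text as quoted in this thread (timestamp and request prefix omitted).
line = (
    "Over capacity for VCPU on resource provider "
    "b749130c-a368-4332-8a1f-8411851b4b2a. Needed: 4, Used: 18509, "
    "Capacity: 1024.0"
)

# Hypothetical pattern written for this message format only.
PATTERN = re.compile(
    r"Over capacity for (?P<rc>\w+) on resource provider "
    r"(?P<rp>[0-9a-f-]{36})\. Needed: (?P<needed>\d+), "
    r"Used: (?P<used>\d+), Capacity: (?P<capacity>[\d.]+)"
)


def parse_over_capacity(text):
    """Return the fields of an over-capacity warning, or None if no match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return {
        "resource_class": m.group("rc"),
        "provider_uuid": m.group("rp"),
        "needed": int(m.group("needed")),
        "used": int(m.group("used")),
        "capacity": float(m.group("capacity")),
    }


info = parse_over_capacity(line)
print(info["provider_uuid"], info["resource_class"], info["used"])
```

The UUID it prints is the one to pass to 'openstack resource provider 
show <RP UUID> --allocations' and 'openstack resource provider usage 
show <RP UUID>'.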

If that tracing does not reveal the reason, then you can dig into the 
placement code. The placement warning comes from 
https://github.com/openstack/placement/blob/f77a7f9928d1156450c48045c48597b2feec9cc1/placement/objects/allocation.py#L228 
At the top of that function there is an SQL query. You can run it 
against your DB for the resource provider the placement warning names, 
to see where the used values are coming from.
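
As a rough illustration of that kind of check, here is a sketch that 
sums the used column per resource class for one provider. It is not the 
query from the linked function. It uses an in-memory sqlite3 table as a 
stand-in for the placement database, models only the columns shown in 
the allocations rows quoted above, and loads those same rows. The 
function name used_by_class is hypothetical.

```python
import sqlite3

# Simplified stand-in for the placement 'allocations' table (assumption:
# only the columns visible in the thread are modelled).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE allocations ("
    "id INTEGER PRIMARY KEY, resource_provider_id INTEGER, "
    "consumer_id TEXT, resource_class_id INTEGER, used INTEGER)"
)

# Rows copied from the database output quoted in this thread.
rows = [
    (4060, 125, "e0a8401a-0bb6-4612-a496-6a794ebe6cd0", 2, 50),
    (4061, 125, "e0a8401a-0bb6-4612-a496-6a794ebe6cd0", 1, 16384),
    (4062, 125, "e0a8401a-0bb6-4612-a496-6a794ebe6cd0", 0, 16),
    (7654, 125, "d6b9d19c-1ba9-44c2-97ab-90098509b872", 2, 50),
    (7655, 125, "d6b9d19c-1ba9-44c2-97ab-90098509b872", 1, 16384),
    (7656, 125, "d6b9d19c-1ba9-44c2-97ab-90098509b872", 0, 16),
]
conn.executemany("INSERT INTO allocations VALUES (?, ?, ?, ?, ?)", rows)


def used_by_class(db, rp_id):
    """Sum 'used' per resource_class_id for one resource provider id."""
    cur = db.execute(
        "SELECT resource_class_id, SUM(used) FROM allocations "
        "WHERE resource_provider_id = ? GROUP BY resource_class_id",
        (rp_id,),
    )
    return dict(cur.fetchall())


usage = used_by_class(conn, 125)
print(usage)
```

For provider id 125 the sums match the 'usage show' output quoted 
earlier (32 VCPU, 32768 MEMORY_MB, 100 DISK_GB, reading class ids 0, 1 
and 2 in that order, as the rows suggest). Running the same grouping for 
the provider named in the warning, and then listing the individual rows, 
should show which consumers add up to the unexpected total.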

Cheers,
gibi

> Outside of changing the allocation ratio, I'm completely lost. I'm 
> confident it has to do with the improper calculation of the used 
> value, but how is it being calculated if it isn't being added up 
> from fixed values in the database, as has been suggested?
> 
> Thanks in advance!
> -Jeff M
> 
> 
> 
>> 
>> 
>> The tl;dr on how the value is calculated is there's a table called 
>> 'allocations' in the placement database that holds all the values 
>> for resource providers and resource classes and it has a 'used' 
>> column. If you add up all of the 'used' values for a resource class 
>> (VCPU) and resource provider (compute node) then that will be the 
>> total used of that resource on that resource provider. You can see 
>> this data by 'openstack resource provider show <RP UUID> 
>> --allocations' as well.
>> 
>> The allocation ratio will not affect the value of 'used', but it 
>> will make the working value of 'total' higher than the physical 
>> value so that the node can be oversubscribed. If a compute node has 
>> 64 cores and cpu_allocation_ratio is 16 then 64 * 16 = 1024 cores 
>> will be allowed for placement on that compute node.
>> 
>> You likely have "orphaned" allocations for the compute node/resource 
>> provider that are not mapped to instances any more. You can use 
>> 'nova-manage placement audit' to find those and optionally delete 
>> them, which will clean up your resource provider. First, I would 
>> run it without specifying --delete just to see what it shows without 
>> modifying anything.
> 
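
The arithmetic in the quoted explanation can be sketched as follows. 
This is a simplified model and not the placement implementation: the 
function names are hypothetical, and the reserved parameter is an 
assumption added for completeness (it defaults to 0, which reproduces 
the 64 * 16 = 1024 example).

```python
def capacity(total, allocation_ratio, reserved=0):
    """Effective capacity once oversubscription is applied."""
    return (total - reserved) * allocation_ratio


def fits(needed, used, cap):
    """True if a request for 'needed' units still fits under 'cap'."""
    return used + needed <= cap


# 64 physical cores with cpu_allocation_ratio = 16, as in the example above.
cap = capacity(64, 16.0)
print(cap)

# The usage reported for kh09-56 (32 VCPU) leaves room for a 4-VCPU VM.
print(fits(4, 32, cap))

# The "Used: 18509" value from the warning does not.
print(fits(4, 18509, cap))
```

With the usage the CLI reports, the request fits easily, so the open 
question stays the same: where the 18509 in the warning comes from.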




