[nova][placement] Openstack only building one VM per machine in cluster, then runs out of resources

Balazs Gibizer balazs.gibizer at est.tech
Thu Jul 1 07:30:31 UTC 2021



On Wed, Jun 30, 2021 at 21:06, Jeffrey Mazzone <jmazzone at uchicago.edu> 
wrote:
>  Yes, this is almost exactly what I did. No, I am not running mysql 
> in an HA deployment, and I have run nova-manage api_db sync several 
> times throughout the process below.
> 
>  I think I found a workaround, but I'm not sure how feasible it is.
> 
> I first changed the allocation ratio to 1:1 in the nova.conf on the 
> controller. Nova would not accept this for some reason and it seemed 
> like it needed to be changed on the compute node. So I deleted the 
> hypervisor, resource provider, and compute service, changed the 
> ratios on the compute node itself, and then re-added it. Now the 
> capacity changed to 64, which is the number of cores on the system. 
> When starting a VM, it still gets the same number for “used” in the 
> placement-api.log. See below:
> 
> New ratios
> ~# openstack resource provider inventory list 554f2a3b-924e-440c-9847-596064ea0f3f
> +----------------+------------------+----------+----------+----------+-----------+--------+
> | resource_class | allocation_ratio | min_unit | max_unit | reserved | step_size |  total |
> +----------------+------------------+----------+----------+----------+-----------+--------+
> | VCPU           |              1.0 |        1 |       64 |        0 |         1 |     64 |
> | MEMORY_MB      |              1.0 |        1 |   515655 |      512 |         1 | 515655 |
> | DISK_GB        |              1.0 |        1 |     7096 |        0 |         1 |   7096 |
> +----------------+------------------+----------+----------+----------+-----------+--------+
> 
> Error from placement.log
> 2021-06-30 13:49:24.877 4381 WARNING placement.objects.allocation [req-7dc8930f-1eac-401a-ade7-af36e64c2ba8 a770bde56c9d49e68facb792cf69088c 6da06417e0004cbb87c1e64fe1978de5 - default default] Over capacity for VCPU on resource provider c4199e84-8259-4d0e-9361-9b0d9e6e66b7. Needed: 4, Used: 8206, Capacity: 64.0
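Side note: you should not need to delete and re-create the compute 
service just to change a ratio. The inventory can be adjusted directly 
on the resource provider; a rough sketch with the osc-placement CLI 
you are already using (double-check the option names against your 
osc-placement version):

~# openstack resource provider inventory class set \
     554f2a3b-924e-440c-9847-596064ea0f3f VCPU \
     --total 64 --max_unit 64 --allocation_ratio 1.0

Depending on your release and how the *_allocation_ratio options are 
set, the nova resource tracker may overwrite this again on its 
periodic update, so double-check that the change persists.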
> 
> With that in mind, I did the same procedure again but set the ratio 
> to 1024.
> 
> New ratios
> ~# openstack resource provider inventory list 519c1e10-3546-4e3b-a017-3e831376cde8
> +----------------+------------------+----------+----------+----------+-----------+--------+
> | resource_class | allocation_ratio | min_unit | max_unit | reserved | step_size |  total |
> +----------------+------------------+----------+----------+----------+-----------+--------+
> | VCPU           |           1024.0 |        1 |       64 |        0 |         1 |     64 |
> | MEMORY_MB      |              1.0 |        1 |   515655 |      512 |         1 | 515655 |
> | DISK_GB        |              1.0 |        1 |     7096 |        0 |         1 |   7096 |
> +----------------+------------------+----------+----------+----------+-----------+--------+
> 

You are collecting data from the compute RP 
519c1e10-3546-4e3b-a017-3e831376cde8, but placement warns about another 
compute RP, c4199e84-8259-4d0e-9361-9b0d9e6e66b7.
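To see what placement thinks is allocated on that provider, something 
like this should work with the osc-placement CLI:

~# openstack resource provider usage show c4199e84-8259-4d0e-9361-9b0d9e6e66b7
~# openstack resource provider show c4199e84-8259-4d0e-9361-9b0d9e6e66b7 --allocations

The first command prints the total usage per resource class; the 
second breaks it down per consumer (server) UUID, so you can see 
exactly which consumers are behind that usage.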

> 
> Now I can spin up VMs without issues.
> 
> I have one test AZ with two hosts in it. I have set these hosts to 
> the ratio above. I was able to spin up approximately 45 four-core VMs 
> without issues and with no sign of hitting an upper limit on the hosts.
> 
> 120 | VCPU=64    | 519c1e10-3546-4e3b-a017-3e831376cde8 | VCPU=88/65536
> 23 | VCPU=64    | 8f97a3ba-98a0-475e-a3cf-41425569b2cb | VCPU=96/65536
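That output looks like "openstack allocation candidate list"; the 
used/capacity column on the right is placement's current view, and 
65536 is simply the 64 cores multiplied by the 1024 allocation ratio. 
Running it with a flavor-sized request is a quick way to check what 
placement would offer; a sketch with example resource amounts:

~# openstack allocation candidate list --resource VCPU=4 \
     --resource MEMORY_MB=8192 --resource DISK_GB=40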
> 
> 
> I have 2 problems with this fix.
> 
> 1) The overcommit is now super high, and I have no way, besides 
> quotas, to guarantee the system won’t be overprovisioned.
> 2) I still don’t know how that “used” resources value is being 
> calculated. When this issue first started, the “used” resources 
> were a different number. Over the past two days, the used resources 
> for a 4-core virtual machine have stayed at 8206, but I have no way 
> to guarantee they will stay there.
> 
> My initial tests when this started were to compare the resource 
> values when building different-sized VMs. Here is that list:
> 
> 1 core - 4107
> 2 core - 4108
> 4 core - 4110
> 8 core - 4114
> 16 core - 4122
> 32 core - 8234
> 
> The number on the right is what the “used” value used to be. 
> Yesterday and today, it has changed to 8206 for a 4-core VM; I have 
> not tested the rest.
> 
> Before I commit to combing through the placement API source code to 
> figure out how the “used” value in the placement log is being 
> calculated, I’m hoping someone knows where and how that value is 
> computed. It does not seem to be a fixed value in the database, and 
> it doesn’t seem to be affected by the allocation ratios.
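The “used” value is not stored anywhere as a single number: placement 
computes it as the sum of the "used" column over all allocation 
records that reference that resource provider and resource class, and 
the allocation ratio only changes the capacity it is compared against. 
One thing that stands out in your numbers is that each of them is a 
constant base plus the VCPUs you requested (4106 for the 1-16 core 
tests, 8202 for the 32 core test and for the recent 4-core attempts), 
which suggests the flavor-dependent part is just the new request and 
the rest is pre-existing VCPU allocations recorded against that 
provider. A rough way to list exactly what is being summed, assuming 
the placement database is called "placement" and you have read access 
(older deployments may still keep these tables in the nova_api 
database):

~# mysql placement -e "SELECT a.consumer_id, a.resource_class_id, a.used \
     FROM allocations a \
     JOIN resource_providers rp ON rp.id = a.resource_provider_id \
     WHERE rp.uuid = 'c4199e84-8259-4d0e-9361-9b0d9e6e66b7';"

Any consumer_id in that list that no longer corresponds to an existing 
server (leftovers from deleted instances, failed resizes or 
evacuations, and so on) still counts towards “used”. Such stale 
allocations can be removed with "openstack resource provider 
allocation delete <consumer_uuid>", and if your nova is recent enough, 
"nova-manage placement audit" can find them for you.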
> 
> 
> Thank you in advance!!
> -Jeff Mazzone
>  Senior Linux Systems Administrator
>  Center for Translational Data Science
>  University of Chicago.
> 
> 
> 
>> On Jun 30, 2021, at 2:40 PM, Laurent Dumont 
>> <laurentfdumont at gmail.com> wrote:
>> 
>> In some cases, the DEBUG messages are a bit verbose but can really 
>> walk you through the allocation/scheduling process. You could 
>> increase it for nova and restart the api + scheduler on the 
>> controllers. I wonder if a desync of the DB could be the cause? Are 
>> you running an HA deployment for the mysql backend?
> 
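Laurent's suggestion is worth doing. A minimal sketch of what to set 
(in /etc/nova/nova.conf on the controllers, then restart nova-api, 
nova-scheduler and nova-conductor; the same option in placement's 
config turns on debug logging for the placement API as well):

[DEFAULT]
debug = True

With debug on, the scheduler log shows which providers placement 
returned for each request and why hosts are filtered out, which should 
make it easier to see where the unexpected usage comes from.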




