[nova][placement] Openstack only building one VM per machine in cluster, then runs out of resources
Balazs Gibizer
balazs.gibizer at est.tech
Thu Jul 1 07:30:31 UTC 2021
On Wed, Jun 30, 2021 at 21:06, Jeffrey Mazzone <jmazzone at uchicago.edu>
wrote:
> Yes, this is almost exactly what I did. No, I am not running MySQL
> in an HA deployment, and I have run nova-manage api_db sync several
> times throughout the process below.
>
> I think I found a workaround, but I'm not sure how feasible it is.
>
> First, I changed the allocation ratio to 1:1 in the nova.conf on
> the controller. Nova would not accept this for some reason, and it
> seemed that the ratio needed to be changed on the compute node. So I
> deleted the hypervisor, resource provider, and compute service,
> changed the ratios on the compute node itself, and then re-added it.
> Now the capacity has changed to 64, which is the number of cores on
> the systems. When starting a VM, it still gets the same number for
> “used” in the placement-api.log. See below:
>
> New ratios
> ~# openstack resource provider inventory list 554f2a3b-924e-440c-9847-596064ea0f3f
> +----------------+------------------+----------+----------+----------+-----------+--------+
> | resource_class | allocation_ratio | min_unit | max_unit | reserved | step_size | total  |
> +----------------+------------------+----------+----------+----------+-----------+--------+
> | VCPU           | 1.0              | 1        | 64       | 0        | 1         | 64     |
> | MEMORY_MB      | 1.0              | 1        | 515655   | 512      | 1         | 515655 |
> | DISK_GB        | 1.0              | 1        | 7096     | 0        | 1         | 7096   |
> +----------------+------------------+----------+----------+----------+-----------+--------+
>
> Error from placement.log
> 2021-06-30 13:49:24.877 4381 WARNING placement.objects.allocation
> [req-7dc8930f-1eac-401a-ade7-af36e64c2ba8
> a770bde56c9d49e68facb792cf69088c 6da06417e0004cbb87c1e64fe1978de5 -
> default default] Over capacity for VCPU on resource provider
> c4199e84-8259-4d0e-9361-9b0d9e6e66b7. Needed: 4, Used: 8206,
> Capacity: 64.0
>
> With that in mind, I did the same procedure again but set the ratio
> to 1024
>
> New ratios
> ~# openstack resource provider inventory list 519c1e10-3546-4e3b-a017-3e831376cde8
> +----------------+------------------+----------+----------+----------+-----------+--------+
> | resource_class | allocation_ratio | min_unit | max_unit | reserved | step_size | total  |
> +----------------+------------------+----------+----------+----------+-----------+--------+
> | VCPU           | 1024.0           | 1        | 64       | 0        | 1         | 64     |
> | MEMORY_MB      | 1.0              | 1        | 515655   | 512      | 1         | 515655 |
> | DISK_GB        | 1.0              | 1        | 7096     | 0        | 1         | 7096   |
> +----------------+------------------+----------+----------+----------+-----------+--------+
>
You are collecting data from the compute RP
519c1e10-3546-4e3b-a017-3e831376cde8, but placement warns about another
compute RP, c4199e84-8259-4d0e-9361-9b0d9e6e66b7.
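
A quick way to check this kind of mismatch is to pull the provider UUID
out of the warning and compare it with the UUIDs you passed to the
inventory command. The following is only a sketch: the regular
expression is written against the warning text quoted above, and the
helper name parse_over_capacity is made up for illustration.

```python
import re

# Pattern for the placement "Over capacity" warning as quoted in this thread.
OVER_CAPACITY = re.compile(
    r"Over capacity for (?P<rc>\w+) on resource provider\s+"
    r"(?P<rp>[0-9a-f-]{36})\.\s+Needed: (?P<needed>\d+),\s+"
    r"Used: (?P<used>\d+),\s+Capacity: (?P<capacity>[\d.]+)"
)


def parse_over_capacity(log_text):
    """Return the fields of the first over-capacity warning, or None."""
    # The mail client wrapped the log line, so collapse whitespace first.
    flat = " ".join(log_text.split())
    match = OVER_CAPACITY.search(flat)
    if match is None:
        return None
    return {
        "resource_class": match["rc"],
        "provider": match["rp"],
        "needed": int(match["needed"]),
        "used": int(match["used"]),
        "capacity": float(match["capacity"]),
    }


warning = """Over capacity for VCPU on resource provider
c4199e84-8259-4d0e-9361-9b0d9e6e66b7. Needed: 4, Used: 8206,
Capacity: 64.0"""

# UUIDs that were passed to "openstack resource provider inventory list".
inspected = [
    "554f2a3b-924e-440c-9847-596064ea0f3f",
    "519c1e10-3546-4e3b-a017-3e831376cde8",
]

parsed = parse_over_capacity(warning)
warned_provider_inspected = parsed["provider"] in inspected
print(parsed["provider"], "inspected:", warned_provider_inspected)
```

With the data from this thread the warned provider is not one of the
two providers whose inventory was listed.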
>
> Now I can spin up VMs without issues.
>
> I have 1 test AZ with 2 hosts inside. I have set these hosts to the
> ratio above. I was able to spin up approx. 45 4-core VMs without
> issues and with no sign of hitting an upper limit on the host.
>
> 120 | VCPU=64 | 519c1e10-3546-4e3b-a017-3e831376cde8 | VCPU=88/65536
>  23 | VCPU=64 | 8f97a3ba-98a0-475e-a3cf-41425569b2cb | VCPU=96/65536
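
For reference, placement derives the schedulable capacity of an
inventory as (total - reserved) * allocation_ratio. A minimal sketch of
that arithmetic with the values from the inventory listings in this
mail (the function name capacity is only illustrative, not a placement
API):

```python
def capacity(total, reserved, allocation_ratio):
    """Schedulable capacity of one inventory record."""
    return (total - reserved) * allocation_ratio


# VCPU inventory with allocation_ratio 1.0 and with 1024.0.
vcpu_ratio_1 = capacity(total=64, reserved=0, allocation_ratio=1.0)
vcpu_ratio_1024 = capacity(total=64, reserved=0, allocation_ratio=1024.0)

print(vcpu_ratio_1)     # 64.0
print(vcpu_ratio_1024)  # 65536.0
```

With a ratio of 1024.0 the 64 physical cores are advertised as 65536
VCPUs, which matches the 88/65536 and 96/65536 usage shown above.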
>
>
> I have 2 problems with this fix.
>
> 1) The overcommit is now super high and I have no way, besides
> quotas, to guarantee the system won’t be overprovisioned.
> 2) I still don’t know how that “used” resources value is being
> calculated. When this issue first started, the “used” resources
> were a different number. Over the past two days, the used resources
> for a 4-core virtual machine have remained at 8206, but I have no way
> to guarantee this.
>
> My initial test when this started was to compare the “used” values
> when building VMs of different sizes. Here is that list:
>
> 1 core - 4107
> 2 core - 4108
> 4 core - 4110
> 8 core - 4114
> 16 core - 4122
> 32 core - 8234
>
> The number on the right is what the “used” value used to be.
> Yesterday and today, it has changed to 8206 for a 4-core VM. I
> have not tested the rest.
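
One observation on the list above, purely as arithmetic on the quoted
numbers and not as an explanation of how placement computes them:
subtracting the requested core count from each reported value leaves a
nearly constant offset. A small sketch (the variable names are
illustrative):

```python
# Requested cores -> "used" value reported in the placement warning.
observed = {1: 4107, 2: 4108, 4: 4110, 8: 4114, 16: 4122, 32: 8234}

# Offset that remains after removing the requested cores.
offsets = {cores: used - cores for cores, used in observed.items()}
print(offsets)
# {1: 4106, 2: 4106, 4: 4106, 8: 4106, 16: 4106, 32: 8202}

# The later 4-core measurement from this mail.
later_offset = 8206 - 4
print(later_offset)  # 8202
```

For the 1 to 16 core requests the remainder is 4106 each time; for the
32 core request, and for the later 4 core value of 8206, it is 8202.
That would be consistent with the reported figure being some existing
usage on that provider plus the requested amount, though nothing in
this thread confirms it.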
>
> Before I commit to combing through the placement API source code to
> figure out how the “used” value in the placement log is being
> calculated, I'm hoping someone knows where and how that value is
> calculated. It does not seem to be a fixed value in the database and
> it doesn’t seem to be affected by the allocation ratios.
>
>
> Thank you in advance!!
> -Jeff Mazzone
> Senior Linux Systems Administrator
> Center for Translational Data Science
> University of Chicago.
>
>
>
>> On Jun 30, 2021, at 2:40 PM, Laurent Dumont
>> <laurentfdumont at gmail.com> wrote:
>>
>> In some cases, the DEBUG messages are a bit verbose but can really
>> walk you through the allocation/scheduling process. You could
>> increase it for nova and restart the api + scheduler on the
>> controllers. I wonder if a desync of the DB could be the cause? Are
>> you running an HA deployment for the mysql backend?
>