[openstack-dev] [nova][placement] Scheduler VM distribution

Jay Pipes jaypipes at gmail.com
Thu Apr 19 15:48:31 UTC 2018


Hello, Andrey! Comments inline...

On 04/19/2018 10:27 AM, Andrey Volkov wrote:
> Hello,
> 
>  From my understanding, we have a race between the scheduling
> process and host weight updates.
> 
> I ran a simple experiment. In an environment with 50 fake hosts,
> I asked Nova to boot 40 VMs, which should be placed one per host.
> The hosts are identical in terms of inventory.
> 
> img=6fedf6a1-5a55-4149-b774-b0b4dccd2ed1
> flavor=1
> for i in {1..40}; do
> nova boot --flavor $flavor --image $img --nic none vm-$i;
> sleep 1;
> done
> 
> The following distribution was obtained:
> 
> mysql> select resource_provider_id, count(*) from allocations where 
> resource_class_id = 0 group by 1;
> 
> +----------------------+----------+
> | resource_provider_id | count(*) |
> +----------------------+----------+
> |                    1 |        2 |
> |                   18 |        2 |
> |                   19 |        3 |
> |                   20 |        3 |
> |                   26 |        2 |
> |                   29 |        2 |
> |                   33 |        3 |
> |                   36 |        2 |
> |                   41 |        1 |
> |                   49 |        3 |
> |                   51 |        2 |
> |                   52 |        3 |
> |                   55 |        2 |
> |                   60 |        3 |
> |                   61 |        2 |
> |                   63 |        2 |
> |                   67 |        3 |
> +----------------------+----------+
> 17 rows in set (0.00 sec)
> 
> And the question is:
> If resource allocation is atomic, what is the reason
> to use compute_nodes.* for weight calculation?

The resource allocation is only atomic within the placement service,
because placement prevents clients from modifying records that have
changed since the client last read them (it uses a "generation" field
on the resource_providers table records to provide this protection).
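
To make that concrete, here is a minimal sketch (not the actual
placement code) of how a generation-based compare-and-swap works. The
function name, the ConcurrentUpdateDetected exception and the plain
DB-API connection are assumptions for illustration only:

class ConcurrentUpdateDetected(Exception):
    """Raised when the provider's generation changed under us."""


def claim_resources(conn, provider_id, seen_generation):
    # Bump the generation only if it is still the value we read earlier.
    cur = conn.cursor()
    cur.execute(
        "UPDATE resource_providers"
        " SET generation = generation + 1"
        " WHERE id = %s AND generation = %s",
        (provider_id, seen_generation),
    )
    if cur.rowcount == 0:
        # Another writer raced us; the caller must re-read and retry.
        raise ConcurrentUpdateDetected()
    # ... the allocation rows would be written in the same transaction ...
    conn.commit()

If the UPDATE matches zero rows, the writer knows its view is stale and
has to re-read the provider and retry, which is what keeps the
allocation write atomic.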

What seems to be happening is that a scheduler thread's view of the set 
of HostState objects used in weighing is stale at some point in the 
weighing process. I'm going to guess and say you have 3 scheduler 
processes, right?

In other words, what is happening is something like this:

(Tx indicates a period in sequential time)

T0: thread A gets a list of filtered hosts and weighs them.
T1: thread B gets a list of filtered hosts and weighs them.
T2: thread A picks the first host in its weighed list
T3: thread B picks the first host in its weighed list (this is the same 
host as thread A picked)
T4: thread B increments the num_instances attribute of its HostState 
object for the chosen host (done in the 
HostState._consume_from_request() method)
T5: thread A increments the num_instances attribute of its HostState 
object for the same chosen host.

So, both thread A and B choose the same host because at the time they 
read the HostState objects, the num_instances attribute was 0 and the 
weight for that host was the same (2.0 in the logs).
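
If it helps, here is a toy, self-contained reproduction of that
timeline (plain Python, not Nova code). Each "scheduler" worker weighs
its own copy of the host state, so both pick the same host before
either copy reflects the other's choice:

import threading


class HostState(object):
    def __init__(self, name):
        self.name = name
        self.num_instances = 0


def schedule(my_view, chosen):
    # T0/T1: weigh hosts using this worker's (possibly stale) view.
    best = min(my_view, key=lambda h: h.num_instances)
    # T2/T3: pick the top-weighed host.
    chosen.append(best.name)
    # T4/T5: consume from *our* copy only; the other worker never sees it.
    best.num_instances += 1


chosen = []
view_a = [HostState("host-1"), HostState("host-2")]
view_b = [HostState("host-1"), HostState("host-2")]
workers = [threading.Thread(target=schedule, args=(view, chosen))
           for view in (view_a, view_b)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(chosen)  # ['host-1', 'host-1'] -- both workers chose the same host

Whether the two weighing passes run in separate threads or separate
scheduler processes makes no difference here: the consumption step only
updates the worker's local view, never the other worker's.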

I'm not aware of any effort to fix this behaviour in the scheduler.

Best,
-jay

> There is a custom log of the behavior I described: http://ix.io/18cw
> 
> -- 
> Thanks,
> 
> Andrey Volkov,
> Software Engineer, Mirantis, Inc.
> 


