[nova] Instance Even Scheduling

Tony Liu tonyliu0592 at hotmail.com
Tue Jan 18 18:22:27 UTC 2022


While looking into it, I found a bug on this line. It's in all branches.

https://github.com/openstack/nova/blob/master/nova/weights.py#L51

It's supposed to return a list of normalized values, but it returns a
generator instead. Here is the fix.

-    return ((i - minval) / range_ for i in weight_list)
+    return [(i - minval) / range_ for i in weight_list]
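
To see why the generator matters, here is a minimal sketch (not the
actual nova code) of the failure mode: a generator is exhausted after
one pass, so any caller that iterates the normalized values more than
once gets an empty sequence the second time.

    def normalize_gen(weight_list, minval, maxval):
        # returns a generator, like the line linked above
        range_ = maxval - minval
        return ((i - minval) / range_ for i in weight_list)

    norm = normalize_gen([1.0, 2.0, 3.0], 1.0, 3.0)
    print(list(norm))  # [0.0, 0.5, 1.0]
    print(list(norm))  # [] -- already exhausted on the second pass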

The negative weight comes from the build-failure weigher.
Looking into nova-compute...

Again, is nova-compute supposed to provide the available resources, instead of
the total resources, for the weighers to evaluate?

Thanks!
Tony
________________________________________
From: Tony Liu <tonyliu0592 at hotmail.com>
Sent: January 17, 2022 06:27 PM
To: Sean Mooney; openstack-discuss at lists.openstack.org
Subject: Re: [nova] Instance Even Scheduling

Hi,

I enabled debug on nova-scheduler and launched 5 VMs. 8 hosts were returned
as valid hosts by the filters. Here is the weight log. This is from the Train release.

This is for the first VM.
"ram" is the total memory. Is it supposed to be the available or consumed memory?
It's the same for all nodes because they all have the same spec.
"disk" is also the total. Because all compute nodes are using the same shared
Ceph storage, disk is the same for all nodes.
"instances" is the current number of instances on that node.
I don't see cpu. Is the CPU weigher not available yet in Train?
Only compute-11 has a positive weight; all the others have negative weights.
How come the weight is negative for the other nodes? Given the logging,
they are all the same except for "instances".
================
Weighed [WeighedHost [host: (compute-11, compute-11) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: 2.9901550710003333], WeighedHost [host: (compute-2, compute-2) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-8, compute-8) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-12, compute-12) ram: 751903MB disk: 114566144MB io_ops: 0 instances: 8, weight: -399997.01968985796], WeighedHost [host: (compute-1, compute-1) ram: 739615MB disk: 114566144MB io_ops: 0 instances: 14, weight: -399997.03937971604], WeighedHost [host: (compute-7, compute-7) ram: 764191MB disk: 114566144MB io_ops: 0 instances: 2, weight: -599997.0], WeighedHost [host: (compute-9, compute-9) ram: 749855MB disk: 114566144MB io_ops: 0 instances: 9, weight: -999997.0229715011], WeighedHost [host: (compute-10, compute-10) ram: 743711MB disk: 114566144MB io_ops: 0 instances: 6, weight: -999997.0328164301]] _get_sorted_hosts /usr/lib/python3.6/site-packages/nova/scheduler/filter_scheduler.py:462
================
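
As a rough illustration of how a single weigher with a very large
multiplier can dominate the total like this (the normalized values below
are hypothetical, not taken from this log; only the 1000000.0 default
for build_failure_weight_multiplier is real):

    # nova sums multiplier * normalized_weight across weighers per host;
    # build_failure_weight_multiplier defaults to 1000000.0, so even a
    # small build-failure penalty swamps the other weighers
    ram, disk, instances = 1.0, 1.0, 0.99  # hypothetical normalized values
    build_failure = -0.4                   # hypothetical penalty
    total = 1.0 * ram + 1.0 * disk + 1.0 * instances \
            + 1000000.0 * build_failure
    print(total)  # roughly -399997.01, the same order as the log above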

For the second VM.
================
Weighed [WeighedHost [host: (compute-11, compute-11) ram: 757535MB disk: 114565120MB io_ops: 1 instances: 6, weight: 1.9888744586443294], WeighedHost [host: (compute-2, compute-2) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-8, compute-8) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-12, compute-12) ram: 751903MB disk: 114566144MB io_ops: 0 instances: 8, weight: -399997.01968985796], WeighedHost [host: (compute-1, compute-1) ram: 739615MB disk: 114566144MB io_ops: 0 instances: 14, weight: -399997.03937971604], WeighedHost [host: (compute-7, compute-7) ram: 764191MB disk: 114566144MB io_ops: 0 instances: 2, weight: -599997.0], WeighedHost [host: (compute-9, compute-9) ram: 749855MB disk: 114566144MB io_ops: 0 instances: 9, weight: -999997.0229715011], WeighedHost [host: (compute-10, compute-10) ram: 743711MB disk: 114566144MB io_ops: 0 instances: 6, weight: -999997.0328164301]] _get_sorted_hosts /usr/lib/python3.6/site-packages/nova/scheduler/filter_scheduler.py:462
================

Given the above logging, compute-11 always wins the weighing. It's just that
when weighing for the next VM, the "instances" count of compute-11 bumps up,
while all the others stay the same. In the end, all 5 VMs are created on that
same node.

Is this all expected?

Thanks!
Tony
________________________________________
From: Tony Liu <tonyliu0592 at hotmail.com>
Sent: January 17, 2022 10:11 AM
To: Sean Mooney; openstack-discuss at lists.openstack.org
Subject: Re: [nova] Instance Even Scheduling

The disk weigher is a good point. I am using Ceph as the storage backend
for all compute nodes. The disk weigher may not handle that properly and could
cause some failures. Anyway, I will enable debug and look into the details.

Thanks!
Tony
________________________________________
From: Sean Mooney <smooney at redhat.com>
Sent: January 17, 2022 09:57 AM
To: Tony Liu; openstack-discuss at lists.openstack.org
Subject: Re: [nova] Instance Even Scheduling

On Mon, 2022-01-17 at 17:45 +0000, Tony Liu wrote:
> I recall weighing didn't work as I expected; that's why I used
> shuffle_best_same_weighed_hosts.
>
> Here is what I experienced.
> With Ussuri and default Nova scheduling settings, all weighers are supposed
> to be enabled and all multipliers are positive.
>
Yes, by default all weighers are enabled and the scheduler spreads.
>  On 10 empty compute nodes
> with the same spec, say the first VM is created on compute-2. Because some
> memory and vCPUs are consumed, the second VM should be created on some
> node other than compute-2 if the weighers are working correctly. But it's still
> created on compute-2, until I increased host_subset_size and enabled shuffle_best_same_weighed_hosts.
>
I would guess that either the disk weigher or the build-failure weigher is likely what results in the different behavior;
the default behavior is still to spread. Before assuming there is a bug, you should enable the scheduler
in debug mode and look at the weights assigned to each host to determine why you are seeing different behavior.


shuffle_best_same_weighed_hosts does as the name suggests: it shuffles the result if and only if there is a tie.
That means it will only have an effect if two hosts were judged by the weighers to be equally good candidates.

host_subset_size, instead of looking at only the top host in the list, enables you to consider the top n hosts.

It does a random selection from the top host_subset_size elements after the hosts are sorted by the weighers,
intentionally adding randomness to the selection.
This should not be needed in general.
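
For reference, a minimal nova.conf sketch of those two options (the
values here are illustrative, not recommendations):

    [filter_scheduler]
    # consider the top 5 weighed hosts instead of only the single best,
    # picking one of them at random
    host_subset_size = 5
    # shuffle the result when multiple hosts tie on weight
    shuffle_best_same_weighed_hosts = true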


> It seems that all compute nodes are equally
> weighted, although they don't have the same amount of resources.
> Am I missing anything there?
>
> Thanks!
> Tony
> ________________________________________
> From: Sean Mooney <smooney at redhat.com>
> Sent: January 17, 2022 09:06 AM
> To: openstack-discuss at lists.openstack.org
> Subject: Re: [nova] Instance Even Scheduling
>
> On Mon, 2022-01-17 at 16:35 +0000, Tony Liu wrote:
> > https://docs.openstack.org/nova/latest/admin/scheduling.html
> >
> > Filtering gives you a group of valid hosts. Assuming they are equally weighted,
> > you may try these two settings to pick a host in a more even manner:
> > host_subset_size (increase the size)
> > shuffle_best_same_weighed_hosts (enable the shuffle)
> >
> > https://docs.openstack.org/nova/latest/configuration/config.html
>
> Yes, the weighers are what balance between the hosts, and the filters determine which hosts are valid.
> So if you want to spread based on RAM, you need to adjust
> https://docs.openstack.org/nova/latest/configuration/config.html#filter_scheduler.ram_weight_multiplier
>
> For example, set ram_weight_multiplier=10.0 to make RAM relatively more important.
> The way the weighers work is: every weigher calculates a weight for each host,
> then we add them up after multiplying by the multipliers, and sort.
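>
> A rough sketch of that combining step (the host names, normalized
> values, and multipliers below are illustrative, not the actual nova
> internals):
>
>     # each weigher first normalizes its raw values across hosts, then
>     # the scheduler sums multiplier * normalized_weight and sorts
>     multipliers = {"ram": 10.0, "disk": 1.0, "io_ops": -1.0}
>     hosts = {"compute-1": {"ram": 0.8, "disk": 1.0, "io_ops": 0.2},
>              "compute-2": {"ram": 0.3, "disk": 1.0, "io_ops": 0.0}}
>     def total_weight(norm):
>         return sum(multipliers[w] * v for w, v in norm.items())
>     ranked = sorted(hosts, key=lambda h: total_weight(hosts[h]),
>                     reverse=True)
>     print(ranked)  # ['compute-1', 'compute-2'] -- highest total wins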
>
>
> >
> >
> > Tony
> > ________________________________________
> > From: Ammad Syed <syedammad83 at gmail.com>
> > Sent: January 16, 2022 11:53 PM
> > To: openstack-discuss
> > Subject: [nova] Instance Even Scheduling
> >
> > Hi,
> >
> > > I have 5 compute nodes. When I deploy instances, most of the instances are automatically placed on node 1 or node 2. The other compute nodes remain empty or have one or two instances on them.
> >
> > enabled_filters = ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,AggregateInstanceExtraSpecsFilter
> >
> > > I have enabled the above filters. How do I ensure that instances are scheduled evenly across all compute hosts based on RAM only? That is, the scheduler should place each instance on the compute host which has a larger amount of RAM available than the other hosts.
> >
> > - Ammad





