[nova] Instance Even Scheduling

Sean Mooney smooney at redhat.com
Tue Jan 18 19:15:16 UTC 2022


On Tue, 2022-01-18 at 18:22 +0000, Tony Liu wrote:
> While looking into it, I found a bug on this line. It's in all branches.
> 
> https://github.com/openstack/nova/blob/master/nova/weights.py#L51
> 
> It's supposed to return a list of normalized values. Here is the fix.
> 
> -    return ((i - minval) / range_ for i in weight_list)
> +    return ([(i - minval) / range_ for i in weight_list])
It currently returns a generator of values; your change just converts it to a list.

I don't think this is necessarily a bug; the original code is more efficient.
It would only be an issue if the calling code tried to loop over it multiple times.

Looking at how it's used:
https://github.com/openstack/nova/blob/0e0196d979cf1b8e63b9656358116a36f1f09ede/nova/weights.py#L132-L140
a generator should work fine, I believe,
without your change.
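
To illustrate the point about looping more than once, here is a minimal sketch. The normalize function below is a simplified stand-in written for this example, not a copy of the code in nova/weights.py, and the input values are made up.

```python
def normalize(weight_list):
    """Simplified stand-in: scale values to the 0..1 range, lazily."""
    minval = min(weight_list)
    maxval = max(weight_list)
    range_ = float(maxval - minval)
    # A generator expression: values are produced on demand, once.
    return ((i - minval) / range_ for i in weight_list)


weights = normalize([2, 4, 6])

# The first loop consumes the generator.
first_pass = list(weights)

# A second loop over the same object gets nothing back.
second_pass = list(weights)

print(first_pass)   # [0.0, 0.5, 1.0]
print(second_pass)  # []
```

A caller that walks the result exactly once never notices the difference; only code that iterates twice, or calls len() on it, would need the list.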

You should not read too much into the fact that the docstring says
"Normalize the values in a list between 0 and 1.0."

We are not using mypy and type hints in this part of the code, and it's also not using the
docstring syntax to refer to parameter types and return values.


> 
> The negative weight is from build-failure weigher.
> Looking into nova-compute...
> 
> Again, is nova-compute supposed to provide available resource, instead of
> total resource, for weigher to evaluate?

In general, yes: we use the currently available resources when weighing, but it depends
on the weigher.

Quantitative weighers like RAM, disk and CPU will use the available capacity rather than the total.
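
As a rough sketch of how this fits together with what was described earlier in the thread (each weigher computes a value per host, the values are normalized, multiplied by that weigher's multiplier, summed, and the hosts are sorted), something like the following captures the idea. The host dicts, field names, helper names and multiplier values are all hypothetical and chosen for illustration; this is not the nova implementation.

```python
def normalize(values):
    """Scale raw values to 0..1; all zeros if every host ties."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def weigh_hosts(hosts, weighers):
    """Return (host, total_weight) pairs, best host first.

    weighers is a list of (multiplier, raw_value_function) pairs.
    """
    totals = [0.0] * len(hosts)
    for multiplier, raw_fn in weighers:
        normalized = normalize([raw_fn(h) for h in hosts])
        for idx, value in enumerate(normalized):
            totals[idx] += multiplier * value
    return sorted(zip(hosts, totals), key=lambda pair: pair[1], reverse=True)


# Hypothetical hosts: free (available) RAM, not total RAM, is weighed.
hosts = [
    {"name": "host-a", "free_ram_mb": 4096, "failed_builds": 0},
    {"name": "host-b", "free_ram_mb": 8192, "failed_builds": 2},
    {"name": "host-c", "free_ram_mb": 6144, "failed_builds": 0},
]

weighers = [
    # Spread on available RAM (positive multiplier, assumed 1.0).
    (1.0, lambda h: h["free_ram_mb"]),
    # Penalize recent build failures (large negative multiplier, assumed).
    (-1000000.0, lambda h: h["failed_builds"]),
]

ranked = weigh_hosts(hosts, weighers)
names = [h["name"] for h, _ in ranked]
print(names)  # ['host-c', 'host-a', 'host-b']
```

Note how host-b has the most free RAM but still sorts last: a penalty weigher with a large multiplier dominates the sum, which is the kind of effect that can produce large negative weights in the debug log.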

> 
> Thanks!
> Tony
> ________________________________________
> From: Tony Liu <tonyliu0592 at hotmail.com>
> Sent: January 17, 2022 06:27 PM
> To: Sean Mooney; openstack-discuss at lists.openstack.org
> Subject: Re: [nova] Instance Even Scheduling
> 
> Hi,
> 
> I enabled debug on nova-scheduler and launched 5 VMs. 8 hosts are returned
> as valid hosts from filter. Here is the weight log. This is from Train release.
> 
> This is for the first VM.
> "ram" is the total memory. Is it supposed to be the available or consumed memory?
> It's the same for all nodes because they all have the same spec.
> "disk" is also the total. Because all compute nodes are using the same shared
> Ceph storage, disk is the same for all nodes.
> "instances" is the current number of instances on that node.
> I don't see cpu. Is cpu weigher not there yet in Train?
> Only compute-11 has positive weight; all others have negative weight.
> How come the weight is negative for other nodes? Given the logging,
> they are all the same except for instances.
> ================
> Weighed [WeighedHost [host: (compute-11, compute-11) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: 2.9901550710003333], WeighedHost [host: (compute-2, compute-2) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-8, compute-8) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-12, compute-12) ram: 751903MB disk: 114566144MB io_ops: 0 instances: 8, weight: -399997.01968985796], WeighedHost [host: (compute-1, compute-1) ram: 739615MB disk: 114566144MB io_ops: 0 instances: 14, weight: -399997.03937971604], WeighedHost [host: (compute-7, compute-7) ram: 764191MB disk: 114566144MB io_ops: 0 instances: 2, weight: -599997.0], WeighedHost [host: (compute-9, compute-9) ram: 749855MB disk: 114566144MB io_ops: 0 instances: 9, weight: -999997.0229715011], WeighedHost [host: (compute-10, compute-10) ram: 743711MB disk: 114566144MB io_ops: 0 instances: 6, weight: -999997.0328164301]] _get_sorted_hosts /usr/lib/python3.6/site-packages/nova/scheduler/filter_scheduler.py:462
> ================
> 
> For the second VM.
> ================
> Weighed [WeighedHost [host: (compute-11, compute-11) ram: 757535MB disk: 114565120MB io_ops: 1 instances: 6, weight: 1.9888744586443294], WeighedHost [host: (compute-2, compute-2) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-8, compute-8) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-12, compute-12) ram: 751903MB disk: 114566144MB io_ops: 0 instances: 8, weight: -399997.01968985796], WeighedHost [host: (compute-1, compute-1) ram: 739615MB disk: 114566144MB io_ops: 0 instances: 14, weight: -399997.03937971604], WeighedHost [host: (compute-7, compute-7) ram: 764191MB disk: 114566144MB io_ops: 0 instances: 2, weight: -599997.0], WeighedHost [host: (compute-9, compute-9) ram: 749855MB disk: 114566144MB io_ops: 0 instances: 9, weight: -999997.0229715011], WeighedHost [host: (compute-10, compute-10) ram: 743711MB disk: 114566144MB io_ops: 0 instances: 6, weight: -999997.0328164301]] _get_sorted_hosts /usr/lib/python3.6/site-packages/nova/scheduler/filter_scheduler.py:462
> ================
> 
> Given above logging, compute-11 is always the winner of weight. It's just that
> when weighing for the next VM, the "instances" of compute-11 bump up, all others
> are the same. At the end, all 5 VMs are created on that same node.
> 
> Is this all expected?
> 
> Thanks!
> Tony
> ________________________________________
> From: Tony Liu <tonyliu0592 at hotmail.com>
> Sent: January 17, 2022 10:11 AM
> To: Sean Mooney; openstack-discuss at lists.openstack.org
> Subject: Re: [nova] Instance Even Scheduling
> 
> That disk weigher is a good point. I am using Ceph as the storage backend
> for all compute nodes. Disk weigher may not handle that properly and cause
> some failure. Anyways, I will enable debug and look into more details.
> 
> Thanks!
> Tony
> ________________________________________
> From: Sean Mooney <smooney at redhat.com>
> Sent: January 17, 2022 09:57 AM
> To: Tony Liu; openstack-discuss at lists.openstack.org
> Subject: Re: [nova] Instance Even Scheduling
> 
> On Mon, 2022-01-17 at 17:45 +0000, Tony Liu wrote:
> > I recall weight didn't work as what I expected, that's why I used
> > shuffle_best_same_weighed_hosts.
> > 
> > Here is what I experienced.
> > With Ussuri and default Nova scheduling settings. All weighers are supposed
> > to be enabled and all multipliers are positive.
> > 
> Yes, by default all weighers are enabled and the scheduler spreads by default.
> >  On 10x empty compute nodes
> > with the same spec, say the first vm is created on compute-2. Because some
> > memory and vCPU are consumed, the second vm should be created on some
> > node other than compute-2, if weighers are working fine. But it's still created
> > on compute-2, until I increased host_subset_size and enable shuffle_best_same_weighed_hosts.
> > 
> I would guess that either the disk weigher or the failed build weigher is likely what results in the different behavior.
> The default behavior is still to spread. Before assuming there is a bug, you should enable debug mode on the scheduler,
> look at the weights that are assigned to each host, and determine why you are seeing different behavior.
> 
> 
> shuffle_best_same_weighed_hosts does as the name suggests: it shuffles the result if and only if there is a tie.
> That means it will only have an effect if two hosts were judged by the weighers as being equally good candidates.
> 
> Instead of looking at only the top host in the list, host_subset_size enables you to consider the top n hosts.
> 
> host_subset_size does a random selection from the top host_subset_size elements after the hosts are sorted by the weighers,
> intentionally adding randomness to the selection.
> This should not be needed in general.
> 
> 
> >  It seems that all compute nodes are equally
> > weighted, although they don't have the same amount of resource.
> > Am I missing anything there?
> > 
> > Thanks!
> > Tony
> > ________________________________________
> > From: Sean Mooney <smooney at redhat.com>
> > Sent: January 17, 2022 09:06 AM
> > To: openstack-discuss at lists.openstack.org
> > Subject: Re: [nova] Instance Even Scheduling
> > 
> > On Mon, 2022-01-17 at 16:35 +0000, Tony Liu wrote:
> > > https://docs.openstack.org/nova/latest/admin/scheduling.html
> > > 
> > > Filter gives you a group of valid hosts, assuming they are equally weighted,
> > > you may try with these two settings to pick up a host in a more even manner.
> > > host_subset_size (increase the size)
> > > shuffle_best_same_weighed_hosts (enable the shuffle)
> > > 
> > > https://docs.openstack.org/nova/latest/configuration/config.html
> > 
> > Yes, the weighers are what balance between the hosts, and the filters determine which hosts are valid.
> > So if you want to spread based on RAM then you need to adjust
> > https://docs.openstack.org/nova/latest/configuration/config.html#filter_scheduler.ram_weight_multiplier
> > 
> > For example, set ram_weight_multiplier=10.0 to make it relatively more important.
> > The way the weighers work is that every weigher calculates a weight for a host;
> > we then add them up after multiplying each by its weight multiplier, and then sort.
> > 
> > 
> > > 
> > > 
> > > Tony
> > > ________________________________________
> > > From: Ammad Syed <syedammad83 at gmail.com>
> > > Sent: January 16, 2022 11:53 PM
> > > To: openstack-discuss
> > > Subject: [nova] Instance Even Scheduling
> > > 
> > > Hi,
> > > 
> > > > I have 5 compute nodes. When I deploy instances, most of the instances are automatically placed on node 1 or node 2. The other compute nodes remain empty or have only one or two instances on them.
> > > 
> > > enabled_filters = ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,AggregateInstanceExtraSpecsFilter
> > > 
> > > > I have enabled the above filters. How can I ensure that instances are scheduled evenly across all compute hosts based on RAM only? That is, the scheduler should place the instance on the compute host that has more RAM available than the other hosts.
> > > 
> > > - Ammad
> > > 
> > > 
> > 
> > 
> > 
> > 
> 
> 
> 
> 
> 



