Hi,

I enabled debug on nova-scheduler and launched 5 VMs. 8 hosts are returned as valid hosts by the filters. Here is the weight log. This is from the Train release.

This is for the first VM.

"ram" is the total memory. Is it supposed to be the available or the consumed memory? It's the same for all nodes because they all have the same spec.
"disk" is also the total. Because all compute nodes are using the same shared Ceph storage, disk is the same for all nodes.
"instances" is the current number of instances on that node.
I don't see cpu. Is the CPU weigher not there yet in Train?

Only compute-11 has a positive weight; all others have negative weights. How come the weight is negative for the other nodes? Given the logging, they are all the same except for instances.

================
Weighed [WeighedHost [host: (compute-11, compute-11) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: 2.9901550710003333], WeighedHost [host: (compute-2, compute-2) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-8, compute-8) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-12, compute-12) ram: 751903MB disk: 114566144MB io_ops: 0 instances: 8, weight: -399997.01968985796], WeighedHost [host: (compute-1, compute-1) ram: 739615MB disk: 114566144MB io_ops: 0 instances: 14, weight: -399997.03937971604], WeighedHost [host: (compute-7, compute-7) ram: 764191MB disk: 114566144MB io_ops: 0 instances: 2, weight: -599997.0], WeighedHost [host: (compute-9, compute-9) ram: 749855MB disk: 114566144MB io_ops: 0 instances: 9, weight: -999997.0229715011], WeighedHost [host: (compute-10, compute-10) ram: 743711MB disk: 114566144MB io_ops: 0 instances: 6, weight: -999997.0328164301]] _get_sorted_hosts /usr/lib/python3.6/site-packages/nova/scheduler/filter_scheduler.py:462
================

For the second VM.
================
Weighed [WeighedHost [host: (compute-11, compute-11) ram: 757535MB disk: 114565120MB io_ops: 1 instances: 6, weight: 1.9888744586443294], WeighedHost [host: (compute-2, compute-2) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-8, compute-8) ram: 758047MB disk: 114566144MB io_ops: 0 instances: 5, weight: -399997.009844929], WeighedHost [host: (compute-12, compute-12) ram: 751903MB disk: 114566144MB io_ops: 0 instances: 8, weight: -399997.01968985796], WeighedHost [host: (compute-1, compute-1) ram: 739615MB disk: 114566144MB io_ops: 0 instances: 14, weight: -399997.03937971604], WeighedHost [host: (compute-7, compute-7) ram: 764191MB disk: 114566144MB io_ops: 0 instances: 2, weight: -599997.0], WeighedHost [host: (compute-9, compute-9) ram: 749855MB disk: 114566144MB io_ops: 0 instances: 9, weight: -999997.0229715011], WeighedHost [host: (compute-10, compute-10) ram: 743711MB disk: 114566144MB io_ops: 0 instances: 6, weight: -999997.0328164301]] _get_sorted_hosts /usr/lib/python3.6/site-packages/nova/scheduler/filter_scheduler.py:462
================

Given the logging above, compute-11 always wins the weighing. It's just that when weighing for the next VM, the "instances" count of compute-11 goes up while all the other hosts stay the same. In the end, all 5 VMs are created on that same node.

Is this all expected?

Thanks!
Tony

________________________________________
From: Tony Liu <tonyliu0592@hotmail.com>
Sent: January 17, 2022 10:11 AM
To: Sean Mooney; openstack-discuss@lists.openstack.org
Subject: Re: [nova] Instance Even Scheduling

That disk weigher is a good point. I am using Ceph as the storage backend for all compute nodes. The disk weigher may not handle that properly and may cause some failures.

Anyway, I will enable debug and look into more details.

Thanks!
Tony

________________________________________
From: Sean Mooney <smooney@redhat.com>
Sent: January 17, 2022 09:57 AM
To: Tony Liu; openstack-discuss@lists.openstack.org
Subject: Re: [nova] Instance Even Scheduling

On Mon, 2022-01-17 at 17:45 +0000, Tony Liu wrote:
I recall the weighing didn't work as I expected; that's why I used shuffle_best_same_weighed_hosts.
Here is what I experienced, with Ussuri and the default Nova scheduling settings. All weighers are supposed to be enabled and all multipliers positive.
Yes, by default all weighers are enabled and the scheduler spreads instances by default.
On 10 empty compute nodes with the same spec, say the first VM is created on compute-2. Because some memory and vCPU are then consumed, the second VM should be created on a node other than compute-2 if the weighers are working properly. But it was still created on compute-2 until I increased host_subset_size and enabled shuffle_best_same_weighed_hosts.
I would guess that either the disk weigher or the failed build weigher is what results in the different behavior; the default behavior is still to spread. Before assuming there is a bug, you should run the scheduler in debug mode, look at the weights that are assigned to each host, and determine why you are seeing different behavior.

shuffle_best_same_weighed_hosts does what the name suggests: it shuffles the result if and only if there is a tie. That means it will only have an effect if 2 or more hosts were judged by the weighers to be equally good candidates.

host_subset_size: instead of looking only at the top host in the list, it lets you consider the top N hosts. It does a random selection among the top host_subset_size hosts after they are sorted by the weighers, intentionally adding randomness to the selection. This should not be needed in general.
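A simplified Python sketch of the selection step described above, assuming the hosts have already been weighed. WeighedHost and pick_host are hypothetical names for illustration, not Nova's actual code. It shows why the shuffle changes nothing unless hosts tie at the top weight, and how host_subset_size picks randomly among the top N. The weights are copied from the first debug log in this thread.

```python
import random
from dataclasses import dataclass


@dataclass
class WeighedHost:
    # Hypothetical stand-in for a host plus its total weight.
    name: str
    weight: float


def pick_host(weighed_hosts, host_subset_size=1,
              shuffle_best_same_weighed_hosts=False, rng=None):
    """Return the host that would be tried first (simplified sketch)."""
    if rng is None:
        rng = random.Random()
    # Best weight first, as after the weighers have run.
    hosts = sorted(weighed_hosts, key=lambda h: h.weight, reverse=True)
    if shuffle_best_same_weighed_hosts:
        # Only hosts tied with the top weight are shuffled; the rest keep their order.
        best = [h for h in hosts if h.weight == hosts[0].weight]
        rng.shuffle(best)
        hosts = best + hosts[len(best):]
    # Choose randomly among the top N; N=1 means always taking the best host.
    subset = hosts[:max(1, host_subset_size)]
    return rng.choice(subset)


hosts = [WeighedHost("compute-11", 2.9901550710003333),
         WeighedHost("compute-2", -399997.009844929),
         WeighedHost("compute-8", -399997.009844929),
         WeighedHost("compute-7", -599997.0)]
print(pick_host(hosts).name)  # always compute-11
print(pick_host(hosts, shuffle_best_same_weighed_hosts=True).name)  # still compute-11: no tie at the top
print(pick_host(hosts, host_subset_size=3).name)  # one of compute-11, compute-2, compute-8
```

With a clear winner like compute-11, enabling the shuffle alone cannot spread instances; only host_subset_size > 1 introduces randomness, which is why it can mask a weighing problem rather than fix it.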
It seems that all compute nodes are equally weighted, although they don't have the same amount of resources. Am I missing anything there?
Thanks!
Tony

________________________________________
From: Sean Mooney <smooney@redhat.com>
Sent: January 17, 2022 09:06 AM
To: openstack-discuss@lists.openstack.org
Subject: Re: [nova] Instance Even Scheduling
On Mon, 2022-01-17 at 16:35 +0000, Tony Liu wrote:
https://docs.openstack.org/nova/latest/admin/scheduling.html
The filters give you a group of valid hosts. Assuming they are equally weighted, you may try these two settings to pick a host in a more even manner:
host_subset_size (increase the size)
shuffle_best_same_weighed_hosts (enable the shuffle)
https://docs.openstack.org/nova/latest/configuration/config.html
Yes, the weighers are what balance between the hosts, and the filters determine which hosts are valid. So if you want to spread based on RAM, you need to adjust the weigher multipliers: https://docs.openstack.org/nova/latest/configuration/config.html#filter_sche...
For example, set ram_weight_multiplier=10.0 to make RAM relatively more important. The way the weighers work is that every weigher calculates a weight for each host (normalized to the 0 to 1 range across the candidate hosts); we then multiply those weights by their multipliers, add them up, and sort the hosts.
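A minimal Python sketch of that normalize, multiply, sum and sort loop, assuming each weigher's raw values are scaled to 0..1 across the candidate hosts before being multiplied. Host, normalize, weigh and the WEIGHERS table are hypothetical names; the -1000000.0 failed-build multiplier is an assumed value chosen to show how one large-magnitude weigher can swamp the RAM weigher. A value that is identical on every host (such as free disk on shared Ceph) normalizes to 0 and contributes nothing.

```python
from dataclasses import dataclass


@dataclass
class Host:
    # Hypothetical host record with only the fields this sketch needs.
    name: str
    free_ram_mb: int
    failed_builds: int = 0
    weight: float = 0.0


def normalize(values):
    """Scale raw values into 0.0..1.0; if every host has the same value, all get 0.0."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical weigher table: (raw value per host, multiplier).
# A positive multiplier favours larger raw values; a negative one penalises them.
WEIGHERS = [
    (lambda h: h.free_ram_mb, 10.0),          # e.g. ram_weight_multiplier=10.0
    (lambda h: h.failed_builds, -1000000.0),  # assumed large failed-build penalty
]


def weigh(hosts):
    """Sum multiplier * normalized value over all weighers, then sort best first."""
    for h in hosts:
        h.weight = 0.0
    for raw, multiplier in WEIGHERS:
        normalized = normalize([raw(h) for h in hosts])
        for h, value in zip(hosts, normalized):
            h.weight += multiplier * value
    return sorted(hosts, key=lambda h: h.weight, reverse=True)


hosts = [Host("compute-1", 4096), Host("compute-2", 8192),
         Host("compute-3", 6144, failed_builds=2)]
for h in weigh(hosts):
    print(h.name, h.weight)
# compute-2 10.0
# compute-1 0.0
# compute-3 -999995.0
```

Even though compute-3 has more free RAM than compute-1, its failed builds push it to the bottom; this is the kind of pattern worth checking for in the debug weights before tuning ram_weight_multiplier.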
Tony

________________________________________
From: Ammad Syed <syedammad83@gmail.com>
Sent: January 16, 2022 11:53 PM
To: openstack-discuss
Subject: [nova] Instance Even Scheduling
Hi,
I have 5 compute nodes. When I deploy instances, most of the instances are automatically placed on node 1 or node 2. The other compute nodes remain empty or have only one or two instances on them.
enabled_filters = ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,AggregateInstanceExtraSpecsFilter
I have enabled the above filters. How can I ensure that instances are scheduled evenly across all compute hosts based on RAM only? That is, the scheduler should place the instance on the compute host that has more RAM available than the other hosts.
- Ammad