On 4/17/2019 2:19 PM, melanie witt wrote:
> The relevant scheduler log is this one:
>
> 2019-04-17 19:53:07.303 98874 DEBUG nova.scheduler.filter_scheduler [req-02fb5504-cbdb-4219-9509-d2be9da7bb0e 6a4c2e32919e4a6fa5c5d956beb68eef 9f22e9bfa7974e14871d58bbb62242b2 - default default] Weighed [(cpu1, cpu1) ram: 32153MB disk: 1906688MB io_ops: 0 instances: 0, (cpu2, cpu2) ram: 30105MB disk: 1886208MB io_ops: 0 instances: 1] _get_sorted_hosts /usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py:455
>
> and here we see that host 'cpu1' is being weighed ahead of host 'cpu2', which is the problem. I don't understand this considering the docs say that setting the ram_weight_multiplier to a negative value should result in the host with the lesser RAM being weighed higher/first. According to your log, the opposite is happening -- 'cpu1' with 32153MB RAM is being weighed higher than 'cpu2' with 30105MB RAM.
>
> Either your ram_weight_multiplier setting is not being picked up or there's a bug causing weight to be applied with reverse logic?
>
> Can you look at the scheduler debug log when the service first started up and verify what value of ram_weight_multiplier the service is using?
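
To make the expected ordering concrete, here is a rough sketch of how a negative multiplier should flip the sort. It is not nova's actual code: the function names are hypothetical, and it assumes the raw free-RAM values are min-max normalized to [0, 1] before the multiplier is applied and that hosts are then sorted by descending weight.

```python
# Simplified illustration only; NOT the nova implementation.
# Assumption: raw values are min-max normalized to [0, 1], then multiplied
# by the weigher's multiplier, and hosts are sorted by descending weight.

def normalize(values):
    """Scale values to the range [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def sort_hosts_by_ram(free_ram_mb, ram_weight_multiplier):
    """Return host names ordered from highest to lowest weight.

    free_ram_mb is a hypothetical dict of host name -> free RAM in MB.
    """
    names = list(free_ram_mb)
    normalized = normalize([free_ram_mb[n] for n in names])
    weights = {n: ram_weight_multiplier * w for n, w in zip(names, normalized)}
    return sorted(names, key=lambda n: weights[n], reverse=True)


# Free RAM values taken from the log line above.
hosts = {"cpu1": 32153, "cpu2": 30105}
print(sort_hosts_by_ram(hosts, 1.0))   # positive multiplier: most free RAM first
print(sort_hosts_by_ram(hosts, -1.0))  # negative multiplier: least free RAM first
```

Under these assumptions a multiplier of -1.0 puts 'cpu2' first, so the 'cpu1'-first ordering in the log is what a positive (or default) multiplier would produce.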

I agree with Melanie's assessment. Looking at the RAMWeigher code in Rocky we see it's weighing based on the free_ram_mb value in the HostState object:

https://github.com/openstack/nova/blob/stable/rocky/nova/scheduler/weights/r...

Looking at the filtered hosts that were logged:

Filtered [(cpu2, cpu2) ram: 30105MB disk: 1886208MB io_ops: 0 instances: 1, (cpu1, cpu1) ram: 32153MB disk: 1906688MB io_ops: 0 instances: 0]

The ram value that is logged is free_ram_mb:

https://github.com/openstack/nova/blob/stable/rocky/nova/scheduler/host_mana...

It looks like you also don't have this logging regression fix in your rocky scheduler code, so you might want to patch this in when getting new debug logs:

https://review.openstack.org/#/c/641355/

That could tell us what the resulting weight is.

--

Thanks,

Matt