OK! I have applied the patch and the weights are now shown! Furthermore, as per your suggestion, I have removed the "RamFilter", which was the only one of those filters present.

Here is the new log, where the spawning of 2 VMs a few seconds apart can be seen: https://pastebin.com/Xy2FL2KL

Initially both hosts have a weight of 1.0. Then the host that already has one VM running has a negative weight, but the new VM is placed on the other host.

It is really, really strange that this is happening...

G.
On Wed, 17 Apr 2019 12:55:45 -0700, Melanie Witt <melwittt@gmail.com> wrote:
On Wed, 17 Apr 2019 22:45:45 +0300, Georgios Dimitrakakis <giorgis@acmac.uoc.gr> wrote:
Hello again Melanie!
This is exactly what I am thinking... something is not working correctly!
To answer your questions: there is one node acting as controller, where the scheduler is running, and I have pasted the nova.conf file from there.
I have also noticed that I had "ram_weight_multiplier" set twice (once in [cells] and once in [filter_scheduler]), so I have removed the one in [cells] because I thought it might cause a problem, but the results are still the same.
The log for the scheduler has this entry:
2019-04-17 22:04:50.045 131723 DEBUG oslo_service.service [req-7e548ecb-f3ed-4a4d-835f-b3a996e32534 - - - - -] filter_scheduler.ram_weight_multiplier = -1.0 log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:3032
so it seems to be picked up correctly but without any influence.

Agreed, that log shows that the -1.0 value is being picked up properly by the scheduler service.

What also worries me about the scheduler log that I sent you before is that I see an entry like this in it:
2019-04-17 19:53:07.298 98874 DEBUG nova.filters [req-02fb5504-cbdb-4219-9509-d2be9da7bb0e 6a4c2e32919e4a6fa5c5d956beb68eef 9f22e9bfa7974e14871d58bbb62242b2 - default default] Filter RamFilter returned 2 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104
Shouldn't the RamFilter return 1 host, the one with less RAM? Why does it return 2 hosts?

No -- the RamFilter will return any hosts that meet the RAM requirement. Filters do not weigh hosts. The RamFilter returns two hosts because both hosts have enough RAM to fulfill the request.

FYI though, as of Pike [1], the (Core|Ram|Disk)Filter are redundant, as placement will do the filtering for those resources before the nova scheduler filters run. So you can safely remove (Core|Ram|Disk)Filter from your enabled_filters.

[1] https://docs.openstack.org/releasenotes/nova/pike.html#relnotes-16-0-0-stabl...
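
To illustrate the filter/weigher distinction described above, here is a minimal sketch. It is a simplified model and not nova's actual RamFilter code (for one thing, it ignores allocation ratios); the `Host` type, the `ram_filter` function and the 2048 MB request size are made up for the example. Only the host names and free RAM figures come from the scheduler log in this thread.

```python
from typing import NamedTuple


class Host(NamedTuple):
    name: str
    free_ram_mb: int


def ram_filter(hosts, requested_ram_mb):
    """Keep every host with enough free RAM. No ranking happens here."""
    return [h for h in hosts if h.free_ram_mb >= requested_ram_mb]


# Free RAM values as reported in the scheduler log from this thread.
hosts = [Host("cpu1", 32153), Host("cpu2", 30105)]

# 2048 MB is a hypothetical flavor size chosen for illustration.
passed = ram_filter(hosts, requested_ram_mb=2048)
print([h.name for h in passed])  # ['cpu1', 'cpu2']
```

Both hosts pass because both can fit the request. Choosing between them is left to the weighers, which run after the filters.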
If you have any other ideas or would like me to do some more checking I am all ears!

At this point, you could take Matt's suggestion from his latest reply on this thread and patch in the logging regression fix he linked. That would allow you to see in the debug log what weights nova is giving to the hosts.
OK, so I just searched open nova bugs for "weigh" and found this issue, which isn't necessarily a defect:
https://bugs.launchpad.net/nova/+bug/1818239
but something that could be affecting the host weighing in your environment. There's something called the BuildFailureWeigher which will apply a low weight multiplier to hosts that have had VMs fail to build on them. And that weight resets when a host experiences a successful VM build.
If you apply the patch Matt suggested and take a look at the host weights, we should be able to see whether the BuildFailureWeigher is involved in the behavior you're seeing.
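
To make the possible interaction concrete, here is a rough sketch of how a build-failure penalty could outweigh a negative RAM multiplier. It is a simplified model under stated assumptions, not nova's implementation: it assumes each weigher's raw values are min-max normalized to [0, 1] and then scaled by that weigher's multiplier, that the build-failure raw value is the negated failure count, and that the build-failure multiplier is 1000000.0. The failure count on cpu2 is purely hypothetical; nothing in this thread shows that a build actually failed.

```python
def normalize(values):
    """Min-max normalize to [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def total_weights(hosts, ram_multiplier, build_failure_multiplier):
    """Sum the scaled, normalized RAM and build-failure weights per host."""
    names = list(hosts)
    ram_norm = normalize([hosts[n]["free_ram_mb"] for n in names])
    # Assumption: more failed builds means a lower raw value.
    fail_norm = normalize([-hosts[n]["failed_builds"] for n in names])
    return {
        n: ram_multiplier * r + build_failure_multiplier * f
        for n, r, f in zip(names, ram_norm, fail_norm)
    }


# Hypothetical: cpu2 has one recorded build failure.
with_failure = {
    "cpu1": {"free_ram_mb": 32153, "failed_builds": 0},
    "cpu2": {"free_ram_mb": 30105, "failed_builds": 1},
}
no_failure = {
    "cpu1": {"free_ram_mb": 32153, "failed_builds": 0},
    "cpu2": {"free_ram_mb": 30105, "failed_builds": 0},
}

w_fail = total_weights(with_failure, -1.0, 1000000.0)
w_ok = total_weights(no_failure, -1.0, 1000000.0)
print(max(w_fail, key=w_fail.get))  # cpu1
print(max(w_ok, key=w_ok.get))      # cpu2
```

In this model a single recorded failure on cpu2 is enough to send the next VM to cpu1 even with ram_weight_multiplier = -1.0, which is why the per-weigher values in the debug log are worth checking.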
-melanie
Aside from that, it's looking like we/I would need to reproduce this issue locally with a devstack and try to figure out what's causing this behavior. -melanie
Thank you both, Melanie and Matt, for trying to assist me. I have double-checked the nova.conf on the controller and here is what I have (ignoring commented-out lines and obfuscating sensitive data): https://pastebin.com/hW1PE4U7

As you can see, everything has default values, as discussed before with Melanie, except the filters and the weight that I have applied, which should lead to VM stacking instead of spreading.

My scenario has two compute hosts (let's call them "cpu1" and "cpu2"). When an instance is already placed on "cpu2", I expect the next instance to be placed there as well. Instead it is placed on "cpu1", as you can see from the scheduler log that you can find here: https://pastebin.com/sCzB9L2e

Do you see something strange that I fail to recognize?
Thanks for providing the helpful data. It appears you have set your nova.conf correctly (this is where your scheduler is running, yes?). I notice you have duplicated the ram_weight_multiplier setting but that shouldn't hurt anything.
The relevant scheduler log is this one:
2019-04-17 19:53:07.303 98874 DEBUG nova.scheduler.filter_scheduler [req-02fb5504-cbdb-4219-9509-d2be9da7bb0e 6a4c2e32919e4a6fa5c5d956beb68eef 9f22e9bfa7974e14871d58bbb62242b2 - default default] Weighed [(cpu1, cpu1) ram: 32153MB disk: 1906688MB io_ops: 0 instances: 0, (cpu2, cpu2) ram: 30105MB disk: 1886208MB io_ops: 0 instances: 1] _get_sorted_hosts
/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py:455
and here we see that host 'cpu1' is being weighed ahead of host 'cpu2', which is the problem. I don't understand this considering the docs say that setting the ram_weight_multiplier to a negative value should result in the host with the lesser RAM being weighed higher/first. According to your log, the opposite is happening -- 'cpu1' with 32153MB RAM is being weighed higher than 'cpu2' with 30105MB RAM.
Either your ram_weight_multiplier setting is not being picked up or there's a bug causing weight to be applied with reverse logic?
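
For reference, here is a small sketch of the ordering one would expect from the documented behavior. It is a simplified model, assuming free RAM is min-max normalized to [0, 1] across the candidate hosts and then multiplied by ram_weight_multiplier; the function names are illustrative and not nova's API. The RAM figures are the ones from the log line above.

```python
def normalize(values):
    """Min-max normalize to [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def weigh_by_ram(free_ram_by_host, multiplier):
    """Return hosts sorted best-first, plus the weight of each host."""
    names = list(free_ram_by_host)
    norm = normalize([free_ram_by_host[n] for n in names])
    weights = {n: multiplier * w for n, w in zip(names, norm)}
    ordered = sorted(names, key=lambda n: weights[n], reverse=True)
    return ordered, weights


free_ram = {"cpu1": 32153, "cpu2": 30105}

stack_order, stack_weights = weigh_by_ram(free_ram, -1.0)
spread_order, _ = weigh_by_ram(free_ram, 1.0)

print(stack_order)   # ['cpu2', 'cpu1']
print(spread_order)  # ['cpu1', 'cpu2']
```

Under this model a multiplier of -1.0 puts cpu2 (less free RAM) first, so the order seen in the log matches what a positive multiplier would produce, or what another weigher overriding the RAM weigher would produce.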
Can you look at the scheduler debug log when the service first started up and verify what value of ram_weight_multiplier the service is using?
-melanie
>> On 4/16/2019 7:03 PM, melanie witt wrote:
>> To debug further, you should set debug to True in the nova.conf on
>> your scheduler host and look for which filter is removing the desired
>> host for the second VM. You can find where to start by looking for a
>> message like, "Starting with N host(s)". If you have two hosts with
>> enough RAM, you should see "Starting with 2 host(s)" and then look for
>> the log message where it says "Filter returned 1 host(s)" and that
>> will be the filter that is removing the desired host. Once you know
>> which filter is removing it, you can debug further.
>
> If the other host isn't getting filtered out, it could be the
> weighers that aren't prioritizing the host you expect, but debug logs
> should dump the weighed hosts as well which might give a clue.
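
The log search described in the quoted advice can be sketched as a small script. This is only an illustration: the regular expressions are based on the message wording quoted in this thread ("Starting with N host(s)" and "Filter RamFilter returned 2 host(s)"), the function name is made up, and the ComputeFilter line in the sample is a hypothetical entry added to show a host being removed.

```python
import re

START_RE = re.compile(r"Starting with (\d+) host\(s\)")
FILTER_RE = re.compile(r"Filter (\w+) returned (\d+) host\(s\)")


def first_reducing_filter(log_lines):
    """Return (filter_name, hosts_before, hosts_after) for the first
    filter that returns fewer hosts than it received, or None."""
    current = None
    for line in log_lines:
        m = START_RE.search(line)
        if m:
            current = int(m.group(1))
            continue
        m = FILTER_RE.search(line)
        if m and current is not None:
            name, count = m.group(1), int(m.group(2))
            if count < current:
                return name, current, count
            current = count
    return None


sample = [
    "DEBUG nova.filters [req-...] Starting with 2 host(s)",
    "DEBUG nova.filters [req-...] Filter RamFilter returned 2 host(s)",
    # Hypothetical line, not taken from the logs in this thread:
    "DEBUG nova.filters [req-...] Filter ComputeFilter returned 1 host(s)",
]

print(first_reducing_filter(sample))      # ('ComputeFilter', 2, 1)
print(first_reducing_filter(sample[:2]))  # None
```

A result of None means no filter removed a host, in which case the weighers are the next place to look.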