It would be nice to have that metric exposed in the nova-hypervisors API.

We scrape those with Prometheus and an exporter to get a bit more visibility.
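For reference, a minimal Prometheus scrape job for an OpenStack exporter might look like the sketch below. The target host and job name are assumptions; adjust them for your deployment, and check your exporter's documentation for its actual listen port.

```yaml
scrape_configs:
  - job_name: "openstack-exporter"      # hypothetical job name
    scrape_interval: 60s
    static_configs:
      - targets: ["exporter-host:9180"] # exporter host/port, adjust as needed
```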

On Wed, Feb 16, 2022 at 1:11 PM Franck VEDEL <franck.vedel@univ-grenoble-alpes.fr> wrote:
yes this is logged in the scheduler at debug level

Is it this?

2022-02-16 10:18:26.802 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] filter_scheduler.build_failure_weight_multiplier = 1000000.0 log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.802 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] filter_scheduler.cpu_weight_multiplier = 1.0 log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.802 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] filter_scheduler.cross_cell_move_weight_multiplier = 1000000.0 log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.802 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] filter_scheduler.disk_weight_multiplier = 1.0 log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.803 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] filter_scheduler.io_ops_weight_multiplier = -1.0 log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.804 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] filter_scheduler.pci_weight_multiplier = 1.0 log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.804 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] filter_scheduler.ram_weight_multiplier = 1.0 log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.805 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] filter_scheduler.soft_affinity_weight_multiplier = 1.0 log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.805 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] filter_scheduler.soft_anti_affinity_weight_multiplier = 1.0 log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.805 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] filter_scheduler.weight_classes = ['nova.scheduler.weights.all_weighers'] log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.806 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] metrics.weight_multiplier      = 1.0 log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.806 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] metrics.weight_of_unavailable  = -10000.0 log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.806 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] metrics.weight_setting         = [] log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611


Franck 

Le 16 févr. 2022 à 13:45, Sean Mooney <smooney@redhat.com> a écrit :

On Wed, 2022-02-16 at 10:52 +0100, Franck VEDEL wrote:
Thanks a lot!
I changed the settings and indeed, it seems to work. This distribution of instances is really interesting. I am learning a lot.
Question: is it possible to view the calculated weight when a server is chosen?
Otherwise, thanks again, really.
yes this is logged in the scheduler at debug level
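As a sketch, enabling that debug output is a one-line nova.conf change; with kolla-ansible this would typically go in an override file (e.g. /etc/kolla/config/nova.conf) followed by a reconfigure:

```ini
[DEFAULT]
# Log scheduler filtering and weighing decisions at DEBUG level
debug = True
```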

Franck

Le 16 févr. 2022 à 03:35, Tony Liu <tonyliu0592@hotmail.com> a écrit :

A build failure can be caused by different things: networking, storage, the hypervisor, etc.
For example, a failure caused by the Neutron service doesn't mean the hypervisor is
unhealthy, but because of that weigher, the hypervisor remains excluded from hosting
instances even after Neutron recovers. This doesn't make sense.
I wouldn't enable this weigher until it's smart enough to know the failure was caused
by the hypervisor itself and not by anything else.

Tony
________________________________________
From: Laurent Dumont <laurentfdumont@gmail.com>
Sent: February 15, 2022 05:00 PM
To: Tony Liu
Cc: Franck VEDEL; openstack-discuss
Subject: Re: [kolla-ansible][nova]Problem with distribution of instance on servers

In a healthy setup, should build_failure_weight_multiplier be triggered?

From the doc, tweaking this might mean you try to schedule and build instances on computes that are not healthy.

On Tue, Feb 15, 2022 at 6:38 PM Tony Liu <tonyliu0592@hotmail.com> wrote:
Enable debug logging on nova-scheduler and you will see how the winner is picked.
I had the same issue before, caused by the build-failure weigher, which is enabled by default.
Setting build_failure_weight_multiplier to 0 resolved the issue for me. Instances are then
balanced by the weighers (CPU and memory) as expected.
shuffle_best_same_weighed_hosts and host_subset_size are not necessary, unless
they're required by certain cases.
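For illustration, the change described above is a nova.conf override like the following (the option lives in the [filter_scheduler] section; its default is 1000000.0, as the debug log earlier in the thread shows):

```ini
[filter_scheduler]
# Disable the build-failure weigher so past build failures
# no longer push a host to the bottom of the weighed list
build_failure_weight_multiplier = 0.0
```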

Tony
________________________________________
From: Laurent Dumont <laurentfdumont@gmail.com>
Sent: February 15, 2022 12:54 PM
To: Franck VEDEL
Cc: openstack-discuss
Subject: Re: [kolla-ansible][nova]Problem with distribution of instance on servers

There are two settings we've tweaked in the past in Nova.

shuffle_best_same_weighed_hosts --> allows more spreading when computes have the exact same specs/weights.
host_subset_size --> helps concurrent requests land on different hosts

Before that, we saw the same behavior, with OpenStack stacking VMs on single computes. It still respects anti-affinity, but I don't see a good reason not to spread by default. Changing these two was enough to let our spread get a little better.
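As a sketch, those two options go in the [filter_scheduler] section of nova.conf; the subset size of 3 here is just an example value:

```ini
[filter_scheduler]
# Randomly pick among hosts that tie for the best weight
shuffle_best_same_weighed_hosts = true
# Choose randomly among the N best hosts instead of always the top one
host_subset_size = 3
```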

On Tue, Feb 15, 2022 at 11:19 AM Franck VEDEL <franck.vedel@univ-grenoble-alpes.fr> wrote:
Hello,
I seem to have hit a problem that I hadn't noticed before.
I have 3 servers for my OpenStack cloud, built with kolla-ansible, on the Victoria release.
I had simply put the 3 servers in the [compute] section of the multinode inventory file. At first it worked, but for some time now all new VMs have been placed on server 1.

The 3 servers are operational and identical; here are 3 screenshots to show it. (In the images, instances appear on servers 2 and 3 because it used to work correctly, but no new instances are created on those servers now.)
[three inline screenshots of the hypervisor summaries, one per server]


I tried to understand how instances are distributed across the servers, but in my case I don't understand why none are assigned to the 2nd and 3rd servers.
How do I find the problem? It should be nova-scheduler. Do I have to do anything special? Should I check whether a parameter has a bad value?


Thanks in advance if you can help me.

Franck VEDEL