[kolla-ansible][nova]Problem with distribution of instance on servers
Hello,
I have a problem that I had not seen before. I have 3 servers for my OpenStack, deployed with Kolla-Ansible, and I'm on the Victoria release. I simply put the 3 servers in the [compute] section of the multinode file. At first it worked, but for some time now all the VMs have been placed on server 1.
The 3 servers are operational and identical. Here are 3 screenshots to show it. (In the images, the instances on servers 2 and 3 are there because placement used to work correctly, but no new instances are created on these servers now.)
I tried to understand how instances are distributed across the servers, but in my case I don't understand why none are assigned to the 2nd and 3rd servers. How can I find the problem? It should be nova-scheduler. Is there anything special to do? Should I check whether a parameter has a bad value?
Thanks in advance if you can help me.
Franck VEDEL
There are two Nova settings we've tweaked in the past:
shuffle_best_same_weighed_hosts --> allows more spreading when computes have exactly the same specs/weights.
host_subset_size --> helps concurrent requests land on different hosts.
Before changing them, we saw the same behavior, with OpenStack stacking VMs on a single compute. Scheduling still respects anti-affinity, but I don't see a good reason not to spread by default. Changing these two was enough to make our spread a little better.
On Tue, Feb 15, 2022 at 11:19 AM Franck VEDEL <franck.vedel@univ-grenoble-alpes.fr> wrote:
> How to find the problem? It should be nova-scheduler.
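A rough way to picture the two options mentioned above is the sketch below. It is a simplified model and not Nova's code: `pick_host`, the host names and the equal weights are assumptions made up for illustration, and real weights change as hosts fill up.

```python
import random
from collections import Counter


def pick_host(weighed_hosts, host_subset_size=1, shuffle_best=False, rng=random):
    """Pick one host name from (name, weight) pairs.

    Simplified model: sort by weight, optionally shuffle the hosts tied
    for the best weight, then choose at random among the top N hosts.
    """
    # Stable sort: hosts with equal weight keep their original order.
    ordered = sorted(weighed_hosts, key=lambda hw: hw[1], reverse=True)
    if shuffle_best:
        best_weight = ordered[0][1]
        best = [hw for hw in ordered if hw[1] == best_weight]
        rng.shuffle(best)
        ordered = best + ordered[len(best):]
    subset = ordered[:max(1, host_subset_size)]
    return rng.choice(subset)[0]


# Three identical hosts with identical weights (hypothetical values).
hosts = [("server1", 1.0), ("server2", 1.0), ("server3", 1.0)]
rng = random.Random(0)

default_picks = Counter(pick_host(hosts, rng=rng) for _ in range(300))
subset_picks = Counter(pick_host(hosts, host_subset_size=3, rng=rng) for _ in range(300))
shuffled_picks = Counter(pick_host(hosts, shuffle_best=True, rng=rng) for _ in range(300))

print("defaults:         ", dict(default_picks))
print("host_subset_size=3:", dict(subset_picks))
print("shuffle best:      ", dict(shuffled_picks))
```

In this model, with a subset size of 1 and no shuffling, a tie always goes to the first host in list order, which resembles the stacking described in the thread. Either option alone is enough to spread the picks across all three hosts.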
Enable debug logging on nova-scheduler and you will see how the winner is picked. I had the same issue before, caused by the build-failure weigher, which is enabled by default. Setting build_failure_weight_multiplier to 0 resolved the issue for me. Instances are now balanced by the weighers (compute and memory) as expected. shuffle_best_same_weighed_hosts and host_subset_size are not necessary unless a particular case requires them.
Tony
________________________________________
From: Laurent Dumont <laurentfdumont@gmail.com>
Sent: February 15, 2022 12:54 PM
To: Franck VEDEL
Cc: openstack-discuss
Subject: Re: [kolla-ansible][nova]Problem with distribution of instance on servers
> There are two settings we've tweaked in the past in Nova.
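To see why the build-failure weigher can dominate the others, here is a minimal sketch. It assumes the usual weighing scheme: each weigher's raw values are normalized to the 0..1 range across the candidate hosts, scaled by that weigher's multiplier, and summed. The free-RAM figures and failure counts are invented for illustration; the multipliers 1000000.0 and 1.0 are the values that appear in the option dump posted later in this thread. Only the RAM and build-failure weighers are modeled.

```python
def normalize(values):
    """Min-max normalize to 0..1; all zeros when every value is equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def total_weights(free_ram_mb, failed_builds,
                  ram_multiplier=1.0,
                  build_failure_multiplier=1000000.0):
    """Sum of the two modeled weighers for each host."""
    ram = normalize(free_ram_mb)
    # More failed builds should mean a lower weight, hence the negation.
    failures = normalize([-count for count in failed_builds])
    return [ram_multiplier * r + build_failure_multiplier * f
            for r, f in zip(ram, failures)]


# Hypothetical state: server1 is nearly full but has never failed a build,
# servers 2 and 3 are almost empty but each recorded a failed build.
names = ["server1", "server2", "server3"]
free_ram_mb = [20000, 120000, 125000]
failed_builds = [0, 2, 1]

with_weigher = total_weights(free_ram_mb, failed_builds)
without_weigher = total_weights(free_ram_mb, failed_builds,
                                build_failure_multiplier=0.0)

winner_with = names[with_weigher.index(max(with_weigher))]
winner_without = names[without_weigher.index(max(without_weigher))]
print("multiplier 1000000.0 ->", winner_with)
print("multiplier 0         ->", winner_without)
```

In the model, server1 keeps winning although it has the least free RAM, because a single recorded failure on the other hosts outweighs every other term. With the multiplier set to 0, the decision falls back to the remaining weighers. The option name in the logs, filter_scheduler.build_failure_weight_multiplier, suggests the setting belongs in the [filter_scheduler] section of nova.conf.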
In a healthy setup, should the build-failure weigher (build_failure_weight_multiplier) ever come into play?
From the doc, tweaking this might mean you try to schedule and build instances on computes that are not healthy.
On Tue, Feb 15, 2022 at 6:38 PM Tony Liu <tonyliu0592@hotmail.com> wrote:
> I had the same issue before, caused by the build-failure weigher enabled by default. setting build_failure_weight_multiplier to 0 resolved issue for me.
A build failure can be caused by different things: networking, storage, the hypervisor, etc. Take a failure caused by the Neutron service. That doesn't mean the hypervisor is unhealthy, but because of that weigher, even after the Neutron service has recovered, the hypervisor is still excluded from hosting instances. This doesn't make sense. I wouldn't enable this weigher until it's smart enough to know that the failure is caused by the hypervisor itself and not by something else.
Tony
________________________________________
From: Laurent Dumont <laurentfdumont@gmail.com>
Sent: February 15, 2022 05:00 PM
To: Tony Liu
Cc: Franck VEDEL; openstack-discuss
Subject: Re: [kolla-ansible][nova]Problem with distribution of instance on servers
> In a healthy setup, should build_failure_weight_multiplier be triggered?
Thanks a lot! I changed the settings and indeed, it seems to work. This distribution of instances is really interesting, and I'm learning a lot. Question: is it possible to view the calculated weight when a server is chosen? Otherwise, thanks again, really.
Franck
On 16 Feb 2022, at 03:35, Tony Liu <tonyliu0592@hotmail.com> wrote:
> I wouldn't enable this weigher until it's smart enough to know the failure is caused by hypervisor itself, but not anywhere else.
On Wed, 2022-02-16 at 10:52 +0100, Franck VEDEL wrote:
> Question: is it possible to view the calculated weight when choosing a server?
Yes, this is logged by the scheduler at debug level.
On 16 Feb 2022, at 13:45, Sean Mooney <smooney@redhat.com> wrote:
> yes this is logged in the scheduler at debug level
Is it this?
2022-02-16 10:18:26.802 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] filter_scheduler.build_failure_weight_multiplier = 1000000.0 log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.802 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] filter_scheduler.cpu_weight_multiplier = 1.0 log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.802 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] filter_scheduler.cross_cell_move_weight_multiplier = 1000000.0 log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.802 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] filter_scheduler.disk_weight_multiplier = 1.0 log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.803 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] filter_scheduler.io_ops_weight_multiplier = -1.0 log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.804 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] filter_scheduler.pci_weight_multiplier = 1.0 log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.804 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] filter_scheduler.ram_weight_multiplier = 1.0 log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.805 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] filter_scheduler.soft_affinity_weight_multiplier = 1.0 log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.805 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] filter_scheduler.soft_anti_affinity_weight_multiplier = 1.0 log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.805 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] filter_scheduler.weight_classes = ['nova.scheduler.weights.all_weighers'] log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.806 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] metrics.weight_multiplier = 1.0 log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.806 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] metrics.weight_of_unavailable = -10000.0 log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.806 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] metrics.weight_setting = [] log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
Franck
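The lines quoted in this message all end in log_opt_values, which is the oslo.config routine that dumps option values when a service starts, so they look like the configured multipliers and not the weights computed for a particular request. They are still handy for checking which multipliers the scheduler is actually running with. A minimal sketch for pulling them out of a log follows; the regular expression and the `parse_options` helper are assumptions written for the line format shown in this thread, not part of any OpenStack tooling.

```python
import re

# Matches lines such as:
#   ... DEBUG oslo_service.service [req-...] group.option = value log_opt_values /path/cfg.py:2611
OPTION_LINE = re.compile(
    r"DEBUG oslo_service\.service \[[^\]]*\] "
    r"(?P<name>[\w.]+)\s*=\s*(?P<value>.*?) log_opt_values\b"
)


def parse_options(log_text):
    """Return {option_name: raw_value_string} for every option-dump line."""
    options = {}
    for line in log_text.splitlines():
        match = OPTION_LINE.search(line)
        if match:
            options[match.group("name")] = match.group("value")
    return options


# Three lines copied from the message above.
sample = """\
2022-02-16 10:18:26.802 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] filter_scheduler.build_failure_weight_multiplier = 1000000.0 log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.804 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] filter_scheduler.ram_weight_multiplier = 1.0 log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
2022-02-16 10:18:26.805 8 DEBUG oslo_service.service [req-629e8eaf-9e0e-471a-b99c-957459b6c9af - - - - -] filter_scheduler.weight_classes = ['nova.scheduler.weights.all_weighers'] log_opt_values /var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py:2611
"""

options = parse_options(sample)
# Keep only the numeric weight multipliers.
multipliers = {
    name: float(value)
    for name, value in options.items()
    if name.endswith("_weight_multiplier")
}
for name, value in sorted(multipliers.items()):
    print(f"{name} = {value}")
```

Run against a full scheduler log, the same helper would show at a glance whether build_failure_weight_multiplier is still at its large default or has been overridden.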
It would be nice to have that metric exposed in the API for Nova hypervisors. We scrape those with Prometheus and an exporter, so we would get a bit more visibility.
On Wed, Feb 16, 2022 at 1:11 PM Franck VEDEL <franck.vedel@univ-grenoble-alpes.fr> wrote:
> is it this ?
From: Sean Mooney <smooney@redhat.com> Sent: February 16, 2022 04:45 AM To: Tony Liu; Laurent Dumont Cc: Franck VEDEL; openstack-discuss Subject: Re: [kolla-ansible][nova]Problem with distribution of instance on servers
On Wed, 2022-02-16 at 02:35 +0000, Tony Liu wrote:
> A build failure can be caused by different things: networking, storage, the hypervisor, etc. For example, a failure caused by the Neutron service does not mean the hypervisor is unhealthy, but because of that weigher, even after the Neutron service has recovered, the hypervisor is still excluded from holding instances. This doesn't make sense. I wouldn't enable this weigher until it is smart enough to know that the failure was caused by the hypervisor itself and not by something else.
This is enabled by default on all deployments and has been for many years at this point. We strongly recommend that it is used. You can elect to disable it, but if you do, you can end up with VMs constantly being scheduled to the same set of broken hosts. This becomes more apparent as the deployment gets fuller. While you could reduce the weight of this weigher, its high multiplier was chosen so that it could override the votes of the other weighers.
We likely could improve the weigher: perhaps have it age out the failed builds to account for transient failures, or provide a nova-manage command that lets operators reset the value for a host, or something like that. But in a healthy cloud you should not get failed builds that land on a host rather than in cell0. You can get failed builds where there is no host available, but those will land in cell0 and not affect the host failure count. You can also get failed builds due to quota etc., but that is validated in the API before we try to build the instance. So if you are getting failed builds, it should be an indication that you have at least a transient problem with your deployment that should be fixed.
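To make the point about the high multiplier concrete, here is a minimal sketch of weigher-style ranking. It is a simplified toy model, not Nova's code: the host names, RAM figures and failure counts are invented, only two weighers are modelled, and the function names are made up for illustration. It assumes each weigher's raw values are normalized to the 0..1 range across the candidate hosts and then scaled by a multiplier, with 1000000.0 used for the build-failure multiplier (the documented default of build_failure_weight_multiplier, as far as I know).

```python
# Toy model of weigher-based host ranking. NOT Nova's implementation:
# host data, function names and the two-weigher setup are illustrative.

def normalize(values):
    """Scale raw weights to [0, 1]; if all values are equal, return zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_hosts(hosts, ram_multiplier=1.0, build_failure_multiplier=1000000.0):
    """Return (host, total_weight) pairs, best candidate first."""
    names = list(hosts)
    # More free RAM -> higher weight.
    ram = normalize([hosts[n]["free_ram_mb"] for n in names])
    # More recorded build failures -> lower raw weight (hence the minus sign).
    failures = normalize([-hosts[n]["failed_builds"] for n in names])
    totals = {
        n: ram_multiplier * r + build_failure_multiplier * f
        for n, r, f in zip(names, ram, failures)
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical state: server1 is nearly full but never failed a build,
# servers 2 and 3 are mostly empty but each recorded a failure in the past.
hosts = {
    "server1": {"free_ram_mb": 16000, "failed_builds": 0},
    "server2": {"free_ram_mb": 96000, "failed_builds": 2},
    "server3": {"free_ram_mb": 90000, "failed_builds": 1},
}

default_ranking = rank_hosts(hosts)
disabled_ranking = rank_hosts(hosts, build_failure_multiplier=0.0)

print("default multiplier:", [name for name, _ in default_ranking])
print("multiplier set to 0:", [name for name, _ in disabled_ranking])
```

With the large multiplier, a single recorded failure outweighs any difference in free RAM, so the host without failures wins even when it is the fullest one. With the multiplier at 0, only free RAM decides. That matches the symptom described at the start of the thread, but the real cause on a given deployment should still be confirmed from the nova-scheduler debug log.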
Totally understand the intention and the risk. What I'd expect is some way to:
1) expose such failures so that a monitoring system can detect them, e.g. through the local nova-compute API or the global nova-api, so that an operator can jump in and fix the problem (we also collect and analyze logs, which triggers an alarm when a failure happens, but it would be easier to get this information from an API);
2) reset the failure flag to recover.
Thanks!
Tony
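As a rough illustration of the two requests above (visibility of the failure count and a way to reset it), combined with the idea of aging out old failures, here is a small sketch. Everything in it is hypothetical: the class, its method names and the 600-second window are invented for this example and do not correspond to any existing Nova API or nova-manage command.

```python
import time


class BuildFailureTracker:
    """Hypothetical per-host build-failure counter with aging and reset."""

    def __init__(self, max_age_seconds=3600.0, clock=time.monotonic):
        self.max_age_seconds = max_age_seconds
        self._clock = clock      # injectable so the demo can control time
        self._failures = {}      # host name -> list of failure timestamps

    def record_failure(self, host):
        """Remember that a build failed on this host just now."""
        self._failures.setdefault(host, []).append(self._clock())

    def failed_builds(self, host):
        """Count only the failures younger than max_age_seconds."""
        cutoff = self._clock() - self.max_age_seconds
        recent = [t for t in self._failures.get(host, []) if t > cutoff]
        self._failures[host] = recent
        return len(recent)

    def reset(self, host):
        """Operator action: forget every recorded failure for this host."""
        self._failures.pop(host, None)

    def report(self):
        """Snapshot that a monitoring system could poll."""
        return {host: self.failed_builds(host) for host in list(self._failures)}


# Demo with a fake clock so the aging behaviour is deterministic.
now = [0.0]
tracker = BuildFailureTracker(max_age_seconds=600.0, clock=lambda: now[0])
tracker.record_failure("server2")
tracker.record_failure("server2")
tracker.record_failure("server3")
print(tracker.report())

tracker.reset("server2")   # manual reset after the operator fixed the cause
now[0] = 601.0             # more than 600 seconds later
print(tracker.report())
```

The injectable clock is only there to make the aging testable; a real implementation would also have to decide where the counter lives and how it is shared between services, which this sketch does not address.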
participants (4)
- Franck VEDEL
- Laurent Dumont
- Sean Mooney
- Tony Liu