[Openstack] Lack of Balance solution such as Watcher.
I am building a private cloud and everything else works. However, I cannot find a way to rebalance VMs when one of them is under heavy load and starts to affect the other VMs on the same compute node. I found solutions such as Watcher and Leveller, but they have to be run manually. Watcher does not work for me because it needs a CPU metric, such as the CPU load from Ceilometer, which has been removed, so we cannot use it. Leveller is good, but it is obsolete.

Nguyen Huu Khoi
On 12/11/22 01:59, Nguyễn Hữu Khôi wrote:
> Watcher is not good because It need cpu metric such as cpu load in Ceilometer which is removed so we cannot use it.

Hi!

What do you mean by "Ceilometer [is] removed"? It certainly isn't dead, and it works well... If by that you mean "ceilometer-api" is removed, then yes, but then you can use gnocchi.

Cheers,
Thomas Goirand (zigo)
On Wed, Mar 15, 2023, 13:11, Nguyễn Hữu Khôi <nguyenhuukhoinw@gmail.com> wrote:
Hello. I cannot use it because the cpu_util metric is missing. I tried to make it work but have not managed to yet; it needs some code changes to work. It seems nobody cares about balancing resources in the cloud.
On Thu, 2023-03-16 at 02:03 +0100, Dmitriy Rabotyagov wrote:
Actually, I don't fully understand the reasons behind the need for such a service. Fighting high load by migrating instances between computes treats the consequences rather than the root cause. It also tends to do more harm than good for end users, since you are just moving the problem to another place and degrading performance for more workloads.

If you struggle with high load on a daily basis, then the cpu_allocation_ratio set for your computes is too high; high-load issues always come from attempts to oversell too aggressively.

If you have workloads in the cloud that always utilize all available CPUs, then you should consider flavors and aggregates with CPU pinning, that is, providing physical CPUs for such workloads.

Also, don't forget that it's worth setting more realistic numbers for reserved resources on computes, because the default 2 GB of RAM is usually too small.
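To make the overselling point above concrete, here is a minimal arithmetic sketch. It assumes the Placement rule that schedulable capacity is (total - reserved) * allocation_ratio. The option names cpu_allocation_ratio and reserved_host_memory_mb are nova's; the helper name `capacity` and the host sizes and ratios are invented for illustration and are not recommendations.

```python
def capacity(total: int, reserved: int, allocation_ratio: float) -> int:
    """Schedulable amount of a resource: (total - reserved) * allocation_ratio."""
    return int((total - reserved) * allocation_ratio)


# Hypothetical compute node: 32 host CPU threads and 256 GiB of RAM.
HOST_CPUS = 32
HOST_RAM_MB = 256 * 1024

# Effect of cpu_allocation_ratio on how many vCPUs can be sold on this host.
for ratio in (16.0, 4.0, 1.0):
    vcpus = capacity(HOST_CPUS, 0, ratio)
    print(f"cpu_allocation_ratio={ratio}: {vcpus} schedulable vCPUs")

# Effect of reserved_host_memory_mb on the RAM left for instances
# (RAM allocation ratio assumed to be 1.0 here).
for reserved_mb in (2048, 16384):
    ram_mb = capacity(HOST_RAM_MB, reserved_mb, 1.0)
    print(f"reserved_host_memory_mb={reserved_mb}: {ram_mb} MB for instances")
```

The lower the ratio, the fewer instances compete for the same physical CPUs, which is the root-cause fix suggested above, as opposed to migrating instances after the host is already overloaded.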
On Thu, Mar 16, 2023, 09:46, Sean Mooney <smooney@redhat.com> wrote:
I tend to agree, although there are some things you can do in the nova scheduler to help, e.g. preferring spreading over packing.

For CPU load in particular, you can also enable the metrics weigher. I have not read this thread in detail, although skimming it I see references to Ceilometer; nova's metrics weigher has no dependency on it.

The metrics weigher
https://github.com/openstack/nova/blob/master/nova/scheduler/weights/metrics...
is configured by adding weight_setting in the scheduler config
https://docs.openstack.org/nova/latest/configuration/config.html#metrics.wei...

    [metrics]
    weight_setting = name1=1.0, name2=-1.0

and enabling the monitors in the nova-compute config
https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.com...

    [DEFAULT]
    compute_monitors = cpu.virt_driver

^ that is the only one we support.

The data fields we report are set here:
https://github.com/openstack/nova/blob/master/nova/compute/monitors/cpu/virt...
The more interesting values are "cpu.iowait.percent", "cpu.idle.percent" and "cpu.percent".

We have a fairly large internal cloud that is used for dev and CI, and for the past 12 to 18 months they have been using this to help balance the scheduling of instances, as we have a mix of hypervisor SKUs and this helps balance system load.

    [metrics]
    weight_setting = cpu.iowait.percent=-1.0, cpu.percent=-1.0, cpu.idle.percent=1.0

You want iowait and cpu.percent to be negative, since you want to avoid hosts with high iowait or high CPU utilisation, and you would want to prefer idle hosts if your intent is to balance load.

iowait is actually included in cpu.percent, and in fact cpu.percent is basically cpu load - idle, so

    [metrics]
    weight_setting = cpu.percent=-1.0

would have a similar effect, but you might want the extra granularity to weight iowait and idle differently.

So if you find the normal CPU/RAM/disk weighers are not sufficient to balance based on load, check out the metrics weigher and see if that helps. Just be aware that collecting the CPU metrics and providing them to the scheduler will increase RabbitMQ load a little, since we periodically have to update those values for each compute. If you have a lot of computes, that might be problematic; it's one of the reasons we decided not to add more metrics like this.
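To illustrate how a weight_setting like the one above pushes the scheduler toward idle hosts, here is a simplified sketch that parses the setting and ranks hosts by a weighted sum of their metrics. It is not nova's code: the function names, host names and metric values are hypothetical, and the real scheduler also normalizes weights across hosts and combines them with the other weighers.

```python
import configparser

# nova.conf fragment using the weight_setting from the message above.
NOVA_CONF = """
[metrics]
weight_setting = cpu.iowait.percent=-1.0, cpu.percent=-1.0, cpu.idle.percent=1.0
"""


def parse_weight_setting(conf_text: str) -> dict[str, float]:
    """Parse '[metrics] weight_setting' into {metric_name: ratio}."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    raw = parser["metrics"]["weight_setting"]
    ratios = {}
    for item in raw.split(","):
        name, _, ratio = item.strip().partition("=")
        ratios[name] = float(ratio)
    return ratios


def host_score(metrics: dict[str, float], ratios: dict[str, float]) -> float:
    """Weighted sum of the reported metrics; higher means more preferred."""
    return sum(metrics[name] * ratio for name, ratio in ratios.items())


# Hypothetical metric values (fractions of CPU time) for three hosts.
HOSTS = {
    "compute-1": {"cpu.iowait.percent": 0.02, "cpu.percent": 0.85, "cpu.idle.percent": 0.15},
    "compute-2": {"cpu.iowait.percent": 0.20, "cpu.percent": 0.50, "cpu.idle.percent": 0.50},
    "compute-3": {"cpu.iowait.percent": 0.01, "cpu.percent": 0.10, "cpu.idle.percent": 0.90},
}

ratios = parse_weight_setting(NOVA_CONF)
ranking = sorted(HOSTS, key=lambda h: host_score(HOSTS[h], ratios), reverse=True)
print(ranking)  # most preferred host first
```

With these made-up numbers the mostly idle host ranks first and the busiest host last, which is the behaviour the negative ratios on iowait and cpu.percent are meant to produce.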
On Thu, 2023-03-16 at 10:35 +0100, Dmitriy Rabotyagov wrote:
Oh, thanks for that detailed explanation! I had been looking at the metrics weigher for years and looked through the code a couple of times, but never got it properly configured. That is very helpful, thanks a lot!
On Thu, Mar 16, 2023 at 4:54 PM, Sean Mooney <smooney@redhat.com> wrote:
That tells me I should probably update the docs...
Thank you very much for sharing! I will dig into it.

Nguyen Huu Khoi
participants (4)
- Dmitriy Rabotyagov
- Nguyễn Hữu Khôi
- Sean Mooney
- Thomas Goirand