<div dir="auto">Oh, thanks for that detailed explanation!<div dir="auto">I have been looking at the metrics weigher for years and looked through the code a couple of times, but never got it properly configured. That is very helpful, thanks a lot!</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Thu, 16 Mar 2023, 09:46 Sean Mooney <<a href="mailto:smooney@redhat.com">smooney@redhat.com</a>>:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Thu, 2023-03-16 at 02:03 +0100, Dmitriy Rabotyagov wrote:<br>

> Actually, I don't fully understand the reasons behind the need for such a service.<br>

> <br>

> Fighting high load by migrating instances between computes is<br>

> fighting the consequences rather than the root cause, not to mention that it<br>

> brings more negative effects than positive ones for the end-user experience,<br>

> as you're just moving the problem to another place and affecting more workloads<br>

> with degraded performance.<br>

> <br>

> If you are struggling with high load on a daily basis, then you have too high a<br>

> cpu_allocation_ratio set for your computes, as high load issues always come from<br>

> attempts to oversell too aggressively.<br>

> <br>

> If you have workloads in the cloud that always utilize all CPUs available -<br>

> then you should consider having flavors and aggregates with cpu-pinning,<br>

> meaning providing physical CPUs for such workloads.<br>

> <br>

> Also, don't forget that it's worth setting more realistic numbers for<br>

> reserved resources on computes, because the default 2 GB of RAM is usually too<br>

> small.<br>

I tend to agree, although there are some things you can do in the nova scheduler to help,<br>

e.g. preferring spreading over packing.<br>

<br>

For CPU load in particular you can also enable the metrics weigher.<br>

<br>

I have not read this thread in detail, although skimming it I see references to ceilometer.<br>

Nova's metrics weigher has no dependency on it.<br>

the metrics weigher <br>

<a href="https://github.com/openstack/nova/blob/master/nova/scheduler/weights/metrics.py" rel="noreferrer noreferrer" target="_blank">https://github.com/openstack/nova/blob/master/nova/scheduler/weights/metrics.py</a><br>

is configured by adding weight_setting in the scheduler config<br>

<a href="https://docs.openstack.org/nova/latest/configuration/config.html#metrics.weight_setting" rel="noreferrer noreferrer" target="_blank">https://docs.openstack.org/nova/latest/configuration/config.html#metrics.weight_setting</a><br>

<br>

    [metrics]<br>

    weight_setting = name1=1.0, name2=-1.0<br>

and enabling the monitors in the nova-compute config<br>

<a href="https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.compute_monitors" rel="noreferrer noreferrer" target="_blank">https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.compute_monitors</a><br>

[DEFAULT]<br>

compute_monitors = cpu.virt_driver<br>

<br>

^ that is the only one we support<br>

<br>

The data fields we report are set here<br>

<a href="https://github.com/openstack/nova/blob/master/nova/compute/monitors/cpu/virt_driver.py#L52-L101" rel="noreferrer noreferrer" target="_blank">https://github.com/openstack/nova/blob/master/nova/compute/monitors/cpu/virt_driver.py#L52-L101</a><br>

<br>

The more interesting values are<br>

"cpu.iowait.percent", "cpu.idle.percent" and "cpu.percent"<br>

<br>

We have a fairly large internal cloud that is used for dev and CI, and as of about 12 to 18 months ago they<br>

have been using this to help balance the scheduling of instances, as we have a mix of hypervisor SKUs<br>

and this helps balance system load.<br>

<br>

  [metrics]<br>

    weight_setting = cpu.iowait.percent=-1.0, cpu.percent=-1.0, cpu.idle.percent=1.0<br>

<br>

You want iowait and cpu.percent to be negative since you want to avoid hosts with high iowait or high CPU utilisation,<br>

and you would want to prefer idle hosts if your intent is to balance load.<br>

<br>

iowait is actually included in cpu.percent, and in fact cpu.percent is basically cpu load - idle, so<br>

[metrics]<br>

    weight_setting = cpu.percent=-1.0<br>

would have a similar effect, but you might want the extra granularity to weight iowait vs idle differently.<br>
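<br>

As a rough illustration of why the two weight_setting values above tend to rank hosts the same way, here is a minimal Python sketch.<br>

It assumes the weigher scores a host as the sum of each configured metric multiplied by its ratio, and it leaves out any normalisation or multiplier the scheduler applies afterwards.<br>

The host names, metric values and helper functions are made up for the example; this is not nova's code.<br>

<pre>
```python
# Illustrative sketch only: hypothetical helpers, not nova's implementation.


def parse_weight_setting(setting):
    """Parse 'name1=1.0, name2=-1.0' into a dict of metric name to ratio."""
    ratios = {}
    for item in setting.split(","):
        name, _, ratio = item.strip().partition("=")
        ratios[name.strip()] = float(ratio)
    return ratios


def raw_weight(host_metrics, ratios):
    """Assumed scoring rule: sum of ratio * metric value for one host."""
    return sum(ratio * host_metrics[name] for name, ratio in ratios.items())


# Made-up metric values for two hypothetical hosts.
hosts = {
    "busy-host": {
        "cpu.iowait.percent": 0.10,
        "cpu.percent": 0.80,
        "cpu.idle.percent": 0.20,
    },
    "idle-host": {
        "cpu.iowait.percent": 0.01,
        "cpu.percent": 0.15,
        "cpu.idle.percent": 0.85,
    },
}

full = parse_weight_setting(
    "cpu.iowait.percent=-1.0, cpu.percent=-1.0, cpu.idle.percent=1.0"
)
simple = parse_weight_setting("cpu.percent=-1.0")


def rank(ratios):
    """Return host names ordered from most to least preferred."""
    return sorted(hosts, key=lambda h: raw_weight(hosts[h], ratios), reverse=True)


print(rank(full))
print(rank(simple))
```
</pre>

With these made-up numbers both settings put the idle host first. The three-metric form only starts to differ once you give iowait and idle ratios other than -1.0 and 1.0.<br>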

<br>

So if you find the normal cpu/ram/disk weighers are not sufficient to balance based on load, check out the<br>

metrics weigher and see if that helps. Just be aware that collecting the CPU metrics and providing them<br>

to the scheduler will increase rabbitmq load a little, since we periodically have to update those values for<br>

each compute. If you have a lot of computes that might be problematic. It's one of the reasons we<br>

decided not to add more metrics like this.<br>

<br>

<br>

<br>

> <br>

> <br>

> <br>

> Wed, 15 Mar 2023, 13:11 Nguyễn Hữu Khôi <<a href="mailto:nguyenhuukhoinw@gmail.com" target="_blank" rel="noreferrer">nguyenhuukhoinw@gmail.com</a>>:<br>

> <br>

> > Hello.<br>

> > I cannot use it because the cpu_util metric is missing. I have tried to make it work but<br>

> > have not managed yet. It needs some code to make it work. It seems nobody cares about balancing<br>

> > resources in the cloud.<br>

> > <br>

> > On Wed, Mar 15, 2023, 6:26 PM Thomas Goirand <<a href="mailto:zigo@debian.org" target="_blank" rel="noreferrer">zigo@debian.org</a>> wrote:<br>

> > <br>

> > > On 12/11/22 01:59, Nguyễn Hữu Khôi wrote:<br>

> > > > Watcher is not good because it needs a CPU metric<br>

> > > > such as cpu load from Ceilometer, which has been removed, so we cannot use it.<br>

> > > <br>

> > > Hi!<br>

> > > <br>

> > > What do you mean by "Ceilometer [is] removed"? It certainly isn't dead,<br>

> > > and it works well... If by that, you mean "ceilometer-api" is removed,<br>

> > > then yes, but then you can use gnocchi.<br>

> > > <br>

> > > Cheers,<br>

> > > <br>

> > > Thomas Goirand (zigo)<br>

> > > <br>

> > > <br>

<br>

</blockquote></div>