<div dir="ltr">Thank you very much for sharing!<div>I will dig into it.<br clear="all"><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr">Nguyen Huu Khoi<br></div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Mar 16, 2023 at 4:54 PM Sean Mooney <<a href="mailto:smooney@redhat.com">smooney@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Thu, 2023-03-16 at 10:35 +0100, Dmitriy Rabotyagov wrote:<br>

> Oh, thanks for that detailed explanation!<br>

> I had been looking at the metrics weigher for years and looked through the code a<br>

> couple of times, but never got it properly configured. That is very helpful,<br>

> thanks a lot!<br>

<br>

That tells me I should probably update the docs...<br>

> <br>

> Thu, 16 Mar 2023, 09:46 Sean Mooney <<a href="mailto:smooney@redhat.com" target="_blank">smooney@redhat.com</a>>:<br>

> <br>

> > On Thu, 2023-03-16 at 02:03 +0100, Dmitriy Rabotyagov wrote:<br>

> > > Ultimately, I don't fully understand the reasons behind the need for such a service.<br>

> > > <br>

> > > Fighting high load by migrating instances between computes is fighting<br>

> > > the consequences rather than the root cause, not to mention that it<br>

> > > brings more negative than positive effects for the end-user experience,<br>

> > > as you're just moving the problem to another place, affecting more<br>

> > > workloads with degraded performance.<br>

> > > <br>

> > > If you're struggling with high load on a daily basis, then you have too<br>

> > > high a cpu_allocation_ratio set for your computes, as high load issues<br>

> > > always come from attempts to oversell too aggressively.<br>
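To make the oversell point concrete, here is a minimal Python sketch of the arithmetic. It assumes the usual placement inventory formula, capacity = (total - reserved) * allocation_ratio, and uses made-up host numbers (64 host CPUs, 4 reserved) and example ratios purely for illustration; vcpu_capacity is a hypothetical helper, not a nova function.

```python
def vcpu_capacity(host_cpus: int, reserved_cpus: int, allocation_ratio: float) -> float:
    # Assumed formula: schedulable vCPUs = (total - reserved) * allocation_ratio
    return (host_cpus - reserved_cpus) * allocation_ratio


# Hypothetical host: 64 hardware threads, 4 reserved for the host itself.
HOST_CPUS, RESERVED = 64, 4

for ratio in (1.0, 4.0, 16.0):
    sold = vcpu_capacity(HOST_CPUS, RESERVED, ratio)
    # If every sold vCPU is busy at once, each usable host CPU has to
    # time-slice `ratio` vCPUs, which is where sustained high load comes from.
    print(f"cpu_allocation_ratio={ratio}: up to {sold:.0f} vCPUs "
          f"on {HOST_CPUS - RESERVED} usable host CPUs")
```

Rebalancing with migrations only moves those busy vCPUs around; lowering the ratio is what bounds the worst case.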

> > > <br>

> > > If you have workloads in the cloud that always utilize all available CPUs,<br>

> > > then you should consider having flavors and aggregates with CPU pinning,<br>

> > > meaning dedicated physical CPUs for such workloads.<br>

> > > <br>

> > > Also, don't forget that it's worth setting more realistic numbers for<br>

> > > reserved resources on computes, because the default of 2 GB of RAM is<br>

> > > usually too small.<br>

> > I tend to agree, although there are some things you can do in the nova<br>

> > scheduler to help, e.g. preferring spreading over packing.<br>

> > <br>

> > For CPU load in particular, you can also enable the metrics weigher.<br>

> > <br>

> > I have not read this thread in detail, although skimming it I see references<br>

> > to ceilometer. Nova's metrics weigher has no dependency on it.<br>

> > The metrics weigher<br>

> > <br>

> > <a href="https://github.com/openstack/nova/blob/master/nova/scheduler/weights/metrics.py" rel="noreferrer" target="_blank">https://github.com/openstack/nova/blob/master/nova/scheduler/weights/metrics.py</a><br>

> > is configured by adding weight_setting to the scheduler config<br>

> > <br>

> > <a href="https://docs.openstack.org/nova/latest/configuration/config.html#metrics.weight_setting" rel="noreferrer" target="_blank">https://docs.openstack.org/nova/latest/configuration/config.html#metrics.weight_setting</a><br>

> > <br>

> >     [metrics]<br>

> >     weight_setting = name1=1.0, name2=-1.0<br>

> > and enabling the monitors in the nova-compute config<br>

> > <br>

> > <a href="https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.compute_monitors" rel="noreferrer" target="_blank">https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.compute_monitors</a><br>

> > [DEFAULT]<br>

> > compute_monitors = cpu.virt_driver<br>

> > <br>

> > ^ that is the only one we support<br>
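The weight_setting value is a comma-separated list of name=ratio pairs. As a rough illustration of that format, here is a minimal Python sketch; parse_weight_setting is a hypothetical helper written for this example, not nova's own parser, which may treat malformed entries differently.

```python
from typing import List, Tuple


def parse_weight_setting(raw: str) -> List[Tuple[str, float]]:
    """Split 'name1=1.0, name2=-1.0' into [(name, ratio), ...].

    Hypothetical helper: a simplified reading of the [metrics]
    weight_setting format, not nova's actual implementation.
    """
    pairs = []
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        name, sep, ratio = item.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"malformed weight_setting entry: {item!r}")
        pairs.append((name.strip(), float(ratio)))  # float() ignores surrounding spaces
    return pairs


print(parse_weight_setting(
    "cpu.iowait.percent=-1.0, cpu.percent=-1.0, cpu.idle.percent=1.0"))
```

Each metric name must match one of the names the enabled compute monitor actually reports, otherwise there is nothing to weigh it against.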

> > <br>

> > The data fields we report are set here:<br>

> > <br>

> > <a href="https://github.com/openstack/nova/blob/master/nova/compute/monitors/cpu/virt_driver.py#L52-L101" rel="noreferrer" target="_blank">https://github.com/openstack/nova/blob/master/nova/compute/monitors/cpu/virt_driver.py#L52-L101</a><br>

> > <br>

> > The more interesting values are<br>

> > "cpu.iowait.percent", "cpu.idle.percent" and "cpu.percent".<br>

> > <br>

> > We have a fairly large internal cloud that is used for dev and CI, and for<br>

> > about the last 12 to 18 months they have been using this to help balance the<br>

> > scheduling of instances, as we have a mix of hypervisor SKUs and this helps<br>

> > balance system load.<br>

> > <br>

> >     [metrics]<br>

> >     weight_setting = cpu.iowait.percent=-1.0, cpu.percent=-1.0, cpu.idle.percent=1.0<br>

> > <br>

> > You want iowait and cpu.percent to be negative since you want to avoid<br>

> > hosts with high iowait or high CPU utilisation,<br>

> > and you would want to prefer idle hosts if your intent is to balance load.<br>
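A simplified Python sketch of why the signs matter: each host's weight is taken as the sum of metric value times ratio, and higher weights are preferred. The host names and metric values are invented, and the percent metrics are assumed to be fractions between 0 and 1. This is a sketch, not nova's MetricsWeigher; the real weigher also normalizes weights across hosts and has [metrics] options (see the config reference linked above) for hosts that do not report a metric.

```python
from typing import Dict, List, Tuple


def metrics_weight(metrics: Dict[str, float], setting: List[Tuple[str, float]]) -> float:
    # Simplified weigher: weighted sum of reported metrics; higher is preferred.
    return sum(metrics[name] * ratio for name, ratio in setting)


setting = [("cpu.iowait.percent", -1.0), ("cpu.percent", -1.0), ("cpu.idle.percent", 1.0)]

# Hypothetical hosts; percent metrics assumed to be fractions in [0, 1].
hosts = {
    "busy-host": {"cpu.iowait.percent": 0.20, "cpu.percent": 0.90, "cpu.idle.percent": 0.10},
    "idle-host": {"cpu.iowait.percent": 0.01, "cpu.percent": 0.15, "cpu.idle.percent": 0.85},
}

ranked = sorted(hosts, key=lambda h: metrics_weight(hosts[h], setting), reverse=True)
print(ranked)  # ['idle-host', 'busy-host']
```

With the negative ratios, a busy host is pushed to the bottom of the ranking, while the positive idle ratio pulls lightly loaded hosts to the top.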

> > <br>

> > iowait is actually included in cpu.percent, and in fact cpu.percent is<br>

> > basically total CPU time minus idle, so<br>

> >     [metrics]<br>

> >     weight_setting = cpu.percent=-1.0<br>

> > would have a similar effect, but you might want the extra granularity to<br>

> > weight iowait vs idle differently.<br>
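The relationship between cpu.percent and idle can be shown with a small Python sketch that turns two samples of cumulative CPU time counters into fractions. It assumes the driver reports cumulative user, kernel, idle and iowait times; the counter values are invented, and cpu_percentages is a hypothetical helper modelled on, not copied from, the monitor linked above.

```python
from typing import Dict


def cpu_percentages(prev: Dict[str, int], curr: Dict[str, int]) -> Dict[str, float]:
    """Convert two samples of cumulative CPU times into utilisation fractions."""
    diff = {k: curr[k] - prev[k] for k in ("user", "kernel", "idle", "iowait")}
    total = sum(diff.values())
    if total == 0:
        return {}  # no time elapsed between samples; nothing to report
    return {
        "cpu.idle.percent": diff["idle"] / total,
        "cpu.iowait.percent": diff["iowait"] / total,
        # Busy time includes iowait, so this works out to 1 - idle.
        "cpu.percent": (diff["user"] + diff["kernel"] + diff["iowait"]) / total,
    }


prev = {"user": 1000, "kernel": 200, "idle": 5000, "iowait": 100}
curr = {"user": 1600, "kernel": 300, "idle": 5200, "iowait": 200}
print(cpu_percentages(prev, curr))
# {'cpu.idle.percent': 0.2, 'cpu.iowait.percent': 0.1, 'cpu.percent': 0.8}
```

Because cpu.percent already contains iowait, weighting cpu.percent alone penalizes iowait implicitly; listing cpu.iowait.percent separately only changes how strongly it counts.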

> > <br>

> > So if you find the normal CPU/RAM/disk weighers are not sufficient to balance<br>

> > based on load, check out the metrics weigher and see if that helps. Just be<br>

> > aware that collecting the CPU metrics and providing them to the scheduler<br>

> > will increase RabbitMQ load a little, since we periodically have to update<br>

> > those values for each compute. If you have a lot of computes, that might be<br>

> > problematic. It's one of the reasons we decided not to add more metrics<br>

> > like this.<br>

> > <br>

> > > <br>

> > > Wed, 15 Mar 2023, 13:11 Nguyễn Hữu Khôi <<a href="mailto:nguyenhuukhoinw@gmail.com" target="_blank">nguyenhuukhoinw@gmail.com</a>>:<br>

> > > <br>

> > > > Hello.<br>

> > > > I cannot use it because the cpu_util metric is missing. I tried to make it<br>

> > > > work but haven't managed to yet; it needs some code to make it work. It<br>

> > > > seems no one cares about balancing resources in the cloud.<br>

> > > > <br>

> > > > On Wed, Mar 15, 2023, 6:26 PM Thomas Goirand <<a href="mailto:zigo@debian.org" target="_blank">zigo@debian.org</a>> wrote:<br>

> > > > <br>

> > > > > On 12/11/22 01:59, Nguyễn Hữu Khôi wrote:<br>

> > > > > > Watcher is not good because it needs a CPU metric, such as CPU load,<br>

> > > > > > from Ceilometer, which has been removed, so we cannot use it.<br>

> > > > > <br>

> > > > > Hi!<br>

> > > > > <br>

> > > > > What do you mean by "Ceilometer [is] removed"? It certainly isn't dead,<br>

> > > > > and it works well... If by that you mean "ceilometer-api" is removed,<br>

> > > > > then yes, but then you can use gnocchi.<br>

> > > > > <br>

> > > > > Cheers,<br>

> > > > > <br>

> > > > > Thomas Goirand (zigo)<br>


<br>

<br>

</blockquote></div>