<div dir="ltr"><div dir="ltr">Hi,<div>thank you all for your comments/suggestions.</div><div><br></div><div>Having a "custom" cpu_mode seems to be the best option for our use case.</div><div>"host-passthrough" is problematic when the hardware is retired and instances need to be moved to newer compute nodes.</div><div><br></div><div>Belmiro</div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Aug 19, 2020 at 11:21 AM Arnaud Morin <<a href="mailto:arnaud.morin@gmail.com">arnaud.morin@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><br>

Hello,<br>

<br>

We have the same kind of issue.<br>

To help mitigate it, we do segregation and also use cpu_mode=custom, so we<br>

can use a model which is close to our hardware (cpu_model=Haswell-noTSX)<br>

and add extra_flags when needed.<br>
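<br>

For reference, a minimal sketch of that kind of setup in nova.conf on a compute node. It assumes a Nova release where the option is named cpu_models (older releases use cpu_model), and the "pcid" flag is only an illustrative choice, not one taken from our deployment:<br>

<pre>[libvirt]
# Expose a fixed, named CPU model to guests instead of mirroring the host CPU
cpu_mode = custom
# Named model close to the hardware (older releases: cpu_model)
cpu_models = Haswell-noTSX
# Extra flags added on top of the model; "pcid" is an assumed example
cpu_model_extra_flags = pcid
</pre>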

<br>

This is painful.<br>

<br>

Cheers,<br>

<br>

-- <br>

Arnaud Morin<br>

<br>

On 18.08.20 - 16:16, Sean Mooney wrote:<br>

> On Tue, 2020-08-18 at 17:06 +0200, Fabian Zimmermann wrote:<br>

> > Hi,<br>

> > <br>

> > We are using the "custom"-way. But this does not protect you from all issues.<br>

> > <br>

> > We had problems with a new CPU generation that was not (yet) detected correctly<br>

> > by our libvirt version. libvirt fell back to the "desktop" CPU model of<br>

> > this newer generation, which didn't support/detect some features =><br>

> > blocked live migration.<br>

> Yes, that is common when using really new hardware. Having previously worked<br>

> at Intel and hit this often, that is one of the reasons I tend to default to host-passthrough<br>

> and recommend using AZs or aggregates to segregate the cloud for live migration.<br>

> <br>

> In the case where your libvirt does not know about the new CPUs, your best approach is to use the<br>

> newest server CPU model that it knows about, and then if you really need the new features you can try<br>

> to add them using the config options, but that is effectively the same as using host-passthrough,<br>

> which is why I default to that as a workaround instead.<br>

> <br>

> > <br>

> >  Fabian<br>

> > <br>

> > Am Di., 18. Aug. 2020 um 16:54 Uhr schrieb Belmiro Moreira<br>

> > <<a href="mailto:moreira.belmiro.email.lists@gmail.com" target="_blank">moreira.belmiro.email.lists@gmail.com</a>>:<br>

> > > <br>

> > > Hi,<br>

> > > in our infrastructure we always have compute nodes that need a hardware intervention and as a consequence they are<br>

> > > rebooted, bringing a new kernel, kvm, ...<br>

> > > <br>

> > > In order to have a good compromise between performance and flexibility (live migration) we have been using<br>

> > > "host-model" for the "cpu_mode" configuration of our service VMs. We didn't expect to have CPU compatibility issues<br>

> > > because we have the same hardware type per cell.<br>

> > > <br>

> > > The problem is that when a compute node is rebooted the instance domain is recreated with the new cpu features that<br>

> > > were introduced because of the reboot (using CentOS).<br>

> > > <br>

> > > If there are new CPU features exposed, this basically blocks live migration to all the non rebooted compute nodes<br>

> > > (those cpu features are not exposed, yet). The nova-scheduler doesn't know about them when scheduling the live<br>

> > > migration destination.<br>

> > > <br>

> > > I wonder how other operators are solving this issue.<br>

> > > I don't like stopping OS upgrades.<br>

> > > What I'm considering is to define a "custom" cpu_mode for each hardware type.<br>

> > > <br>

> > > I would appreciate your comments and learn how you are solving this problem.<br>

> > > <br>

> > > Belmiro<br>

> > > <br>

> > <br>

> > <br>

> <br>

> <br>

</blockquote></div>