[nova][ops] Live migration and CPU features

Belmiro Moreira moreira.belmiro.email.lists at gmail.com
Fri Aug 21 09:26:35 UTC 2020


Hi,
thank you all for your comments/suggestions.

Having a "custom" cpu_mode seems the best option for our use case.
"host-passhtough" is problematic when the hardware is retired and instances
need to be moved to newer compute nodes.
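
Roughly, what we are considering per hardware type is something like the
following (a sketch only; the model name and extra flag are just examples,
and on releases older than Ussuri the option is "cpu_model" rather than
"cpu_models"):

  # /etc/nova/nova.conf on the compute nodes of one hardware generation
  [libvirt]
  cpu_mode = custom
  # newest named model that every node of this hardware type supports,
  # so instances can still be live migrated to newer hardware later
  cpu_models = Haswell-noTSX-IBRS
  # individual CPU features can be exposed on top of the named model
  cpu_model_extra_flags = pcid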

Belmiro

On Wed, Aug 19, 2020 at 11:21 AM Arnaud Morin <arnaud.morin at gmail.com>
wrote:

>
> Hello,
>
> We have the same kind of issue.
> To help mitigate it, we do segregation and also use cpu_mode=custom, so we
> can use a model which is close to our hardware (cpu_model=Haswell-noTSX)
> and add extra_flags when needed.
>
> This is painful.
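>
> For what it's worth, in nova.conf terms that setup is roughly (a sketch;
> the extra flags are just examples, and newer releases use "cpu_models"
> instead of "cpu_model"):
>
>   [libvirt]
>   cpu_mode = custom
>   cpu_model = Haswell-noTSX
>   # extra CPU flags layered on top of the named model when needed
>   cpu_model_extra_flags = pcid,vmx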
>
> Cheers,
>
> --
> Arnaud Morin
>
> On 18.08.20 - 16:16, Sean Mooney wrote:
> > On Tue, 2020-08-18 at 17:06 +0200, Fabian Zimmermann wrote:
> > > Hi,
> > >
> > > We are using the "custom" way, but this does not protect you from all
> > > issues.
> > >
> > > We had problems with a new CPU generation not (yet) being detected
> > > correctly by a libvirt version. libvirt fell back to the "desktop" CPU
> > > of this newer generation, but did not support/detect some features,
> > > which blocked live migration.
> > Yes, that is common when using really new hardware. Having previously
> > worked at Intel and hit this often, it is one of the reasons I tend to
> > default to host-passthrough and recommend using AZs or aggregates to
> > segregate the cloud for live migration.
> >
> > In the case where your libvirt does not know about the new CPUs, your
> > best approach is to use the newest server CPU model that it does know
> > about, and then, if you really need the new features, try to add them
> > using the config options. But that is effectively the same as using
> > host-passthrough, which is why I default to that as a workaround instead.
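> >
> > As a concrete sketch of that workaround (aggregate and host names here
> > are just examples): keep cpu_mode = host-passthrough in the [libvirt]
> > section of nova.conf, and group the hosts with identical CPUs so live
> > migrations only land within the group:
> >
> >   openstack aggregate create --zone az-cascadelake cascadelake-hosts
> >   openstack aggregate add host cascadelake-hosts compute-01
> >   openstack aggregate add host cascadelake-hosts compute-02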
> >
> > >
> > >  Fabian
> > >
> > > On Tue, 18 Aug 2020 at 16:54, Belmiro Moreira
> > > <moreira.belmiro.email.lists at gmail.com> wrote:
> > > >
> > > > Hi,
> > > > in our infrastructure we always have compute nodes that need a
> > > > hardware intervention and, as a consequence, are rebooted, coming back
> > > > with a new kernel, kvm, ...
> > > >
> > > > In order to have a good compromise between performance and flexibility
> > > > (live migration) we have been using "host-model" for the "cpu_mode"
> > > > configuration of our service VMs. We didn't expect to have CPU
> > > > compatibility issues because we have the same hardware type per cell.
> > > >
> > > > The problem is that when a compute node is rebooted, the instance
> > > > domain is recreated with the new CPU features that were introduced
> > > > because of the reboot (we use CentOS).
> > > >
> > > > If new CPU features are exposed, this basically blocks live migration
> > > > to all the non-rebooted compute nodes (where those CPU features are
> > > > not exposed yet). The nova-scheduler doesn't know about them when
> > > > scheduling the live migration destination.
> > > >
> > > > I wonder how other operators are solving this issue.
> > > > I don't like stopping OS upgrades.
> > > > What I'm considering is to define a "custom" cpu_mode for each
> > > > hardware type.
> > > >
> > > > I would appreciate your comments and would like to learn how you are
> > > > solving this problem.
> > > >
> > > > Belmiro
> > > >
> > >
> > >
> >
> >
>