Hi,

Try choosing a custom cpu_model that fits your infrastructure. That should be the best way to avoid this kind of problem. If performance is not an issue for the tenants, KVM64 should be a good choice.
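As a rough illustration, the relevant part of nova.conf on each compute node might look like the fragment below. This is only a sketch: the model name is the kvm64 suggestion from above, virt_type is assumed to be kvm, and newer Nova releases replace cpu_model with the list option cpu_models, so check the option name against the release you run.

```ini
[libvirt]
# Assumption: KVM hypervisor on the compute nodes.
virt_type = kvm
# Pin the guest CPU to a named model instead of mirroring the host.
cpu_mode = custom
# Assumption: kvm64 as the lowest-common-denominator model; any named model
# supported by every host of a given hardware type would work the same way.
cpu_model = kvm64
```

With a fixed model, the guest CPU definition no longer changes when a host comes back with a new kernel or microcode, at the cost of hiding newer CPU features from the guests.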

Br,

Blockchain, DevOps & Open Source Cloud Solutions Architect

----------------------------------------

Founder & CEO

OpenCloud.es

luis.ramirez@opencloud.es

Skype ID: d.overload

Hangouts: luis.ramirez@opencloud.es

+34 911 950 123 /

+39 392 1289553 /

+49 152 26917722 / Česká republika: +420 774 274 882

-----------------------------------------------------

On Tue, 18 Aug 2020 at 16:55, Belmiro Moreira (<moreira.belmiro.email.lists@gmail.com>) wrote:

Hi,
in our infrastructure we always have compute nodes that need a hardware intervention and, as a consequence, they are rebooted and come back with a new kernel, kvm, ...

In order to have a good compromise between performance and flexibility (live migration) we have been using "host-model" for the "cpu_mode" configuration of our service VMs. We didn't expect to have CPU compatibility issues because we have the same hardware type per cell.

The problem is that when a compute node is rebooted, the instance domain is recreated with the new CPU features that were introduced by the reboot (we use CentOS).

If new CPU features are exposed, this basically blocks live migration to all the compute nodes that have not been rebooted yet (they do not expose those CPU features yet). The nova-scheduler doesn't know about them when scheduling the live migration destination.
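The compatibility rule described above can be sketched in a few lines of Python. This is only an illustration of the set logic, not how Nova or libvirt implement the check; the host names and feature flags below are hypothetical.

```python
from typing import Dict, Set


def missing_on_destination(source: Set[str], destination: Set[str]) -> Set[str]:
    """Features the guest was started with that the destination cannot provide."""
    return source - destination


def common_baseline(hosts: Dict[str, Set[str]]) -> Set[str]:
    """Features exposed by every host; a custom model must stay within this set."""
    if not hosts:
        return set()
    return set.intersection(*hosts.values())


# Hypothetical feature sets: the rebooted node exposes one extra flag.
hosts = {
    "compute-rebooted": {"sse4.2", "avx2", "ssbd", "md-clear"},
    "compute-old-1": {"sse4.2", "avx2", "ssbd"},
    "compute-old-2": {"sse4.2", "avx2", "ssbd"},
}

# A guest started on the rebooted node cannot move to a node lacking these flags.
blocked_by = missing_on_destination(hosts["compute-rebooted"], hosts["compute-old-1"])
print(sorted(blocked_by))              # ['md-clear']
print(sorted(common_baseline(hosts)))  # ['avx2', 'ssbd', 'sse4.2']
```

Migration in the other direction (from a node that has not been rebooted to a rebooted one) is not blocked by this rule, because the destination then exposes a superset of the source features.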

I wonder how other operators are solving this issue.
I would rather not hold back OS upgrades.
What I'm considering is to define a "custom" cpu_mode for each hardware type.

I would appreciate your comments and would like to learn how you are solving this problem.

Belmiro