Hi, yes, this is a known issue.

The simple answer is to resize all affected VMs instead of live migrating them.

The longer answer is that we have been discussing this internally at Red Hat on and off for some time now. https://bugs.launchpad.net/nova/+bug/1960840 is one example where this happens. There is another case, for the CPU-based quotas, that happens when going from RHEL/CentOS 8 to 9: in the 8->9 change the cgroups implementation changes from v1 to v2.
https://bugzilla.redhat.com/show_bug.cgi?id=2035518

When addressing that, we did not have a good universal solution, other than resize, for instances that hardcoded a value that was incompatible with the cgroups v2 API in the kernel. In https://review.opendev.org/c/openstack/nova/+/824048/ we removed automatically adding the cpu_shares cgroup option, to enable booting VMs with more than 8 CPUs. We did not come up with any option other than resize for the other quotas that were in a similar situation.

The one option that we considered possible was to extend nova-manage to allow the embedded flavor to be updated. This would be similar to what we did to enable the image properties to be modified for changing machine types.
https://docs.openstack.org/nova/latest/cli/nova-manage.html#image-property-c...

We discussed at the time that, while we did not want to allow flavor extra specs to be modified, we might reconsider that if the quota issue forced our hand or we had a similar need due to forces beyond our control, i.e. if we needed to provide a way beyond resize, e.g. due to operating system changes.

What makes image properties and flavor extra specs different is that image properties can only be updated by rebuild, which is a destructive operation. Extra specs are updated by resize, which is not a destructive operation. That is one of the reasons we gave special consideration to image properties and did not do the same for extra specs.
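
Since resize to a flavor without the offending extra specs is the workaround, here is a rough sketch of how one might work out what the replacement flavor's extra specs should look like. This is not something nova ships. The function name `split_extra_specs` and the default key list are my own assumptions for illustration, so adjust the keys to whatever is actually set on your flavors. It only manipulates a plain dict and does not talk to any OpenStack API.

```python
from typing import Dict, Iterable, Tuple

# Assumption: illustrative key names only; replace them with the quota
# extra specs that are actually causing the failure on your flavors.
DEFAULT_PROBLEM_KEYS = ("quota:vif_inbound_burst", "quota:vif_outbound_burst")


def split_extra_specs(
    extra_specs: Dict[str, str],
    problem_keys: Iterable[str] = DEFAULT_PROBLEM_KEYS,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split flavor extra specs into (kept, dropped).

    'kept' is what a replacement flavor would carry; 'dropped' holds the
    keys that were removed, so they can be reviewed or logged.
    """
    problem = set(problem_keys)
    kept = {k: v for k, v in extra_specs.items() if k not in problem}
    dropped = {k: v for k, v in extra_specs.items() if k in problem}
    return kept, dropped


if __name__ == "__main__":
    # Hypothetical extra specs as they might appear on an old flavor.
    old_specs = {
        "quota:vif_inbound_burst": "60000000",
        "quota:vif_outbound_burst": "60000000",
        "hw:cpu_policy": "dedicated",
    }
    kept, dropped = split_extra_specs(old_specs)
    print("extra specs for the replacement flavor:", kept)
    print("extra specs removed:", dropped)
```

You would then create the replacement flavor with the kept extra specs and resize the affected instances to it.
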

If we allowed the same for flavor extra specs, you would still have to stop the instance, make the change, and then migrate the instance. Resize automates that, so it is generally a better fit. We were also concerned that adding it to nova-manage would result in it being abused to modify instances in ways that were either invalid for the host (changing the NUMA topology, adding traits/resource requests not tracked in placement) or that otherwise break the instance in weird ways. That could happen via image properties too, but it's less likely.

On Tue, 2023-01-03 at 17:25 +0100, Jahson Babel wrote:
Hello,
I'm trying to live migrate some VMs from CentOS 7 to Rocky 8. Everything runs smoothly when there are no extra specs on the flavors, but things get more complicated when those are set, especially when using quota:vif_burst for QoS. I know that we aren't supposed to use this for QoS now, but it's an old cluster and it was done that way at the time. So the VMs more or less have all those specs tied to them.
When live migrating a VM, this shows up in nova's logs:

driver.py _live_migration_operation nova.virt.libvirt.driver Live Migration failure: internal error: Child process (tc class add dev tapxxxxxxxx-xx parent 1: classid 1:1 htb rate 250000kbps ceil 2000000kbps burst 60000000kb quantum 21333) unexpected exit status 1: Illegal "burst"

This bug covers the problem: https://bugs.launchpad.net/nova/+bug/1960840
So it seems to be normal behavior. Also, I forgot to mention that I'm on the OpenStack Train release, and the file mentioned in the Launchpad bug is not present in this version. By using Rocky 8 I have to use an updated libvirt that won't accept the burst parameter we used to set. All available versions of libvirt on Rocky 8 have changed behavior concerning the burst parameter.
I've done some testing to make things work, including removing the extra_specs on the flavors and in the DB, removing them through libvirt, and trying to modify the tc rules used by a VM, but none of it worked. I have not yet tried to patch Nova or libvirt, and I don't really know where to look. The only thing that did work was to resize the VM to an identical flavor without the extra_specs. But this induces a complete reboot of the VM. I would like, if possible, to be able to live migrate the VMs, which is much easier.
Is it possible to remove the extra_specs on the VMs and then live migrate? Or should I just plan to resize/reboot all VMs without those extra_specs? Any advice will be appreciated.
Thank you for any help.
Best regards,
Jahson