[stable][oslo] Supporting qemu 4.1.0 on stein and older
Hi,
This is related to the FFE for train, but I wanted to discuss it separately because I think the circumstances are a bit different. Qemu 4.1.0 did not exist during the Stein cycle, so it's not clear to me that backporting bug fixes for it is valid. The original author of the patch actually wants it for Rocky, which is basically in the same situation as Stein. I should note he's willing to carry the patch downstream if necessary.
On the one hand, it sounds like this is something at least one operator wants, but on the other I'm not sure the stable policy supports backporting patches to support a version of a dependency that didn't exist when the release was initially cut. I'm soliciting opinions on how to proceed here.
Reference: https://review.opendev.org/#/c/686532
Thanks.
-Ben
On 2019-10-07 10:44:04 -0500 (-0500), Ben Nemec wrote: [...]
Qemu 4.1.0 did not exist during the Stein cycle, so it's not clear to me that backporting bug fixes for it is valid. The original author of the patch actually wants it for Rocky [...]
Neither the changes nor the bug report indicate what the motivation is for supporting newer Qemu with (much) older OpenStack. Is there some platform which has this Qemu behavior on which folks are trying to run Rocky? Or is it a homegrown build combining these dependency versions from disparate time periods? Or maybe some other reason I'm not imagining? -- Jeremy Stanley
On Mon, 2019-10-07 at 16:31 +0000, Jeremy Stanley wrote:
On 2019-10-07 10:44:04 -0500 (-0500), Ben Nemec wrote: [...]
Qemu 4.1.0 did not exist during the Stein cycle, so it's not clear to me that backporting bug fixes for it is valid. The original author of the patch actually wants it for Rocky
[...]
Neither the changes nor the bug report indicate what the motivation is for supporting newer Qemu with (much) older OpenStack. Is there some platform which has this Qemu behavior on which folks are trying to run Rocky? Or is it a homegrown build combining these dependency versions from disparate time periods? Or maybe some other reason I'm not imagining?
I suspect the motivation is the fact that distros like RHEL often bump qemu and libvirt versions in minor releases. So if you originally deployed Queens on, say, RHEL 7.5 but upgraded it to RHEL 7.7 over time, you would end up running with a qemu/libvirt that may not have existed when Queens was released.
When qemu has broken its public API in the past and that change in behavior has been addressed in a later OpenStack release, distros have often had to backport that fix to an OpenStack release that came out before that dependency existed. This depends on the distro. Canonical, for example, packages qemu and OVS in the Ubuntu Cloud Archive for each given release, I believe, so you can go from 18.04.0 to 18.04.1 and know it won't break your OpenStack install. On RHEL, however, QEMU and KVM are owned by a separate team, and layered products like OpenStack consume the output of that team, which follows the RHEL release cycle, not the OpenStack one. So I expect this to vary per distro.
When a change is backportable upstream, that is obviously preferable. I don't actually think this needs to be fixed in Train GA if an oslo release is done promptly that can be consumed instead. I expect this to get backported downstream anyway, so if we can avoid multiple distros doing that and backport it upstream, given that it's backward compatible, I think that would be preferable.
Just my 2 cents.
On 10/7/19 11:31 AM, Jeremy Stanley wrote:
On 2019-10-07 10:44:04 -0500 (-0500), Ben Nemec wrote: [...]
Qemu 4.1.0 did not exist during the Stein cycle, so it's not clear to me that backporting bug fixes for it is valid. The original author of the patch actually wants it for Rocky [...]
Neither the changes nor the bug report indicate what the motivation is for supporting newer Qemu with (much) older OpenStack. Is there some platform which has this Qemu behavior on which folks are trying to run Rocky? Or is it a homegrown build combining these dependency versions from disparate time periods? Or maybe some other reason I'm not imagining?
In addition to the downstream reasons Sean mentioned, Mark (the original author of the patch) responded to my question on the train backport with this:
""" Today, I need it in Rocky. But, I'm fine to do local patching.
Anybody who needs Qemu 4.1.0 likely needs it. A key feature in Qemu 4.1.0 is that this is the first release of Qemu to include proper support for migration of L1 guests that have L2 guests (nVMX / nested KVM). So, I expect it is pretty important to whoever realizes this, and whoever needs this. """
So basically a desire to use a feature of the newer qemu with older openstack, which is why I'm questioning whether this fits our stable policy. My inclination is to say it's a fairly simple, backward-compatible patch that will make users' lives easier, but I also feel like doing a backport to enable a feature, even if the actual patch is a "bugfix", is violating the spirit of the stable policy.
On Mon, 2019-10-07 at 14:43 -0500, Ben Nemec wrote:
On 10/7/19 11:31 AM, Jeremy Stanley wrote:
On 2019-10-07 10:44:04 -0500 (-0500), Ben Nemec wrote: [...]
Qemu 4.1.0 did not exist during the Stein cycle, so it's not clear to me that backporting bug fixes for it is valid. The original author of the patch actually wants it for Rocky
[...]
Neither the changes nor the bug report indicate what the motivation is for supporting newer Qemu with (much) older OpenStack. Is there some platform which has this Qemu behavior on which folks are trying to run Rocky? Or is it a homegrown build combining these dependency versions from disparate time periods? Or maybe some other reason I'm not imagining?
In addition to the downstream reasons Sean mentioned, Mark (the original author of the patch) responded to my question on the train backport with this:
""" Today, I need it in Rocky. But, I'm fine to do local patching.
Anybody who needs Qemu 4.1.0 likely needs it. A key feature in Qemu 4.1.0 is that this is the first release of Qemu to include proper support for migration of L1 guests that have L2 guests (nVMX / nested KVM). So, I expect it is pretty important to whoever realizes this, and whoever needs this. """
So basically a desire to use a feature of the newer qemu with older openstack, which is why I'm questioning whether this fits our stable policy. My inclination is to say it's a fairly simple, backward-compatible patch that will make users' lives easier, but I also feel like doing a backport to enable a feature, even if the actual patch is a "bugfix", is violating the spirit of the stable policy.
In many distros the older qemus allow migration of the L1 guest even though it is unsafe to do so, and it either works by luck or the VM corrupts its memory and likely crashes. The context of the qemu issue is that for years people thought live migration with nested virt worked; then it was disabled upstream, and many distros reverted that because it would break their users in the cases where they got lucky and it worked; and in 4.1 it was fixed.
This does not add or remove any functionality in OpenStack. Nova will try to live migrate if you tell it to, regardless of the qemu it has; it will just fail if the live migration check was compiled in.
Similarly, if none of your images had fractional sizes, you could use 4.1.0 with older oslo releases and it would be fine. I.e., you could get lucky and for your specific use case this might not be needed, but it would be nice not to depend on luck.
Anyway, I would expect any distro that chooses to support qemu 4.1.0 to backport this as required. I'm not sure whether it is problematic to require a late oslo version bump before Train GA, but I would hope it can be fixed on stable/train.
On 10/7/19 3:08 PM, Sean Mooney wrote:
On Mon, 2019-10-07 at 14:43 -0500, Ben Nemec wrote:
On 10/7/19 11:31 AM, Jeremy Stanley wrote:
On 2019-10-07 10:44:04 -0500 (-0500), Ben Nemec wrote: [...]
Qemu 4.1.0 did not exist during the Stein cycle, so it's not clear to me that backporting bug fixes for it is valid. The original author of the patch actually wants it for Rocky
[...]
Neither the changes nor the bug report indicate what the motivation is for supporting newer Qemu with (much) older OpenStack. Is there some platform which has this Qemu behavior on which folks are trying to run Rocky? Or is it a homegrown build combining these dependency versions from disparate time periods? Or maybe some other reason I'm not imagining?
In addition to the downstream reasons Sean mentioned, Mark (the original author of the patch) responded to my question on the train backport with this:
""" Today, I need it in Rocky. But, I'm fine to do local patching.
Anybody who needs Qemu 4.1.0 likely needs it. A key feature in Qemu 4.1.0 is that this is the first release of Qemu to include proper support for migration of L1 guests that have L2 guests (nVMX / nested KVM). So, I expect it is pretty important to whoever realizes this, and whoever needs this. """
So basically a desire to use a feature of the newer qemu with older openstack, which is why I'm questioning whether this fits our stable policy. My inclination is to say it's a fairly simple, backward-compatible patch that will make users' lives easier, but I also feel like doing a backport to enable a feature, even if the actual patch is a "bugfix", is violating the spirit of the stable policy.
In many distros the older qemus allow migration of the L1 guest even though it is unsafe to do so, and it either works by luck or the VM corrupts its memory and likely crashes. The context of the qemu issue is that for years people thought live migration with nested virt worked; then it was disabled upstream, and many distros reverted that because it would break their users in the cases where they got lucky and it worked; and in 4.1 it was fixed.
This does not add or remove any functionality in OpenStack. Nova will try to live migrate if you tell it to, regardless of the qemu it has; it will just fail if the live migration check was compiled in.
Similarly, if none of your images had fractional sizes, you could use 4.1.0 with older oslo releases and it would be fine. I.e., you could get lucky and for your specific use case this might not be needed, but it would be nice not to depend on luck.
Anyway, I would expect any distro that chooses to support qemu 4.1.0 to backport this as required. I'm not sure whether it is problematic to require a late oslo version bump before Train GA, but I would hope it can be fixed on stable/train.
Note that this discussion is separate from the train patch. I agree we should do that backport, and actually we already have. That discussion was just about timing of the release. This thread is because the fix was also proposed to stable/stein. It merged before I had a chance to start this discussion, and I'm wondering if we need to revert it.
Okay, circling back to wrap this topic up. It sounds like this is a pretty big win in terms of avoiding random failures either from trying to migrate a VM with nested guests on older qemu or using newer qemu with older OpenStack. Since it's a pretty simple patch and it allows our stable branches to behave more sanely, I'm inclined to go with the backport. If anyone strongly objects, please let me know ASAP before we release it.
On Mon, Oct 14, 2019 at 10:52:56AM -0500, Ben Nemec wrote:
Okay, circling back to wrap this topic up. It sounds like this is a pretty big win in terms of avoiding random failures either from trying to migrate a VM with nested guests on older qemu or using newer qemu with older OpenStack. Since it's a pretty simple patch and it allows our stable branches to behave more sanely, I'm inclined to go with the backport. If anyone strongly objects, please let me know ASAP before we release it.
It's a little strange, but from a stable POV it's okay to backport. We had to do something similar after Meltdown, as kernel fixes for that introduced 'anti-features' we needed to account for that clearly didn't exist when we built the older releases of OpenStack. Yours Tony.
participants (4)
- Ben Nemec
- Jeremy Stanley
- Sean Mooney
- Tony Breeds