On Thu, Dec 20, 2018 at 7:14 PM, Matt Riedemann <mriedemos@gmail.com> wrote:
On 12/20/2018 8:47 AM, Jay Pipes wrote:
The problem with feature flags is they are not discoverable. As much as I hate API extensions, at least they are discoverable.
100% agree. Config-driven API behavior like this flies in the face of any kind of hope for interoperability of this feature. Can I do this on cloud X? Maybe? Can I do it on cloud Y? Let's find out! At the very least, I would think if you're going to add a config option to enable/disable this support in the compute API, at least temporarily, you would fail any requests which include ports with QoS resource requests because we're not going to honor them. That aligns more with what I was saying about how I thought the "reject" patches would come earlier in the series since we're currently not supporting this feature, and lying to users by allowing them to attach these kinds of ports. Although I might be conflating QoS ports in general and QoS ports that have the bandwidth resource request things on them.
During the spec review, we waffled a bit on whether or not this should have a new microversion with it [1][2]. The way this works definitely sounds like how volume multiattach is setup - you create a resource in another service with a special characteristic and then try to attach that thing to a server. Nova can either support that thing or not, and the only way for a user to find that out without a microversion is to try it and find out (does it fail? if it doesn't fail, is the thing working in my guest? if not, is it because it's not supported or because something failed or there is a bug?). With volume multiattach we added a microversion so that users could know if they can even make the request to attach those types of volumes so they didn't have to guess. So I could argue that QoS ports like this should get similar treatment, but things get fuzzy when it's implicit behavior on some external resource
Is there a reason why we can't just put code into Nova that, upon seeing the resource requests, acts on it? Is there a reason it needs to be behind a feature flag or even needs to query Neutron for extension support? I mean, if it's in the response payload, just use it, right?
The main reason is how the code is written backward, as I've said elsewhere in this thread. If everything was properly plumbed in from the bottom up for at least creating servers with these kinds of ports (and cleaning them up properly), then it's just a matter of doing like you said - take action on the thing if it's there. Like [3] I think we could explicitly fail for things that are not supported until they do become supported, like any move operations. Whether those are considered bug fixes or new features (a la microversions) is debatable. It sucks adding stuff that works when creating a server but isn't supported when you perform actions on that server, and it sucks holding up features before everything is ready, and it sucks to need microversions for every little thing that ideally would have just be done up front, so I can't really say here on what we should do. I know in the golden days before microversions it was totally kosher to just feature flag our way to nirvana and we're still dealing with a lot of that fallout (like the fact that live migration with NUMA-affined instances still doesn't work properly).
I'd like to make this feature work and merged without drowning into the huge patch series I'm already pushing forward. I'd like to make small and verifyable steps toward the full feature, so for me, moving the functional tests to the end is scary like hell. I'm wondering that introducing an API microversion could act like a feature flag I need and at the same time still make the feautre discoverable as you would like to see it. Something like: Create a feature flag in the code but do not put it in the config as a settable flag. Instead add an API microversion patch to the top of the series and when the new version is requested it enables the feature via the feature flag. This API patch can be small and simple enough to cherry-pick to earlier into the series for local end-to-end testing if needed. Also in functional test I can set the flag via a mock so I can add and run functional tests patch by patch. Regarding rejections. I can add patches that rejects server operations (including create too) if the new microversion isn't requested but a port included in the request or associated to the server has resource request. How does this sound to you? Cheers, gibi [1] https://review.openstack.org/#/c/569459 [2] https://review.openstack.org/#/c/623543 [3] https://review.openstack.org/#/c/567268/42/nova/conf/workarounds.py
[1] https://review.openstack.org/#/c/502306/17/specs/rocky/approved/bandwidth-re... [2] https://review.openstack.org/#/c/502306/26/specs/rocky/approved/bandwidth-re... [3] https://review.openstack.org/#/c/611088/
--
Thanks,
Matt