[nova][osc][api-sig] How strict should our clients be?
Hey,

We have an interesting problem that I wanted to poll opinions on. In OSC 5.5.0, we closed most of the gaps between novaclient and openstackclient. As part of these changes, we introduced validation of a number of requests, such as validating enum-style values. For example, [1][2][3]. This validation already occurs on the server side, but by adding it to the client side we prevent users from sending invalid requests to the server in the first place, and allow users to discover the correct API behaviour from the client rather than having to read the API docs or use trial and error.

Now, an issue has been opened against OSC. Apparently someone has been relying on a bug in Nova to pass a different value to the API than what the schema should have allowed, and they are dismayed that the client no longer allows them to do this. They have asked [4][5] that we relax the client-side validation to allow them to continue relying on this bug. As you can probably tell from my comments, this seems to me to be an open and shut case: you shouldn't fork an OpenStack API and you shouldn't side-step validation. However, I wanted to see if anyone disagreed and thought there was merit in loose or no validation of API requests made via our clients.

Let me know what you think,
Stephen

[1] https://github.com/openstack/python-openstackclient/blob/5.5.0/openstackclie...
[2] https://github.com/openstack/python-openstackclient/blob/5.5.0/openstackclie...
[3] https://github.com/openstack/python-openstackclient/blob/5.5.0/openstackclie...
[4] https://storyboard.openstack.org/#!/story/2008975
[5] https://github.com/openstack/python-openstackclient/commit/ab0b1fe885ee0a210...
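For illustration, the enum-style checks described above are essentially what argparse's choices parameter gives you. A minimal sketch in that spirit, not the actual openstackclient code:

```python
import argparse

# A minimal sketch of the enum-style client-side validation described
# above (illustrative, not the actual openstackclient code).
parser = argparse.ArgumentParser(prog="openstack server group create")
parser.add_argument(
    "--policy",
    metavar="<policy>",
    default="affinity",
    choices=["affinity", "soft-affinity", "anti-affinity", "soft-anti-affinity"],
    help="Policy to associate with the server group",
)

args = parser.parse_args(["--policy", "anti-affinity"])
print(args.policy)

# An invalid value fails immediately, listing the valid choices, before
# any request is sent to the API:
#   openstack server group create: error: argument --policy: invalid choice:
#   'tor-anti-affinity' (choose from 'affinity', 'soft-affinity',
#   'anti-affinity', 'soft-anti-affinity')
```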
---- On Tue, 22 Jun 2021 11:39:42 -0500 Stephen Finucane <stephenfin@redhat.com> wrote ----
[...] by adding it to the client side we prevent users from sending invalid requests to the server in the first place and allow users to discover the correct API behaviour from the client rather than having to read the API docs or use trial and error.
I think this is one of the benefits of having a client: we can improve the UX so that users get clear guidance on the right usage of our API instead of having to debug the API code/errors and correct the request. Protecting APIs from incorrect usage with the right validation is a good thing to do in the client.
[...] you shouldn't fork an OpenStack API and you shouldn't side-step validation. However, I wanted to see if anyone disagreed and thought there was merit in loose or no validation of API requests made via our clients.
Although this is a modified-API case rather than a Nova bug, the Nova bug scenario also raises a very good point: if the client has such validation and protects users from that kind of API/server-side bug, it helps us fix the bug without any user impact. Having more validation of this kind is even better from a UX perspective. The modified-API case (which is what story/2008975 is) is something we want people to avoid; we encourage integrating such changes upstream where eligible. That was the whole point of removing the API extensions concept from Nova. IMO, this is the right change on the client side and it improves the overall UX. -gmann
On 2021-06-22 17:39:42 +0100 (+0100), Stephen Finucane wrote: [...]
Apparently someone has been relying on a bug in Nova to pass a different value to the API than what the schema should have allowed, and they are dismayed that the client no longer allows them to do this. [...]
I can't find where they explained what new policy they've implemented in their fork. Perhaps if they elaborated on the use case, it could turn out to be something the Nova maintainers would accept a patch to officially extend the API to incorporate, allowing that deployment to un-fork? -- Jeremy Stanley
On Tue, 2021-06-22 at 17:17 +0000, Jeremy Stanley wrote:
[...] Perhaps if they elaborated on the use case, it could turn out to be something the Nova maintainers would accept a patch to officially extend the API to incorporate, allowing that deployment to un-fork?

My understanding is that they are trying to model fault domains and have a fault-domain-aware anti-affinity policy that uses host aggregates or AZs to model the fault domain.
They reached out to us downstream about this too, and all I know so far is that they are implementing their own filter to do this, which is valid. What is not valid is extending a separate API, in this case the server group API, to then use as an input to the out-of-tree filter. If they had used a scheduler hint, which intentionally supports out-of-tree hints, or a flavor extra spec, then it would be fine (see the sketch below). The use of a custom server group policy, when server groups are not a defined public extension point, is the source of the conflict.

The use case of a host-aggregate anti-affinity policy, while likely not efficient to implement, is at least a somewhat reasonable one that I could see supporting upstream, although there are many edge cases with regard to hosts being in multiple host aggregates. If they are doing this based on availability zone, that is simpler, since a host can only be in one AZ.

In any case, it would be nice if they brought their use case upstream, or even downstream, so we could find a more supportable way to enable it.
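As an aside, the supported extension point mentioned here, arbitrary out-of-tree scheduler hints, looks roughly like this from the client's perspective. A sketch using novaclient, where the tor_group hint name, the session, and the image/flavor IDs are all assumptions for illustration:

```python
from novaclient import client

# A sketch of the supported out-of-tree extension point: arbitrary
# scheduler hints. The 'tor_group' hint name is hypothetical; a custom
# out-of-tree scheduler filter would be what interprets it.
nova = client.Client("2.latest", session=my_session)  # my_session: an assumed,
                                                      # already-authenticated session

server = nova.servers.create(
    name="vm-1",
    image=image_id,    # assumed: an existing image UUID
    flavor=flavor_id,  # assumed: an existing flavor ID
    # Unknown hints pass through the os:scheduler_hints schema, so no
    # server group API change is needed for a custom filter to act on them.
    scheduler_hints={"tor_group": "rack-12"},
)
```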
Hi folks!

As a guy who was the top 1 contributor to novaclient at some point, I tried to extend validation on the client side as much as possible. I mean, I really like the approach where the user sees a validation error in a millisecond (or a second, depending on the system and plugins) without passing auth and without sending any request to the API, so a big +1 to leave the enum of possible choices there.

BUT I have one concern here: is it possible that the number of official policies will be extended, or that it becomes pluggable (without patching the nova code itself)? In that case, it would be nice to be a bit less strict.

On Tue, 22 Jun 2021 at 20:51, Sean Mooney <smooney@redhat.com> wrote:
[...]
-- Best regards, Andrey Kurilin.
On Tue, 2021-06-22 at 23:49 +0300, Andrey Kurilin wrote:
Hi folks!
As a guy who was the top 1 contributor to novaclient at some point, I tried to extend validation on the client side as much as possible. [...] a big +1 to leave the enum of possible choices there.

I agree with keeping the validation in the client, by the way. More details on the use case they had below.
BUT I have one concern here: is it possible that the number of official policies will be extended, or that it becomes pluggable (without patching the nova code itself)?

No, that is not possible; any extension of the policies would require a new microversion. We previously added a new microversion when we added the soft-affinity policy. As with any new microversion, we would naturally also extend the client to support that new microversion.
Nova has not supported API extensions for a very long time, and I don't foresee us making this pluggable or reintroducing API extensions in the short to near term.

I received more information from our downstream support engineers on what the customer is actually trying to do. Our support engineer discovered this old spec from Jay to add aggregate affinity policies: https://review.opendev.org/c/openstack/nova-specs/+/529135/6/specs/rocky/app...

The customer use case is rack-level affinity/anti-affinity, to ensure that VMs are scheduled to different top-of-rack switches. They model those ToR failure domains as host aggregates and implemented a custom filter to provide that affinity and anti-affinity. However, they also modified the server-side validation, breaking microversion compatibility, by introducing new tor-affinity policies.

Skimming the spec, it is mainly focused on Ironic, but I'm really not sure why we have not added an aggregate-level affinity/anti-affinity policy already. It has applications for non-Ironic hosts too and would provide a way to do generic fault-domain modelling. Granted, it would be nice to model this in Placement somehow, but supporting it at all would be valuable.

We have added affinity rules for anti-affinity in the form of max_server_per_host (https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/com...). To me, specifying a failure domain in the form of an affinity_scope, e.g. affinity_scope=host or affinity_scope=aggregate, or a max_servers_per_aggregate, could be another alternative to a new policy, but overall the use case seems valid regardless of how we address it.

Doing this as a Nova fork/API extension, however, is not really a valid reason to remove validation from the client. They could presumably also patch the client with an OSC plugin or create a fork of it; they would just have to monkey-patch the existing server group command, or reimplement it and override the in-tree one.
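To illustrate the microversion point: the soft policies were introduced in compute API microversion 2.15, and a client can gate enum values on the negotiated version. A minimal sketch of that pattern (the 2.15 boundary is real; the helper itself is illustrative, not actual novaclient/OSC code):

```python
# Sketch: gating enum values on the negotiated compute API microversion.
# The 2.15 minimum for the soft policies is real; the helper is illustrative.
POLICY_MIN_VERSION = {
    "affinity": (2, 1),
    "anti-affinity": (2, 1),
    "soft-affinity": (2, 15),
    "soft-anti-affinity": (2, 15),
}

def check_policy(policy: str, negotiated: tuple[int, int]) -> None:
    try:
        minimum = POLICY_MIN_VERSION[policy]
    except KeyError:
        raise ValueError(
            f"invalid policy {policy!r}; choose from {sorted(POLICY_MIN_VERSION)}"
        )
    if negotiated < minimum:
        raise ValueError(
            f"--os-compute-api-version {'.'.join(map(str, minimum))} or greater "
            f"is required to use policy {policy!r}"
        )

check_policy("soft-anti-affinity", (2, 15))  # ok
check_policy("soft-anti-affinity", (2, 10))  # raises ValueError
```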
On Tue, 2021-06-22 at 23:49 +0300, Andrey Kurilin wrote:
[...] I really like the approach where the user sees a validation error in a millisecond (or a second, depending on the system and plugins) without passing auth and without sending any request to the API, so a big +1 to leave the enum of possible choices there.
Cool, that's my thinking also.
BUT I have one concern here: is it possible that the number of official policies will be extended, or that it becomes pluggable (without patching the nova code itself)? In that case, it would be nice to be a bit less strict.
As Sean has said elsewhere, there's no way to extend this without a microversion. I think it's fair to request that users upgrade their client if they wish to support newer microversions.

Stephen
Hi,

I take a different view, possibly because I am in a similar position to the requestor. I also work on an OpenStack installation, which we need to patch to our needs. We try to do everything upstream first, but chances are, there will be changes which are not upstreamable.

We also have a large user-base, and it is a great advantage to be able to point people to the official client, even if the server is not the official one. A strict client policy would require us to fork the client as well, and distribute that to our user-base. With a couple of thousand users, that is not so trivial. In my point of view, such a decision would tightly couple the client to the server for a limited benefit (a fraction of a second earlier error message).

As a compromise, I would suggest making the client validation configurable, as in kubectl with --validate=true.

Cheers,
Fabian
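For what it's worth, the kubectl-style escape hatch Fabian describes could look something like the following on the client side. A purely hypothetical sketch; no such flag exists in openstackclient today:

```python
import argparse

# Hypothetical sketch of an opt-out validation flag: validation on by
# default, with an explicit escape hatch. Not a real openstackclient option.
KNOWN_POLICIES = ["affinity", "soft-affinity", "anti-affinity", "soft-anti-affinity"]

parser = argparse.ArgumentParser(prog="openstack server group create")
parser.add_argument("--policy", metavar="<policy>", default="affinity")
parser.add_argument(
    "--validate",
    action=argparse.BooleanOptionalAction,  # Python 3.9+: --validate/--no-validate
    default=True,
    help="Validate arguments client-side before calling the API (default: true)",
)

args = parser.parse_args(["--policy", "tor-anti-affinity", "--no-validate"])
if args.validate and args.policy not in KNOWN_POLICIES:
    parser.error(
        f"invalid policy {args.policy!r}; choose from {KNOWN_POLICIES} "
        "(or pass --no-validate to send it to the server anyway)"
    )
# With --no-validate, the unknown value is sent as-is and the server decides.
print(args.policy)  # tor-anti-affinity
```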
On Wed, 2021-06-23 at 10:21 +0000, Wiesel, Fabian wrote:
[...]
As a compromise, I would suggest making the client validation configurable, as in kubectl with --validate=true.
Kubernetes has a very different approach to API stability and extensibility: they have versioned extensions and support multiple versions of the same extension over time. They also have a purely pluggable API where you can define new controllers to implement new behaviour, allowing any deployment to have a completely different set of requests and features developed locally rather than integrated into Kubernetes, which poses problems for interoperability between different k8s installations.

If we were to add a new global option for this, we would also have to ensure it defaults to validating.

What I think might be a better UX would be for operators to ship not a forked client per se, but a plugin to the client that adds your extensions.

My other concern with allowing validation to be disabled is that we likely depend on it in parts of the code to ensure code is not run unless it passes validation. It would be inefficient to check for our preconditions before calling a function in addition to the validation, so users might get tracebacks or other unfriendly errors if they disabled validation.

The client validation we have today, I believe, only enforces enums, i.e. cases where the value has a fixed set of choices. If the field in the API is an unbounded string, the client will not validate the value of the argument, although if we know that the argument is only valid when other flags are set, we might check for those. For example, if the argument requires a minimum microversion, we may not check the value of the opaque string field but will validate the microversion range.

If you extend the supported feature set in your installation and want the standard client to work with that, you can simply extend the allowed set with a plugin, as sketched below.
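To make that concrete: rather than forking the client, a deployment-specific plugin could subclass the in-tree command and widen its choices. A rough sketch, assuming the OSC command class below exists at this path and leaving out the cliff/setuptools entry-point wiring (which I have not verified) that would register it over the in-tree command:

```python
# Sketch of an OSC plugin that widens the in-tree validation instead of
# forking the client. The 'tor-anti-affinity' policy is the
# deployment-specific extension; entry-point registration is assumed.
from openstackclient.compute.v2 import server_group


class CreateServerGroup(server_group.CreateServerGroup):
    """'server group create' with a deployment-specific extra policy."""

    def get_parser(self, prog_name):
        parser = super().get_parser(prog_name)
        # Widen the enum on the already-registered --policy option.
        # Iterating _actions is a private argparse API, used here purely
        # for illustration.
        for action in parser._actions:
            if "--policy" in action.option_strings and action.choices:
                action.choices = list(action.choices) + ["tor-anti-affinity"]
        return parser
```

The plugin package would then register this class under the same command name so it overrides the in-tree one, keeping strict validation for everyone else while widening it for that one cloud.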
Hi
On 23. Jun 2021, at 12:21, Wiesel, Fabian <fabian.wiesel@sap.com> wrote:
[...]
You touch on a very interesting and slippery topic. I also belong to this unlucky category and need to say:

- once forked, the number of differences only grows
- differences may be coverable by compromises in the beginning, but most likely at some point you will reach a dead end and need to consider alternative solutions
- delivery of the client (here especially talking about OSC) is not as complex as you think: we have a project that adds plugins into OSC and in some cases overrides its native behaviour. Delivery is as easy as "pip install openstackclient MY_FORKED_CLOUD_PLUGINS_PROJECT". It is not that different from initially doing "pip install openstackclient"
- a fraction of a second there, a fraction in another place, and suddenly users are crying: why the heck is this tool so slow (here I mean the other side of the coin, where simply forcing users to retry an invocation with a corrected set of parameters, with +4s for initialisation, +1s on a laggy network, + some more retries with further problems, + cleaning up after really failed attempts, etc., makes users mad)
- I personally belong 100% to the "fail early" group. It just takes much more effort to explain to the user what this bloody server response without any message in it means (we all know the sad reality of the usefulness of some of the responses).
As a compromise, I would suggest making the client validation configurable, as in kubectl with --validate=true.
This really sounds like a reasonable compromise (though I would reverse the flag so that skipping is explicit; I hate the possibility of creating broken resources), but as I mentioned earlier, sooner or later you will start paying for the fork. So start doing things properly from the beginning.

Regards,
Artem
Hi,

On 23/6/21, 13:04, "Artem Goncharov" <artem.goncharov@gmail.com> wrote:
- we have a project that adds plugins into OSC and in some cases overrides its native behaviour. Delivery is as easy as "pip install openstackclient MY_FORKED_CLOUD_PLUGINS_PROJECT". It is not that different from initially doing "pip install openstackclient"
How do you then manage the rest of the life-cycle of the client software? And other languages?
- a fraction of a second there, a fraction in another place, and suddenly users are crying: why the heck is this tool so slow [...]
I agree that responsiveness is good, but I think the proposed client validation won't make much of a dent there.
- I personally belong 100% to the "fail early" group. It just takes much more effort to explain to the user what this bloody server response without any message in it means [...]
I think that points to more problems with the client-side approach, and is for me another argument for doing it server-side: doing the validation in OSC means that other clients (Java, Go, etc.) do not benefit from the work. Server-side, I can roll out an improved error message to all users and all clients as fast as my deployment pipeline allows.

Which adds another point: the more logic you have in the client, the more likely it is to deviate from the server. Another source of bugs. And what about the error messages themselves? How do we ensure they are consistent across the whole user-base? If they are client-side, they differ from version to version, and language to language.
As a compromise, I would suggest making the client validation configurable, as in kubectl with --validate=true.
This really sounds like a reasonable compromise (though I would reverse the flag so that skipping is explicit; I hate the possibility of creating broken resources), but as I mentioned earlier, sooner or later you will start paying for the fork. So start doing things properly from the beginning.
I agree; if going for client-side validation, I would go with validation being on by default.

Cheers,
Fabian
---- On Wed, 23 Jun 2021 05:21:58 -0500 Wiesel, Fabian <fabian.wiesel@sap.com> wrote ----
[...] We try to do everything upstream first, but chances are, there will be changes which are not upstreamable. A strict client policy would require us to fork the client as well, and distribute that to our user-base. [...]
What are the exact reasons for not upstreaming the changes? We have the microversion mechanism in the Nova API to improve/change the API in a backward-compatible and discoverable way. That is helpful for adding new APIs or changing existing APIs without impacting the existing users of those APIs. -gmann
On 23/6/21, 18:03, "Ghanshyam Mann" <gmann@ghanshyammann.com> wrote:

> What are the exact reasons for not upstreaming the changes? We have the microversion mechanism in the Nova API to improve/change the API in a backward-compatible and discoverable way. [...]

Currently, we do not have any API changes, and our team inside SAP is pushing back against custom changes to the API from our user-base. Any API change we plan to do, we try to get consensus with upstream first.

But chances are that there are requests within our company we must fulfill (even if our team itself may disagree) within a certain timeline, and I do not expect that the community will comply with either the timeline or the request itself.

The changes we do not try to upstream are simply things we consider workarounds for our special situation: we are reaching the supported limits of our vendor (VMware), and we are trying to get our vendor to fix those.

Cheers,
Fabian
---- On Thu, 24 Jun 2021 07:22:09 -0500 Wiesel, Fabian <fabian.wiesel@sap.com> wrote ----
[...] But chances are that there are requests within our company we must fulfill (even if our team itself may disagree) within a certain timeline, and I do not expect that the community will comply with either the timeline or the request itself.
Thanks Fabian for explaining in detail. I understand the situation.

In Nova, if you have an API change request, we follow the design discussion in the specs repo first, and then implementation should not take much time (depending on the author's activeness in addressing review comments). All of this can be merged within one cycle, but making it available on the customer side depends on how soon you upgrade to that release. But I feel this is a general issue of the long release cycle, not just of the API or the client.

In that case, how about providing a config option to disable the client-side strict validation (by default we can keep the validation)? Doing that on the API side is not good, but at least the client can be flexible. Maybe the OSC team can provide their opinion?

-gmann
Hey everyone,

Maybe I can add some context here. The initial question comes from an initiative to create a custom constraint/affinity solution to address a scheduling issue we've seen from time to time:

- Computes are dual-homed to two different TOR (top-of-rack) switches.
- Two TORs (a TOR pair) will connect up to X amount of computes.
- A user deploys 10 virtual machines with anti-affinity enabled.
- Due to the default scheduler filters (RAM, CPU) and stacking/spreading allocations, OpenStack places all 10 VMs on the same TOR pair, but on different computes, to respect the anti-affinity constraint.
- The TOR pair becomes a SPOF for the entire deployment.

This was addressed by creating a new anti-affinity filter, overriding part of the default Nova code. A "tor-anti-affinity" filter can now be attached to a server group. There might be better ways to implement that kind of filtering based on infrastructure external to OpenStack (AZs, host aggregates, better spreading of instances to prevent compute stacking), but it's the way it was initially implemented. (A sketch of what such a filter might look like follows after the quoted context below.)

Let me know if you have any questions! Thanks!

On Thu, Jun 24, 2021 at 12:01 PM Ghanshyam Mann <gmann@ghanshyammann.com> wrote:
[...]
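For readers unfamiliar with this extension point, an out-of-tree filter of the kind Laurent describes has roughly the following shape. A sketch assuming nova's BaseHostFilter interface; the _tor_pair_for() lookup is a hypothetical placeholder for the deployment-specific part:

```python
# Sketch of an out-of-tree anti-affinity filter of the kind described
# above. BaseHostFilter and host_passes() are nova's real filter
# interface; _tor_pair_for() is a hypothetical placeholder.
from nova.scheduler import filters


class TorAntiAffinityFilter(filters.BaseHostFilter):
    """Reject hosts whose ToR pair already serves a member of the group."""

    def host_passes(self, host_state, spec_obj):
        group = spec_obj.instance_group
        group_hosts = group.hosts if group else []
        if not group_hosts:
            return True
        candidate_tor = self._tor_pair_for(host_state.host)
        used_tors = {self._tor_pair_for(h) for h in group_hosts}
        return candidate_tor not in used_tors

    def _tor_pair_for(self, hostname):
        # Hypothetical: map a compute host to its ToR switch pair, e.g.
        # from host-aggregate metadata or an external inventory system.
        raise NotImplementedError
```

Such a filter is enabled via the scheduler's enabled_filters configuration; the crucial difference from the fork discussed in this thread is that nothing here requires changing the server group API schema.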
---- On Wed, 23 Jun 2021 04:16:21 -0500 Stephen Finucane <stephenfin@redhat.com> wrote ----
[...] As Sean has said elsewhere, there's no way to extend this without a microversion. I think it's fair to request that users upgrade their client if they wish to support newer microversions.
Yes, pluggable or extension mechanisms had many other problems, in terms of interoperability and so on. I think microversions are a good way to introduce new API changes without breaking existing users.
On Tue, Jun 22, 2021 at 5:43 PM Stephen Finucane <stephenfin@redhat.com> wrote:
[...]
Hi all,
My quick two cents, from the perspective of what we have been doing in Glance for multiple years already: fail as early as possible.

We have checks on the API layer that reject patterns we know would fail later on, well before we hit the code that would actually fail. We extend this to the client as well. Especially since glanceclient may send multiple requests to the API for a single user command, we try to identify possible issues in advance.

A good example of this is image creation. If a user makes a client call that should result in an active image but is missing, say, either the disk or container format, we know that activating said image would fail, and we fail it for the user on the client before sending a single request to the API. That makes it fast, we do not create image resources that would never get used when the user just reruns the same command with the missing information, and everyone wins. (A sketch of this kind of pre-flight check follows below.)

We have been advocates of extending our "fail early" attitude to the client for a very long time, and I think it's a good practice.

- Erno "jokke" Kuvaja
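A sketch of the kind of pre-flight check Erno describes, with illustrative format lists (the real allowed values are configurable on the Glance server side, so treat these sets as assumptions):

```python
# Sketch of a "fail early" pre-flight check for image creation, in the
# spirit of the glanceclient behaviour described above (not the actual
# glanceclient code). The format sets are illustrative, not exhaustive.
DISK_FORMATS = {"raw", "qcow2", "vhd", "vhdx", "vmdk", "vdi", "iso", "ploop"}
CONTAINER_FORMATS = {"bare", "ovf", "ova", "aki", "ari", "ami", "docker"}

def preflight_image_create(disk_format=None, container_format=None):
    """Raise before any API call if the image could never become active."""
    errors = []
    if disk_format is None:
        errors.append("disk_format is required to activate an image")
    elif disk_format not in DISK_FORMATS:
        errors.append(f"invalid disk_format {disk_format!r}")
    if container_format is None:
        errors.append("container_format is required to activate an image")
    elif container_format not in CONTAINER_FORMATS:
        errors.append(f"invalid container_format {container_format!r}")
    if errors:
        # Fail in the client, before creating an image resource that could
        # never be activated and would just need cleaning up.
        raise ValueError("; ".join(errors))

preflight_image_create(disk_format="qcow2", container_format="bare")  # ok
preflight_image_create(disk_format="qcow2")  # raises: container_format missing
```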
participants (9)
- Andrey Kurilin
- Artem Goncharov
- Erno Kuvaja
- Ghanshyam Mann
- Jeremy Stanley
- Laurent Dumont
- Sean Mooney
- Stephen Finucane
- Wiesel, Fabian