[tc][all] Project deletion community goal for Train cycle
Hello OpenStackers! As discussed at the Berlin Summit, one of the proposed community goals was project deletion and resource clean-up. Essentially the problem here is that for almost any company that is running OpenStack we run into the issue of how to delete a project and all the resources associated with that project. What we need is an OpenStack wide solution that every project supports which allows operators of OpenStack to delete everything related to a given project. Before we can choose this as a goal, we need to define what the actual proposed solution is, and what each service is either implementing or contributing to. I've started an Etherpad here: https://etherpad.openstack.org/p/community-goal-project-deletion Please add to it if I've missed anything about the problem description, or to flesh out the proposed solutions, but try to mostly keep any discussion here on the mailing list, so that the Etherpad can hopefully be more of a summary of where the discussions have led. This is mostly a starting point, and I expect there to be a lot of opinions and probably some push back from doing anything too big. That said, this is a major issue in OpenStack, and something we really do need because OpenStack is too big and too complicated for this not to exist in a smart cross-project manner. Let's solve this the best we can! Cheers, Adrian Turjak
Hi,
Thanks a lot for pushing this, Adrian. That etherpad is a really good start!
I'm happy to help champion this, if that is of any use and if it's chosen as one of the community goals!
Cheers,
Tobias Rydberg
Senior Developer
Twitter & IRC: tobberydberg
www.citynetwork.eu | www.citycloud.com
On 2019-01-11 07:18, Adrian Turjak wrote:
Hello OpenStackers!
As discussed at the Berlin Summit, one of the proposed community goals was project deletion and resource clean-up.
Essentially the problem here is that for almost any company that is running OpenStack we run into the issue of how to delete a project and all the resources associated with that project. What we need is an OpenStack wide solution that every project supports which allows operators of OpenStack to delete everything related to a given project.
Before we can choose this as a goal, we need to define what the actual proposed solution is, and what each service is either implementing or contributing to.
I've started an Etherpad here: https://etherpad.openstack.org/p/community-goal-project-deletion
Please add to it if I've missed anything about the problem description, or to flesh out the proposed solutions, but try to mostly keep any discussion here on the mailing list, so that the Etherpad can hopefully be more of a summary of where the discussions have led.
This is mostly a starting point, and I expect there to be a lot of opinions and probably some push back from doing anything too big. That said, this is a major issue in OpenStack, and something we really do need because OpenStack is too big and too complicated for this not to exist in a smart cross-project manner.
Let's solve this the best we can!
Cheers,
Adrian Turjak
On Thu, 2019-01-17 at 15:40 +0100, Tobias Rydberg wrote:
Hi,
Thanks a lot for pushing this Adrian and that etherpad is a really good start!
I'm happy to help champion this, if that is of any use and if it's chosen as one of the community goals!
Cheers
Tobias Rydberg Senior Developer Twitter & IRC: tobberydberg
www.citynetwork.eu | www.citycloud.com
Hello,
It's a pleasure to see people so thrilled about a community goal :)
I see the etherpad [0] explains an opinion on how to do things, but I don't see many answers there, so I am not sure there is a consensus yet.
I think it would be great to have broader community feedback, or at least API SIG feedback, analysing this pattern.
Regards,
Jean-Philippe Evrard
[0]: https://etherpad.openstack.org/p/community-goal-project-deletion
On Jan 21, 2019, at 3:10 AM, Jean-Philippe Evrard <jean-philippe@evrard.me> wrote:
I think it would be great to have broader community feedback, or at least API SIG feedback, analysing this pattern.
I would strongly prefer the approach of each service implementing an endpoint to be called by Keystone when a project is deleted. Relying on a library that would somehow be able to understand all the parts a project touches within a service sounds a lot more error-prone.
-- Ed Leafe
On Mon, Jan 21, 2019 at 1:17 PM Ed Leafe <ed@leafe.com> wrote:
On Jan 21, 2019, at 3:10 AM, Jean-Philippe Evrard <jean-philippe@evrard.me> wrote:
I think it would be great to have broader community feedback, or at least API SIG feedback, analysing this pattern.
I would strongly prefer the approach of each service implementing an endpoint to be called by Keystone when a project is deleted. Relying on a library that would somehow be able to understand all the parts a project touches within a service sounds a lot more error-prone.
Are you referring to the system scope approach detailed on line 38, here [0]? I might be misunderstanding something, but I didn't think keystone was going to iterate all available services and call clean-up APIs. I think it was just that services would be able to expose an endpoint that cleans up resources without a project scoped token (e.g., it would be system scoped [1]). [0] https://etherpad.openstack.org/p/community-goal-project-deletion [1] https://docs.openstack.org/keystone/latest/admin/tokens-overview.html#system...
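For anyone not familiar with system scope, here is a minimal sketch of how a system-scoped token request differs from a project-scoped one, assuming Keystone v3 password authentication. The function names and credentials are placeholders made up for illustration; only the shape of the request body comes from the Keystone API.

```python
import json


def system_scoped_auth_body(user_id: str, password: str) -> dict:
    """Build a Keystone v3 password-auth body asking for a system-scoped token."""
    return {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {"user": {"id": user_id, "password": password}},
            },
            # System scope: the token is not tied to any single project.
            "scope": {"system": {"all": True}},
        }
    }


def project_scoped_auth_body(user_id: str, password: str, project_id: str) -> dict:
    """Same request, but scoped to one project (shown for comparison)."""
    body = system_scoped_auth_body(user_id, password)
    body["auth"]["scope"] = {"project": {"id": project_id}}
    return body


if __name__ == "__main__":
    # Placeholder credentials; a body like this is POSTed to /v3/auth/tokens.
    print(json.dumps(system_scoped_auth_body("operator-id", "secret"), indent=2))
```

Whether a service's clean-up endpoint would actually require this scope is still an open design question in this thread.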
-- Ed Leafe
On Jan 21, 2019, at 1:55 PM, Lance Bragstad <lbragstad@gmail.com> wrote:
Are you referring to the system scope approach detailed on line 38, here [0]?
Yes.
I might be misunderstanding something, but I didn't think keystone was going to iterate all available services and call clean-up APIs. I think it was just that services would be able to expose an endpoint that cleans up resources without a project scoped token (e.g., it would be system scoped [1]).
[0] https://etherpad.openstack.org/p/community-goal-project-deletion [1] https://docs.openstack.org/keystone/latest/admin/tokens-overview.html#system...
It is more likely that I’m misunderstanding. Reading that etherpad, it appeared that it was indeed the goal to have project deletion in Keystone cascade to all the services, but I guess I missed line 19. So if it isn’t Keystone calling this API on all the services, what would be the appropriate actor? -- Ed Leafe
On Mon, Jan 21, 2019 at 2:18 PM Ed Leafe <ed@leafe.com> wrote:
On Jan 21, 2019, at 1:55 PM, Lance Bragstad <lbragstad@gmail.com> wrote:
Are you referring to the system scope approach detailed on line 38, here [0]?
Yes.
I might be misunderstanding something, but I didn't think keystone was going to iterate all available services and call clean-up APIs. I think it was just that services would be able to expose an endpoint that cleans up resources without a project scoped token (e.g., it would be system scoped [1]).
[0] https://etherpad.openstack.org/p/community-goal-project-deletion [1] https://docs.openstack.org/keystone/latest/admin/tokens-overview.html#system...
It is more likely that I’m misunderstanding. Reading that etherpad, it appeared that it was indeed the goal to have project deletion in Keystone cascade to all the services, but I guess I missed line 19.
So if it isn’t Keystone calling this API on all the services, what would be the appropriate actor?
The actor could still be something like os-purge or adjutant [0]. Depending on how the implementation shakes out in each service, the actor could simply iterate over all services and call the same API on each one. I guess the benefit is that the actor doesn't need to manage the deletion order based on the dependencies of the resources (internal or external to a service).
Adrian, and others, have given this a bunch more thought than I have. So I'm curious to hear if what I'm saying is in line with how they've envisioned things. I'm recalling most of this from Berlin.
[0] https://adjutant.readthedocs.io/en/latest/
-- Ed Leafe
I've expanded on the notes in the etherpad about why Keystone isn't the actor.
At the summit we discussed this option, and all the people familiar with Keystone who were in the room (or in some later discussions) agreed that making Keystone the actor is a BAD idea.
Keystone does not currently do any orchestration or workflow of this nature, and making it do that adds a lot of extra logic which it just shouldn't need. After a project delete it would need to call all the APIs, then confirm they succeeded, and maybe retry. This would have to be done asynchronously, since waiting for and confirming the deletion would take longer than a single API call to delete a project in Keystone should take. That kind of logic doesn't fit in Keystone. Not to mention there are issues with how Keystone would know which services support such an API, and where exactly it might be (although catalog + consistent API placement or discovery could solve that).
Essentially, going down the route of "make this Keystone's problem" is in my opinion a hard NO, but I'll let the Keystone devs weigh in on that before we make that a very firm hard NO.
As for solutions: ideally we do implement the APIs per service (that's the end goal), but we ALSO make libraries that delete resources using the existing APIs. If the library sees that a service version is one with the purge API, it uses it; otherwise it falls back to less efficient deletion. This has the major benefit of working for all existing deployments, and ones stuck on older OpenStack versions. This is a universal problem and we need to solve it backwards AND forwards.
By doing both (with a first step focus on the libraries) we can actually give projects more time to build the purge API, and maybe have the API portion of the goal extend into another cycle if needed.
Essentially, we'd make a purge library that uses the SDK to delete resources. If a service has a purge endpoint, then the library (via the SDK) uses that. The specifics of how the library purges, or whether the library will be split into multiple libraries (one top level, and then one per service), are to be decided.
A rough look at what a deletion process might look like:
1. Disable the project in Keystone (so no new resources can be created or modified), or clear all role assignments (and api-keys) from the project.
2. Purge platform orchestration services (Magnum, Sahara).
3. Purge Heat (Heat after Magnum, because Magnum and the like use Heat, and deleting Heat stacks without deleting the 'resource' which uses that stack can leave a mess).
4. Purge everything left (order to be decided or potentially dynamically chosen).
5. Delete or disable the Keystone project (disable is enough really).
The actor is then first a CLI built into the purge library as an OSClient command, then secondly maybe an API or two in Adjutant which will use this library. Or anyone can use the library and make anything they want an actor.
Ideally, if we can even make the library allow selectively choosing which services to purge (conditional on dependency chain), that could be useful for cases where a user wants to delete everything except maybe what's in Swift or Cinder.
This is in many ways a HUGE goal, but one that we really need to accomplish. We've lived with this problem too long and the longer we leave it unsolved, the harder it becomes.
On 22/01/19 9:30 AM, Lance Bragstad wrote:
On Mon, Jan 21, 2019 at 2:18 PM Ed Leafe <ed@leafe.com> wrote:
On Jan 21, 2019, at 1:55 PM, Lance Bragstad <lbragstad@gmail.com> wrote:
Are you referring to the system scope approach detailed on line 38, here [0]?
Yes.
I might be misunderstanding something, but I didn't think keystone was going to iterate all available services and call clean-up APIs. I think it was just that services would be able to expose an endpoint that cleans up resources without a project scoped token (e.g., it would be system scoped [1]).
[0] https://etherpad.openstack.org/p/community-goal-project-deletion [1] https://docs.openstack.org/keystone/latest/admin/tokens-overview.html#system...
It is more likely that I’m misunderstanding. Reading that etherpad, it appeared that it was indeed the goal to have project deletion in Keystone cascade to all the services, but I guess I missed line 19.
So if it isn’t Keystone calling this API on all the services, what would be the appropriate actor?
The actor could still be something like os-purge or adjutant [0]. Depending on how the implementation shakes out in each service, the actor could simply iterate over all services and call the same API on each one. I guess the benefit is that the actor doesn't need to manage the deletion order based on the dependencies of the resources (internal or external to a service).
Adrian, and others, have given this a bunch more thought than I have. So I'm curious to hear if what I'm saying is in line with how they've envisioned things. I'm recalling most of this from Berlin.
[0] https://adjutant.readthedocs.io/en/latest/
-- Ed Leafe
Thanks for the thorough feedback Adrian.
My opinion is also that Keystone should not be the actor executing this functionality; it should live somewhere else, whether that is Adjutant or some other form (application, library, CLI etc).
I would also like to bring up the point about knowing if a project is "dirty" (it has provisioned resources). This is something that I think all business logic would benefit from. We've had issues with knowing when resources should be deleted; our solution is pretty much to look at metrics for the last X minutes, check if the project is disabled, and compare to business logic that says it should be deleted.
While the above works, it defeats some of the point of disabling a project, since the only thing that knows whether the project should be deleted or is merely disabled is the business logic application that recorded that they clicked the delete button and not the disable button.
Most of the functionality you are mentioning is what the ospurge project has been working to implement, and the maintainer even did a full rewrite which improved the dependency arrangement for resource removal.
I think the biggest win for this community goal would be that the developers of the projects would be available for input regarding the project-specific code that does purging. There have been some really nasty bugs in ospurge in the past where, if executed as the admin user, it would wipe everything and not only that project, which is probably an issue that makes people think twice about using a purging toolkit at all.
We should carefully consider what parts of ospurge could be reused (concept, code or anything in between) to help decide what direction we want to push this goal.
I'm excited :)
Best regards
Tobias
On 01/22/2019 02:18 AM, Adrian Turjak wrote:
I've expanded on the notes in the etherpad about why Keystone isn't the actor.
At the summit we discussed this option, and all the people familiar with Keystone who were in the room (or in some later discussions), agreed that making Keystone the actor is a BAD idea.
Keystone does not currently do any orchestration or workflow of this nature, making it do that adds a lot of extra logic which it just shouldn't need. After a project delete it would need to call all the APIs, and then confirm they succeeded, and maybe retry. This would have to be done asynchronously since waiting and confirming the deletion would take longer than a single API call to delete a project in Keystone should take. That kind of logic doesn't fit in Keystone. Not to mention there are issues on how Keystone would know which services support such an API, and where exactly it might be (although catalog + consistent API placement or discovery could solve that).
Essentially, going down the route of "make this Keystone's problem" is in my opinion a hard NO, but I'll let the Keystone devs weigh in on that before we make that a very firm hard NO.
As for solutions: ideally we do implement the APIs per service (that's the end goal), but we ALSO make libraries that delete resources using the existing APIs. If the library sees that a service version is one with the purge API, it uses it; otherwise it falls back to less efficient deletion. This has the major benefit of working for all existing deployments, and ones stuck on older OpenStack versions. This is a universal problem and we need to solve it backwards AND forwards.
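A minimal sketch of that "purge API if available, otherwise fall back" behaviour might look like the following. Everything here is hypothetical: `FakeService` is an in-memory stand-in rather than any real SDK or service client, and the method names are invented for illustration.

```python
from typing import Dict, List, Tuple


class FakeService:
    """Hypothetical in-memory stand-in for one service's API client."""

    def __init__(self, name: str, resources: Dict[str, str], has_purge_api: bool):
        self.name = name
        self.resources = dict(resources)  # resource_id -> owning project_id
        self.has_purge_api = has_purge_api
        self.calls: List[Tuple[str, str]] = []  # record of API calls made

    def purge_project(self, project_id: str) -> None:
        # One bulk call: the service itself knows how to clean up.
        self.calls.append(("purge", project_id))
        self.resources = {
            rid: owner for rid, owner in self.resources.items() if owner != project_id
        }

    def list_resources(self, project_id: str) -> List[str]:
        return [rid for rid, owner in self.resources.items() if owner == project_id]

    def delete_resource(self, resource_id: str) -> None:
        self.calls.append(("delete", resource_id))
        del self.resources[resource_id]


def purge_service(service: FakeService, project_id: str) -> str:
    """Use the purge API when the service has one, else delete one by one."""
    if service.has_purge_api:
        service.purge_project(project_id)
        return "purge-api"
    # Fallback for older deployments: less efficient per-resource deletion.
    for resource_id in service.list_resources(project_id):
        service.delete_resource(resource_id)
    return "fallback"
```

Both paths leave the project's resources gone; the difference is one call versus many, which is why the fallback can cover older deployments while services catch up.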
By doing both (with a first step focus on the libraries) we can actually give projects more time to build the purge API, and maybe have the API portion of the goal extend into another cycle if needed.
Essentially, we'd make a purge library that uses the SDK to delete resources. If a service has a purge endpoint, then the library (via the SDK) uses that. The specifics of how the library purges, or if the library will be split into multiple libraries (one top level, and then one per service) is to be decided.
A rough look at what a deletion process might look like:
1. Disable the project in Keystone (so no new resources can be created or modified), or clear all role assignments (and api-keys) from the project.
2. Purge platform orchestration services (Magnum, Sahara).
3. Purge Heat (Heat after Magnum, because Magnum and the like use Heat, and deleting Heat stacks without deleting the 'resource' which uses that stack can leave a mess).
4. Purge everything left (order to be decided or potentially dynamically chosen).
5. Delete or disable the Keystone project (disable is enough really).
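The ordering above can be sketched as a small planning function. This is only an illustration under assumptions: the service names are plain strings, the helper names are made up, and the alphabetical order used for "everything left" is an arbitrary choice to keep the sketch deterministic, since the real order is still undecided.

```python
from typing import Iterable, List

# Stages that must run before the general purge, per the rough process above:
# platform orchestration services first, then Heat.
EARLY_STAGES = [["magnum", "sahara"], ["heat"]]


def purge_order(installed: Iterable[str]) -> List[str]:
    """Return the installed services in a safe purge order."""
    installed = set(installed)
    ordered = [svc for stage in EARLY_STAGES for svc in stage if svc in installed]
    # "Everything left": sorted() is a placeholder for the undecided order.
    return ordered + sorted(installed - set(ordered))


def deletion_plan(project_id: str, installed: Iterable[str]) -> List[str]:
    """Describe the full process as a list of human-readable steps."""
    steps = [f"disable project {project_id}"]
    steps += [f"purge {svc}" for svc in purge_order(installed)]
    steps.append(f"delete or disable project {project_id}")
    return steps


if __name__ == "__main__":
    for step in deletion_plan("demo-project", ["nova", "heat", "swift", "magnum"]):
        print(step)
```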
The actor is then first a CLI built into the purge library as an OSClient command, then secondly maybe an API or two in Adjutant which will use this library. Or anyone can use the library and make anything they want an actor.
Ideally if we can even make the library allow selectively choosing which services to purge (conditional on dependency chain), that could be useful for cases where a user wants to delete everything except maybe what's in Swift or Cinder.
This is in many ways a HUGE goal, but one that we really need to accomplish. We've lived with this problem too long and the longer we leave it unsolved, the harder it becomes.
Thanks for the input! I'm willing to bet there are many people excited about this goal, or will be when they realise it exists!
The 'dirty' state I think would be solved with a report API in each service (tell me everything a given project has resource-wise). Such an API would be useful without needing to query each resource list, and potentially could be an easy thing to implement to help a purge library figure out what to delete. I know right now our method for checking if a project is 'dirty' is part of our quota checking scripts, and it has to query a lot of APIs per service to build an idea of what a project has.
As for using existing code, OSPurge could well be a starting point, but the major part of this goal has to be that each OpenStack service (that creates resources owned by a project) takes ownership of its own deletion logic. This is why a top-level library for cross-project logic, with per-service plugin libraries, is possibly the best approach. Each library would follow the same template and abstraction layers (as inherited from the top-level library), but how each service implements its own deletion is up to that service. I would also push for them using the SDK only as their point of interaction with the APIs (let's set some hard requirements and standards!), because that is the Python library we should be using going forward. In addition, such an approach could mean that anyone can write a plugin for the top-level library (e.g. internal company-only services) which will automatically get picked up if installed.
We would need robust and extensive testing for this, because deletion is critical, and we need it to work, but also not cause damage in ways it shouldn't.
And you're right, purge tools purging outside of the scope asked for is a worry. Our own internal logic actually works by having the triggering admin user add itself to the project (and ensure no admin role), then scope a token to just that project, and delete resources from the point of view of a project user. That way it's kind of like a user deleting their own resources, and in truth having a nicer way to even do that (non-admin clearing of project) would be amazing for a lot of people who don't want to close their account or disable their project, but just want to delete stray resources and not get charged.
On 23/01/19 4:03 AM, Tobias Urdin wrote:
Thanks for the thorough feedback Adrian.
My opinion is also that Keystone should not be the actor in executing this functionality but somewhere else whether that is Adjutant or any other form (application, library, CLI etc).
I would also like to bring up the point about knowing if a project is "dirty" (it has provisioned resources). This is something that I think all business logic would benefit from. We've had issues with knowing when resources should be deleted; our solution is pretty much to look at metrics for the last X minutes, check if the project is disabled, and compare to business logic that says it should be deleted.
While the above works, it defeats some of the point of disabling a project, since the only thing that knows whether the project should be deleted or is merely disabled is the business logic application that recorded that they clicked the delete button and not the disable button.
Most of the functionality you are mentioning is what the ospurge project has been working to implement, and the maintainer even did a full rewrite which improved the dependency arrangement for resource removal.
I think the biggest win for this community goal would be that the developers of the projects would be available for input regarding the project-specific code that does purging. There have been some really nasty bugs in ospurge in the past where, if executed as the admin user, it would wipe everything and not only that project, which is probably an issue that makes people think twice about using a purging toolkit at all.
We should carefully consider what parts of ospurge could be reused (concept, code or anything in between) to help decide what direction we want to push this goal.
I'm excited :)
Best regards Tobias
Sending out a quick recap.
It sounds like we have multiple champions, which is great, in addition to an understanding of how we can implement this. Is it fair to say that we're going to pursue the OSPurge approach* initially and follow up in subsequent releases with more details about service-specific (system-scoped) purge APIs? If so, do we think we're ready to propose this and get it into review?
* detailed at line 68 here - https://etherpad.openstack.org/p/community-goal-project-deletion
On Tue, Jan 22, 2019 at 5:23 PM Adrian Turjak <adriant@catalyst.net.nz> wrote:
Thanks for the input! I'm willing to bet there are many people excited about this goal, or will be when they realise it exists!
The 'dirty' state I think would be solved with a report API in each service (tell me everything a given project has resource wise). Such an API would be useful without needing to query each resource list, and potentially could be an easy thing to implement to help a purge library figure out what to delete. I know right now our method for checking if a project is 'dirty' is part of our quota checking scripts, and it has to query a lot of APIs per service to build an idea of what a project has.
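As a rough illustration of the report idea, here is a sketch that aggregates one report per service and derives a 'dirty' flag from it. No such report API exists in this thread's proposal yet, so the reporter callables and the `{resource_type: count}` shape are assumptions invented for the example.

```python
from typing import Callable, Dict

# Hypothetical shape: service name -> {resource type -> count}.
Report = Dict[str, Dict[str, int]]
Reporter = Callable[[str], Dict[str, int]]


def project_report(reporters: Dict[str, Reporter], project_id: str) -> Report:
    """Ask every service what the project owns (one call per service)."""
    return {service: report(project_id) for service, report in reporters.items()}


def is_dirty(report: Report) -> bool:
    """A project is 'dirty' if any service reports at least one resource."""
    return any(count > 0 for counts in report.values() for count in counts.values())


if __name__ == "__main__":
    # Stand-in reporters; a real one would call the service's report API.
    reporters = {
        "compute": lambda project_id: {"servers": 2},
        "volume": lambda project_id: {"volumes": 0},
    }
    report = project_report(reporters, "demo-project")
    print(report, "dirty" if is_dirty(report) else "clean")
```

The same report could also tell a purge library what there is to delete, instead of it listing every resource type itself.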
As for using existing code, OSPurge could well be a starting point, but the major part of this goal has to be that each OpenStack service (that creates resources owned by a project) takes ownership of their own deletion logic. This is why a top level library for cross project logic, with per service plugin libraries is possibly the best approach. Each library would follow the same template and abstraction layers (as inherited from the top level library), but how each service implements their own deletion is up to them. I would also push for them using the SDK only as their point of interaction with the APIs (lets set some hard requirements and standards!), because that is the python library we should be using going forward. In addition such an approach could mean that anyone can write a plugin for the top level library (e.g. internal company only services) which will automatically get picked up if installed.
We would need robust and extensive testing for this, because deletion is critical, and we need it to work, but also not cause damage in ways it shouldn't.
And you're right, purge tools purging outside of the scope asked for is a worry. Our own internal logic actually works by having the triggering admin user add itself to the project (and ensure no admin role), then scope a token to just that project, and delete resources from the point of view of a project user. That way it's kind of like a user deleting their own resources, and in truth having a nicer way to even do that (non-admin clearing of project) would be amazing for a lot of people who don't want to close their account or disable their project, but just want to delete stray resources and not get charged.
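To make the safety argument concrete, here is a toy model of why purging with a non-admin, project-scoped token limits the blast radius. It is not the internal tooling described above: `Token` and `FakeCloud` are invented stand-ins, and the rule that an admin token sees every project's resources is a deliberate simplification.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Token:
    project_id: str
    is_admin: bool


class FakeCloud:
    """Hypothetical in-memory cloud: resource_id -> owning project_id."""

    def __init__(self, resources: Dict[str, str]):
        self.resources = dict(resources)

    def list_resources(self, token: Token) -> List[str]:
        # Simplification: an admin token can see everything.
        if token.is_admin:
            return list(self.resources)
        return [r for r, owner in self.resources.items() if owner == token.project_id]

    def delete(self, token: Token, resource_id: str) -> None:
        owner = self.resources[resource_id]
        if not token.is_admin and owner != token.project_id:
            raise PermissionError(f"{resource_id} is outside project {token.project_id}")
        del self.resources[resource_id]


def purge_as_member(cloud: FakeCloud, project_id: str) -> None:
    """Purge using a token scoped to the project and without the admin role."""
    token = Token(project_id=project_id, is_admin=False)
    for resource_id in cloud.list_resources(token):
        cloud.delete(token, resource_id)
```

In this model a buggy loop run with an admin token could list and delete other projects' resources, while the member-scoped purge cannot even see them.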
---- On Wed, 23 Jan 2019 08:21:27 +0900 Adrian Turjak <adriant@catalyst.net.nz> wrote ----
Thanks for the input! I'm willing to bet there are many people excited about this goal, or will be when they realise it exists!
The 'dirty' state I think would be solved with a report API in each service (tell me everything a given project has resource wise). Such an API would be useful without needing to query each resource list, and potentially could be an easy thing to implement to help a purge library figure out what to delete. I know right now our method for checking if a project is 'dirty' is part of our quota checking scripts, and it has to query a lot of APIs per service to build an idea of what a project has.
As for using existing code, OSPurge could well be a starting point, but the major part of this goal has to be that each OpenStack service (that creates resources owned by a project) takes ownership of their own deletion logic. This is why a top level library for cross project logic, with per service plugin libraries is possibly the best approach. Each library would follow the same template and abstraction layers (as inherited from the top level library), but how each service implements their own deletion is up to them. I would also push for them using the SDK only as their point of interaction with the APIs (lets set some hard requirements and standards!), because that is the python library we should be using going forward. In addition such an approach could mean that anyone can write a plugin for the top level library (e.g. internal company only services) which will automatically get picked up if installed.
+100 for not making Keystone the actor. Leaving purge responsibility to the service side is the best way, without any doubt.
Instead of adding purge APIs to each service, I think we should also consider another approach: a pluggable one. We can expose a plugin interface from the purge library/tool. Each service implements the interface with its purge functionality (script, command, etc). On discovering each service's purge plugin, the purge library/tool starts the deletion in the required order.
This can give 2 simple benefits:
1. No need to detect service availability before requesting a purge of the resources. I am not sure whether OSPurge checks the availability of services or not, but with the plugin approach that will not be required. For example, if Congress is not installed in my env, then Congress's purge plugin will not be discovered, so there is no need to check Congress service availability.
2. The purge-all-resources interface will not be exposed to anyone except the purge library/tool. With an API, we are exposing an interface to a user (admin, system-scoped, etc) which can delete all the resources of that service, which may be a bit of a security issue. One can argue that the existing delete APIs already allow this, but those are per resource, not for everything at once. On the other side, we can say this can be taken care of by RBAC, but IMO exposing anything that can destroy the env, even to a permitted user (especially a human), is not a good idea when the only right use of that interface is something else (the purge library/tool in this case).
The pluggable approach can also have its cons, but let's first discuss all the possibilities.
-gmann
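The plugin idea above could be sketched roughly as follows, assuming plugins are advertised as Python entry points. The group name `project_purge.plugins` and the convention that a plugin is a callable taking a project ID are made up for this example; nothing in the thread specifies them.

```python
from importlib import metadata
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical entry-point group that service purge plugins would register under.
PLUGIN_GROUP = "project_purge.plugins"

PurgePlugin = Callable[[str], int]  # takes a project ID, returns resources deleted


def discover_plugins(group: str = PLUGIN_GROUP) -> Dict[str, PurgePlugin]:
    """Load every installed plugin; services that are not installed never show up."""
    eps = metadata.entry_points()
    # Newer Pythons expose .select(); older ones return a dict keyed by group.
    selected = eps.select(group=group) if hasattr(eps, "select") else eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}


def run_purge(
    plugins: Dict[str, PurgePlugin], project_id: str, order: Sequence[str] = ()
) -> List[Tuple[str, int]]:
    """Run plugins named in `order` first, then the rest in name order."""
    first = [name for name in order if name in plugins]
    rest = sorted(set(plugins) - set(first))
    return [(name, plugins[name](project_id)) for name in first + rest]


if __name__ == "__main__":
    found = discover_plugins()
    print(f"discovered {len(found)} purge plugin(s)")
    print(run_purge(found, "demo-project", order=["magnum", "heat"]))
```

Because discovery only returns what is installed, the tool never has to probe for a service such as Congress before deciding whether to purge it.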
We would need robust and extensive testing for this, because deletion is critical, and we need it to work, but also not cause damage in ways it shouldn't.
And you're right, purge tools purging outside of the scope asked for is a worry. Our own internal logic actually works by having the triggering admin user add itself to the project (and ensure no admin role), then scope a token to just that project, and delete resources from the point of view of a project user. That way it's kind of like a user deleting their own resources, and in truth having a nicer way to even do that (non-admin clearing of project) would be amazing for a lot of people who don't want to close their account or disable their project, but just want to delete stray resources and not get charged.
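As a complement to project-scoped tokens (not a replacement for them), a purge tool could also filter on ownership on the client side before deleting anything. The sketch below is only an illustration: the resource dict shape and the delete callable are assumptions, not any service's real API.

```python
def purge_scoped(resources, project_id, delete):
    """Delete only resources owned by project_id; return the deleted ids.

    resources: iterable of dicts with "id" and "project_id" keys (assumed shape).
    delete: callable taking a resource id (stand-in for a real API call).
    """
    if not project_id:
        raise ValueError("a target project is required")
    deleted = []
    for res in resources:
        if res.get("project_id") != project_id:
            continue  # outside the requested scope: never touch it
        delete(res["id"])
        deleted.append(res["id"])
    return deleted


# An admin-style listing that spans two projects.
listing = [
    {"id": "vm-1", "project_id": "target"},
    {"id": "vm-2", "project_id": "other"},
    {"id": "vol-1", "project_id": "target"},
]
removed = []
deleted = purge_scoped(listing, "target", removed.append)
```

Refusing to run without an explicit project is the important part: an empty filter combined with an admin listing is exactly the "wipe everything" failure mode.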
On 23/01/19 4:03 AM, Tobias Urdin wrote:
Thanks for the thorough feedback Adrian.
My opinion is also that Keystone should not be the actor in executing this functionality but somewhere else whether that is Adjutant or any other form (application, library, CLI etc).
I would also like to bring up the point about knowing if a project is "dirty" (it has provisioned resources). This is something that I think all business logic would benefit from. We've had issues knowing when resources should be deleted; our solution is pretty much to look at metrics for the last X minutes, check if the project is disabled, and compare to business logic that says it should be deleted.
While the above works, it kills some of the logical points of disabling a project, since the only thing that knows whether the project should be deleted or is just disabled is the business logic application that records that they clicked the delete button and not the disable one.
Most of the functionality you are mentioning is something the ospurge project has been working to implement, and the maintainer even did a full rewrite which improved the dependency arrangement for resource removal.
I think the biggest win for this community goal would be that the developers of the projects would be available for input regarding the project-specific code that does purging. There have been some really nasty bugs in ospurge in the past where, if executed with the admin user, you would wipe everything and not only that project, which is probably an issue that makes people think twice about using a purging toolkit at all.
We should carefully consider what parts of ospurge could be reused (concept, code or anything in between) that could help decide what direction we want to push this goal.
I'm excited :)
Best regards Tobias
Regarding gmann's #1: the existing OSPurge doesn't have any specific logic per se, but would just return no resources based on the has_service() method, which I would assume is checking endpoints. [1]
+1 on having a pluggable approach, and to Adrian's feedback on having a strict policy on how plugins should be implemented.
Best regards
[1] https://review.openstack.org/#/c/600919/1/ospurge/resources/heat.py
On 01/30/2019 07:36 AM, Ghanshyam Mann wrote:
---- On Wed, 23 Jan 2019 08:21:27 +0900 Adrian Turjak <adriant@catalyst.net.nz> wrote ----
Thanks for the input! I'm willing to bet there are many people excited about this goal, or will be when they realise it exists!
The 'dirty' state I think would be solved with a report API in each service (tell me everything a given project has resource wise). Such an API would be useful without needing to query each resource list, and potentially could be an easy thing to implement to help a purge library figure out what to delete. I know right now our method for checking if a project is 'dirty' is part of our quota checking scripts, and it has to query a lot of APIs per service to build an idea of what a project has.
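A rough sketch of how a purge library might consume such report APIs is below. No such API exists today, so the per-service reporter callables and the shape of what they return (resource type mapped to a count) are purely assumptions for illustration.

```python
def project_report(reporters, project_id):
    """Merge per-service reports into {service: {resource_type: count}}."""
    return {service: report(project_id) for service, report in reporters.items()}


def is_dirty(report):
    """A project is 'dirty' if any service still reports at least one resource."""
    return any(count > 0
               for counts in report.values()
               for count in counts.values())


# Stand-ins for one report call per service (hypothetical).
reporters = {
    "compute": lambda project_id: {"servers": 2, "keypairs": 0},
    "volume": lambda project_id: {"volumes": 0},
}
report = project_report(reporters, "demo-project")
```

The point is that one call per service replaces the many per-resource list calls our quota scripts currently need.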
On Wed, 2019-01-30 at 15:28 +0900, Ghanshyam Mann wrote:
---- On Wed, 23 Jan 2019 08:21:27 +0900 Adrian Turjak <adriant@catalyst.net.nz> wrote ----
Thanks for the input! I'm willing to bet there are many people excited about this goal, or will be when they realise it exists!
The 'dirty' state I think would be solved with a report API in each service (tell me everything a given project has resource wise). Such an API would be useful without needing to query each resource list, and potentially could be an easy thing to implement to help a purge library figure out what to delete. I know right now our method for checking if a project is 'dirty' is part of our quota checking scripts, and it has to query a lot of APIs per service to build an idea of what a project has.
As for using existing code, OSPurge could well be a starting point, but the major part of this goal has to be that each OpenStack service (that creates resources owned by a project) takes ownership of their own deletion logic. This is why a top level library for cross project logic, with per service plugin libraries is possibly the best approach. Each library would follow the same template and abstraction layers (as inherited from the top level library), but how each service implements their own deletion is up to them. I would also push for them using the SDK only as their point of interaction with the APIs (lets set some hard requirements and standards!), because that is the python library we should be using going forward. In addition such an approach could mean that anyone can write a plugin for the top level library (e.g. internal company only services) which will automatically get picked up if installed.
+100 for not making Keystone the actor. Leaving purge responsibility on the service side is the best way without any doubt.
Instead of accepting purge APIs from each service, I think we should also consider another approach: a pluggable one. We can expose the plugin interface from the purge library/tool. Each service implements the interface with its purge functionality (script, command, etc.). On discovery of each service's purge plugin, the purge library/tool will start the deletion in the required order.
This can give two simple benefits:
1. No need to detect service availability before requesting a purge. I am not sure whether OSPurge checks the availability of services or not, but with the plugin approach that will not be required. For example, if Congress is not installed in my env, then Congress's purge plugin will not be discovered, so there is no need to check Congress service availability.
2. The purge-all-resources interface will not be exposed to anyone except the purge library/tool. In the API case, we are exposing an interface to the user (admin, system-scoped, etc.) which can delete all the resources of that service, which may be a small security issue. This can be argued with the existing delete APIs, but those are per resource, not all. On the other side, we can say RBAC can take care of that, but IMO exposing anything that can destroy the env, even to a permitted user (especially a human), is not a good idea when the only right consumer of that interface is something else (the purge library/tool in this case).
The pluggable approach can also have its cons, but let's first discuss all those possibilities.
-gmann
Wasn't that what was proposed in the etherpad? I am a little confused there.
---- On Tue, 22 Jan 2019 10:14:50 +0900 Adrian Turjak <adriant@catalyst.net.nz> wrote ----
I've expanded on the notes in the etherpad about why Keystone isn't the actor.
At the summit we discussed this option, and all the people familiar with Keystone who were in the room (or in some later discussions), agreed that making Keystone the actor is a BAD idea.
Keystone does not currently do any orchestration or workflow of this nature, and making it do that adds a lot of extra logic which it just shouldn't need. After a project delete it would need to call all the APIs, and then confirm they succeeded, and maybe retry. This would have to be done asynchronously, since waiting and confirming the deletion would take longer than a single API call to delete a project in Keystone should take. That kind of logic doesn't fit in Keystone. Not to mention there are issues with how Keystone would know which services support such an API, and where exactly it might be (although catalog + consistent API placement or discovery could solve that).
Essentially, going down the route of "make this Keystone's problem" is in my opinion a hard NO, but I'll let the Keystone devs weigh in on that before we make that a very firm hard NO.
As for solutions: ideally we do implement the APIs per service (that's the end goal), but we ALSO make libraries that do deletion of resources using the existing APIs. If the library sees that a service version is one with the purge API, it uses it; otherwise it has a fallback for less efficient deletion. This has the major benefit of working for all existing deployments, and ones stuck on older OpenStack versions. This is a universal problem and we need to solve it backwards AND forwards.
By doing both (with a first step focus on the libraries) we can actually give projects more time to build the purge API, and maybe have the API portion of the goal extend into another cycle if needed.
Essentially, we'd make a purge library that uses the SDK to delete resources. If a service has a purge endpoint, then the library (via the SDK) uses that. The specifics of how the library purges, or if the library will be split into multiple libraries (one top level, and then one per service) is to be decided.
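The "use the purge endpoint if it exists, otherwise fall back" logic could be sketched roughly like this. The client class and all of its method names (supports_purge, purge, list_ids, delete) are hypothetical stand-ins for whatever the SDK would eventually expose; no purge API exists yet, so this only shows the shape of the decision.

```python
class FakeServiceClient:
    """Stand-in for an SDK-backed client; all method names are hypothetical."""

    def __init__(self, resources, has_purge_api):
        self.resources = dict(resources)   # resource id -> owning project id
        self.has_purge_api = has_purge_api

    def supports_purge(self):
        return self.has_purge_api

    def purge(self, project_id):
        self.resources = {r: p for r, p in self.resources.items()
                          if p != project_id}

    def list_ids(self, project_id):
        return [r for r, p in self.resources.items() if p == project_id]

    def delete(self, resource_id):
        del self.resources[resource_id]


def purge_service(client, project_id):
    """Prefer the purge API; otherwise delete resources one at a time."""
    if client.supports_purge():
        client.purge(project_id)
        return "purge-api"
    for resource_id in client.list_ids(project_id):
        client.delete(resource_id)
    return "fallback"
```

Callers get the same end state either way, which is what lets older deployments use the library before their services grow a purge endpoint.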
A rough look at what a deletion process might look like:
1. Disable project in Keystone (so no new resources can be created or modified), or clear all role assignments (and api-keys) from project.
2. Purge platform orchestration services (Magnum, Sahara).
3. Purge Heat (Heat after Magnum, because Magnum and such use Heat, and deleting Heat stacks without deleting the 'resource' which uses that stack can leave a mess).
4. Purge everything left (order to be decided or potentially dynamically chosen).
5. Delete or disable Keystone project (disable is enough really).
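One way to derive the ordering in the steps above dynamically is a topological sort over "purge after" dependencies. In this sketch only the Magnum/Sahara before Heat edge comes from the list; putting Nova after Heat and Neutron after Nova is an assumption added purely to show the "everything left" part, and the step names are made up.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# service -> services that must be purged before it
PURGE_AFTER = {
    "heat": {"magnum", "sahara"},   # from step 3 above
    "nova": {"heat"},               # assumed for illustration
    "neutron": {"nova"},            # assumed for illustration
}


def deletion_plan(purge_after):
    """Bracket the dependency-ordered purges with the Keystone steps."""
    order = list(TopologicalSorter(purge_after).static_order())
    return (["disable-project"]
            + ["purge:" + service for service in order]
            + ["disable-or-delete-project"])


plan = deletion_plan(PURGE_AFTER)
```

The relative order of independent services (Magnum and Sahara here) is left unspecified, which fits "order to be decided or potentially dynamically chosen".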
One important thing we need to discuss is rollback. If one or more services are not able to delete their resources, what should the purge library do? Error and roll back? Succeed with non-deleted resources left behind? Error with a list of the non-deleted resources and hold the project deletion until then? Or it could be a multiple-run deletion that keeps the project in a disabled state until all resources are gone. Because this library is going to provide the functionality of cleaning up everything, a half-cleaned project deletion could be another issue. IMO the project can stay in a disabled state until the user is able to delete all the resources with the library we provide.
-gmann
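The "multiple runs, project stays disabled until clean" option could look roughly like the sketch below. It is only one of the options listed and not a decision; purge_once is a hypothetical callable that performs one purge pass and returns the ids of whatever it could not delete.

```python
def purge_until_clean(purge_once, max_runs=3):
    """Repeat purge passes until nothing is left or max_runs is reached.

    The caller would keep the project disabled (not deleted) whenever the
    returned "clean" flag is False, and report the leftovers to the operator.
    """
    if max_runs < 1:
        raise ValueError("max_runs must be at least 1")
    leftovers = []
    for run in range(1, max_runs + 1):
        leftovers = list(purge_once())
        if not leftovers:
            return {"clean": True, "runs": run, "leftovers": []}
    return {"clean": False, "runs": max_runs, "leftovers": leftovers}


# Demo: a volume is still detaching on the first pass and gone on the second.
passes = iter([["vol-1"], []])
outcome = purge_until_clean(lambda: next(passes))
```

There is no rollback here on purpose: deleted resources cannot be restored, so the only safe "undo" is leaving the project disabled with a list of what remains.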
The actor is then first a CLI built into the purge library as an OSClient command, then secondly maybe an API or two in Adjutant which will use this library. Or anyone can use the library and make anything they want an actor.
Ideally if we can even make the library allow selectively choosing which services to purge (conditional on dependency chain), that could be useful for cases where a user wants to delete everything except maybe what's in Swift or Cinder.
This is in many ways a HUGE goal, but one that we really need to accomplish. We've lived with this problem too long and the longer we leave it unsolved, the harder it becomes.
On 22/01/19 9:30 AM, Lance Bragstad wrote:
On Mon, Jan 21, 2019 at 2:18 PM Ed Leafe <ed@leafe.com> wrote:
On Jan 21, 2019, at 1:55 PM, Lance Bragstad <lbragstad@gmail.com> wrote:
> Are you referring to the system scope approach detailed on line 38, here [0]?
Yes.
> I might be misunderstanding something, but I didn't think keystone was going to iterate all available services and call clean-up APIs. I think it was just that services would be able to expose an endpoint that cleans up resources without a project scoped token (e.g., it would be system scoped [1]).
>
> [0] https://etherpad.openstack.org/p/community-goal-project-deletion
> [1] https://docs.openstack.org/keystone/latest/admin/tokens-overview.html#system...
It is more likely that I’m misunderstanding. Reading that etherpad, it appeared that it was indeed the goal to have project deletion in Keystone cascade to all the services, but I guess I missed line 19.
So if it isn’t Keystone calling this API on all the services, what would be the appropriate actor?
The actor could still be something like os-purge or adjutant [0]. Depending on how the implementation shakes out in each service, the implementation in the actor could be an iteration over all services, calling the same API for each one. I guess the benefit is that the actor doesn't need to manage the deletion order based on the dependencies of the resources (internal or external to a service).
Adrian, and others, have given this a bunch more thought than I have. So I'm curious to hear if what I'm saying is in line with how they've envisioned things. I'm recalling most of this from Berlin.
[0] https://adjutant.readthedocs.io/en/latest/
-- Ed Leafe
This is an amazing community goal! I think we've all had, or are dealing with, this pain on a daily basis, and there are probably a lot of in-house solutions for it, or people using projects, open source or not, like ospurge.
I don't have much time to dedicate, but for us this is very important, so I'd love to get more details on how I could contribute some time to this; I'm not sure I could manage a champion role at this point.
Best regards
Tobias
On 01/11/2019 07:22 AM, Adrian Turjak wrote:
Hello OpenStackers!
As discussed at the Berlin Summit, one of the proposed community goals was project deletion and resource clean-up.
Essentially the problem here is that for almost any company that is running OpenStack we run into the issue of how to delete a project and all the resources associated with that project. What we need is an OpenStack wide solution that every project supports which allows operators of OpenStack to delete everything related to a given project.
Before we can choose this as a goal, we need to define what the actual proposed solution is, and what each service is either implementing or contributing to.
I've started an Etherpad here: https://etherpad.openstack.org/p/community-goal-project-deletion
Please add to it if I've missed anything about the problem description, or to flesh out the proposed solutions, but try to mostly keep any discussion here on the mailing list, so that the Etherpad can hopefully be more of a summary of where the discussions have led.
This is mostly a starting point, and I expect there to be a lot of opinions and probably some push back from doing anything too big. That said, this is a major issue in OpenStack, and something we really do need because OpenStack is too big and too complicated for this not to exist in a smart cross-project manner.
Let's solve this the best we can!
Cheers,
Adrian Turjak
participants (7): Adrian Turjak, Ed Leafe, Ghanshyam Mann, Jean-Philippe Evrard, Lance Bragstad, Tobias Rydberg, Tobias Urdin