Thanks for the thorough feedback, Adrian.

My opinion is also that Keystone should not be the actor executing this functionality; that should live somewhere else,
whether that is Adjutant or something in another form (application, library, CLI, etc.).

I would also like to bring up the point about knowing whether a project is "dirty" (it has provisioned resources).
This is something that I think all business logic would benefit from. We've had issues with knowing when
resources should be deleted; our solution is pretty much to look at metrics from the last X minutes, check whether the project
is disabled, and compare that to the business logic that says it should be deleted.
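
As a rough sketch of that check (the field names are made up for illustration, and it assumes that recent metrics mean the project still has resources):

```python
from dataclasses import dataclass


@dataclass
class ProjectState:
    # All three fields are hypothetical names for the signals described above.
    has_recent_metrics: bool   # metrics seen in the last X minutes (assumed to mean resources exist)
    enabled: bool              # the project's enabled flag in Keystone
    marked_for_deletion: bool  # recorded by the business logic, not by Keystone


def should_purge(state: ProjectState) -> bool:
    """Purge only when the project is dirty, disabled, and marked for deletion."""
    return state.has_recent_metrics and not state.enabled and state.marked_for_deletion
```

A project that is disabled but not marked for deletion is left alone here, which is exactly the distinction Keystone cannot make on its own.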

While the above works, it defeats some of the point of disabling a project, since the only thing that knows whether
the project should be deleted or is merely disabled is the business logic application, which records that they clicked the
delete button and not the disable button.

Most of the functionality you are mentioning is what the ospurge project has been working to implement, and the
maintainer even did a full rewrite which improved the dependency arrangement for resource removal.

I think the biggest win for this community goal would be that the developers of the projects would be available for input regarding
the project-specific code that does the purging. There have been some really nasty bugs in ospurge in the past where, if executed as the admin
user, it would wipe everything and not only that project, which is probably an issue that makes people think twice about
using a purging toolkit at all.

We should carefully consider what parts of ospurge could be reused (concept, code, or anything in between) to help decide
what direction we want to push this goal in.

I'm excited :)

Best regards
Tobias

On 01/22/2019 02:18 AM, Adrian Turjak wrote:
I've expanded on the notes in the etherpad about why Keystone isn't the actor.

At the summit we discussed this option, and all the people familiar with Keystone who were in the room (or in some later discussions) agreed that making Keystone the actor is a BAD idea.

Keystone does not currently do any orchestration or workflow of this nature, and making it do that adds a lot of extra logic which it just shouldn't need. After a project delete it would need to call all the APIs, then confirm they succeeded, and maybe retry. This would have to be done asynchronously, since waiting and confirming the deletion would take longer than a single API call to delete a project in Keystone should take. That kind of logic doesn't fit in Keystone. Not to mention there are issues with how Keystone would know which services support such an API, and where exactly it might be (although catalog + consistent API placement or discovery could solve that).

Essentially, going down the route of "make this Keystone's problem" is in my opinion a hard NO, but I'll let the Keystone devs weigh in on that before we make that a very firm hard NO.

As for solutions: ideally we do implement the APIs per service (that's the end goal), but we ALSO make libraries that do deletion of resources using the existing APIs. If the library sees that a service version is one with the purge API, it uses it; otherwise it has a fallback for less efficient deletion. This has the major benefit of working for all existing deployments, and ones stuck on older OpenStack versions. This is a universal problem and we need to solve it backwards AND forwards.

By doing both (with a first step focus on the libraries) we can actually give projects more time to build the purge API, and maybe have the API portion of the goal extend into another cycle if needed.

Essentially, we'd make a purge library that uses the SDK to delete resources. If a service has a purge endpoint, then the library (via the SDK) uses that. The specifics of how the library purges, or if the library will be split into multiple libraries (one top level, and then one per service) is to be decided.
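
A minimal sketch of that "purge API if available, otherwise fall back" idea. Everything here is hypothetical: the class, its fields, and the callables stand in for whatever the SDK would actually provide, and how purge support gets detected is not decided.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ServicePurger:
    """Hypothetical per-service purger; not an existing SDK class."""
    name: str
    supports_purge_api: bool                     # assumed to come from version discovery
    purge_api: Callable[[str], None]             # one call that purges a whole project
    list_resources: Callable[[str], List[str]]   # fallback: list resource IDs in a project
    delete_resource: Callable[[str], None]       # fallback: delete a single resource

    def purge(self, project_id: str) -> str:
        """Purge one project and report which path was taken."""
        if self.supports_purge_api:
            self.purge_api(project_id)
            return "purge-api"
        # Less efficient path that works on older deployments.
        for resource_id in self.list_resources(project_id):
            self.delete_resource(resource_id)
        return "fallback"
```

The caller never needs to know which path was taken, which is what would let the API portion of the goal slip a cycle without blocking the library.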

A rough look at what a deletion process might look like:
1. Disable project in Keystone (so no new resources can be created or modified), or clear all role assignments (and api-keys) from project.
2. Purge platform orchestration services (Magnum, Sahara)
3. Purge Heat (Heat after Magnum, because Magnum and such use Heat, and deleting Heat stacks without deleting the 'resource' which uses that stack can leave a mess)
4. Purge everything left (order to be decided or potentially dynamically chosen).
5. Delete or Disable Keystone project (disable is enough really).
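
The ordering in the steps above could be sketched roughly as follows. The step names and service lists are placeholders, and the alphabetical ordering of the "everything left" phase is only a stand-in, since that order is still to be decided.

```python
# Hypothetical phases taken from the steps above; service names are illustrative.
ORCHESTRATION = ["magnum", "sahara"]
HEAT = ["heat"]


def purge_plan(project_id, services):
    """Return the ordered (action, target) steps for purging one project."""
    services = set(services)
    plan = [("disable_project", project_id)]
    first = [s for s in ORCHESTRATION if s in services]
    second = [s for s in HEAT if s in services]
    # Remaining services: sorted only to make the sketch deterministic.
    rest = sorted(services - set(first) - set(second))
    plan += [("purge", s) for s in first + second + rest]
    plan.append(("delete_or_disable_project", project_id))
    return plan
```

For a project with resources in Magnum, Heat, Nova and Swift, this yields: disable, purge Magnum, purge Heat, purge Nova, purge Swift, then delete or disable.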

The actor is then first a CLI built into the purge library as an OSClient command, then secondly maybe an API or two in Adjutant which will use this library. Or anyone can use the library and make anything they want an actor.

Ideally if we can even make the library allow selectively choosing which services to purge (conditional on dependency chain), that could be useful for cases where a user wants to delete everything except maybe what's in Swift or Cinder.
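
One possible reading of "conditional on dependency chain", as a sketch only: a service may be kept only if everything it uses is also kept. The dependency map below is illustrative and the function name is made up.

```python
# Illustrative map: service -> services whose resources it relies on.
USES = {"magnum": {"heat"}, "sahara": {"heat"}}


def select_services(all_services, keep):
    """Return the services to purge, refusing selections that would leave a mess."""
    keep = set(keep)
    purged = set(all_services) - keep
    for service in sorted(keep):
        broken = USES.get(service, set()) & purged
        if broken:
            # Keeping e.g. Magnum while purging Heat would orphan its clusters.
            raise ValueError(
                f"cannot keep {service} while purging {', '.join(sorted(broken))}"
            )
    # Preserve the caller's order so any purge ordering is left untouched.
    return [s for s in all_services if s not in keep]
```

Keeping Swift and Cinder is accepted under this rule, while keeping Magnum without also keeping Heat is rejected.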


This is in many ways a HUGE goal, but one that we really need to accomplish. We've lived with this problem too long and the longer we leave it unsolved, the harder it becomes.


On 22/01/19 9:30 AM, Lance Bragstad wrote:


On Mon, Jan 21, 2019 at 2:18 PM Ed Leafe <ed@leafe.com> wrote:
On Jan 21, 2019, at 1:55 PM, Lance Bragstad <lbragstad@gmail.com> wrote:
>
> Are you referring to the system scope approach detailed on line 38, here [0]?

Yes.

> I might be misunderstanding something, but I didn't think keystone was going to iterate all available services and call clean-up APIs. I think it was just that services would be able to expose an endpoint that cleans up resources without a project scoped token (e.g., it would be system scoped [1]).
>
> [0] https://etherpad.openstack.org/p/community-goal-project-deletion
> [1] https://docs.openstack.org/keystone/latest/admin/tokens-overview.html#system-scoped-tokens

It is more likely that I’m misunderstanding. Reading that etherpad, it appeared that it was indeed the goal to have project deletion in Keystone cascade to all the services, but I guess I missed line 19.

So if it isn’t Keystone calling this API on all the services, what would be the appropriate actor?

The actor could still be something like os-purge or adjutant [0]. Depending on how the implementation shakes out in each service, the implementation in the actor could be an iteration of all services calling the same API for each one. I guess the benefit is that the actor doesn't need to manage the deletion order based on the dependencies of the resources (internal or external to a service).

Adrian, and others, have given this a bunch more thought than I have. So I'm curious to hear if what I'm saying is in line with how they've envisioned things. I'm recalling most of this from Berlin.

 


-- Ed Leafe