Hi all,

after giving this some more thought, I came up with another solution that I think is the most appropriate here. It is a variation of option 1:

4. Castellan should clean up intermediate resources before returning secret ID(s) to the caller

As I see it now, the root of the problem is in castellan's BarbicanKeyManager and the way it hides implementation details from the user.
Since it returns only the IDs of the created secrets, the API caller has no way of knowing that something else has to be deleted later along with them.
The Barbican API is perfectly capable of deleting orders and containers without deleting the secrets they reference, so this is what castellan should do just before it returns the IDs of the generated secrets to the API caller.
The only small trouble is that with the default 'legacy' API policies in Barbican, not everybody who can create orders can delete them, but this can be accounted for with try..except.
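
To illustrate the idea, here is a minimal sketch of the proposed flow. It does not use the real castellan or barbicanclient code: FakeBarbican, Forbidden and create_key are hypothetical stand-ins that only model an order referencing a secret and a policy that may forbid order deletion. The actual change is in the patch linked below.

```python
class Forbidden(Exception):
    """Hypothetical stand-in for the HTTP 403 error a real client would raise."""


class FakeBarbican:
    """Hypothetical in-memory model of the Barbican API, for illustration only."""

    def __init__(self, can_delete_orders=True):
        self.secrets = {}
        self.orders = {}
        # Models the 'legacy' policy where a creator may not delete orders.
        self.can_delete_orders = can_delete_orders
        self._counter = 0

    def _new_id(self, prefix):
        self._counter += 1
        return f"{prefix}-{self._counter}"

    def create_key_order(self):
        # An order generates a secret and keeps a reference to it.
        secret_id = self._new_id("secret")
        self.secrets[secret_id] = b"key material"
        order_id = self._new_id("order")
        self.orders[order_id] = {"status": "ACTIVE", "secret_id": secret_id}
        return order_id

    def get_order(self, order_id):
        return self.orders[order_id]

    def delete_order(self, order_id):
        if not self.can_delete_orders:
            raise Forbidden("order deletion not allowed by policy")
        # Deleting the order leaves the referenced secret in place.
        del self.orders[order_id]


def create_key(client):
    """Create a key and clean up the intermediate order before returning."""
    order_id = client.create_key_order()
    secret_id = client.get_order(order_id)["secret_id"]
    try:
        client.delete_order(order_id)
    except Forbidden:
        # Best effort: the secret is still usable, the order just stays behind.
        pass
    return secret_id
```

With can_delete_orders=False the sketch behaves as castellan does today: the secret ID is still returned and the order is left behind.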

Please review the patch in this regard: https://review.opendev.org/c/openstack/castellan/+/877423

Best regards,

On Mon, Mar 6, 2023 at 7:32 PM Pavlo Shchelokovskyy <pshchelokovskyy@mirantis.com> wrote:
Hi all,

we are observing the following behavior in Barbican:
- OpenStack environment is using both encrypted Cinder volumes and encrypted local storage (lvm) for Nova instances
- over time, the secrets and orders tables keep growing
- many soft-deleted entries in the secrets DB cannot be purged by the DB cleanup script

As I understand it, this is what happens: both Cinder and Nova create secrets in Barbican on behalf of the user when creating an encrypted volume or booting an instance with encrypted local storage. They both do it via the castellan library, which under the hood creates orders in Barbican, waits for them to become active, and returns only the ID of the generated secret to the caller. When the time comes to delete the thing (volume or instance), Cinder/Nova again use castellan, but delete only the secret, not the order (they are not aware that any 'order' was created in the first place). As a result, the orders are left in place, and the DB cleanup procedure does not delete a soft-deleted secret while there is an ACTIVE order referencing it.
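
The purge rule described above can be modelled roughly as follows. This is a toy sketch, not Barbican's actual cleanup code: the Secret and Order classes and the purgeable_secrets function are hypothetical names that only capture the condition "soft-deleted and not referenced by an ACTIVE order".

```python
from dataclasses import dataclass


@dataclass
class Secret:
    id: str
    deleted: bool = False  # soft-delete flag


@dataclass
class Order:
    id: str
    secret_id: str
    status: str = "ACTIVE"


def purgeable_secrets(secrets, orders):
    """Return IDs of soft-deleted secrets that no ACTIVE order still references."""
    pinned = {o.secret_id for o in orders if o.status == "ACTIVE"}
    return [s.id for s in secrets if s.deleted and s.id not in pinned]


secrets = [Secret("s1", deleted=True), Secret("s2", deleted=True), Secret("s3")]
orders = [Order("o1", secret_id="s1")]  # left behind by castellan
result = purgeable_secrets(secrets, orders)
```

In this toy data, s1 was soft-deleted by Cinder/Nova but stays in the table because order o1 still references it, while s2 can be purged.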

This is troublesome on many levels. Users who use Cinder or Nova may not even be aware that they are creating something in Barbican. Orders accumulating like that may eventually result in cryptic errors, e.g. when you run out of quota for orders. What's more, default Barbican policies allow 'normal' (creator) users to create an order but not to delete it (only a project admin can do that), so even if users are aware of Barbican's involvement, they cannot delete those orders manually anyway. Plus, there's no good way in the API to determine outright which orders reference deleted secrets.

I see several ways of dealing with that and would like to ask for your opinion on what would be the best one:
1. Amend the Barbican API to allow filtering orders by secret; when castellan deletes a secret, search for the corresponding order and delete it as well; change the default policy to actually allow order deletion by the same users who can create them.
2. Cascade-delete orders when deleting secrets - this is easy, but probably violates the very policy that disallows normal users from deleting orders.
3. Improve the database cleanup so that it first marks any order that references a deleted secret as deleted too, so that later, when the time comes, both can be purged (or something like that). This has a similar downside to the previous option in not being explicit enough.
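
As a rough illustration of option 3, the extra cleanup step could look something like the sketch below. It works on plain dicts rather than Barbican's real database models, and mark_orphaned_orders is a hypothetical name, so treat it only as a statement of the intended logic.

```python
def mark_orphaned_orders(secrets, orders):
    """Soft-delete every order whose referenced secret is already soft-deleted.

    Both arguments are dicts keyed by ID; each value is a dict with a
    'deleted' flag (orders also carry 'secret_id'). Returns the IDs of
    the orders that were marked by this call.
    """
    deleted_secret_ids = {sid for sid, s in secrets.items() if s["deleted"]}
    marked = []
    for order_id, order in orders.items():
        if not order["deleted"] and order["secret_id"] in deleted_secret_ids:
            order["deleted"] = True
            marked.append(order_id)
    return marked


secrets = {"s1": {"deleted": True}, "s2": {"deleted": False}}
orders = {
    "o1": {"secret_id": "s1", "deleted": False},
    "o2": {"secret_id": "s2", "deleted": False},
}
marked = mark_orphaned_orders(secrets, orders)
```

After this step, o1 is soft-deleted together with s1, so a later purge run would no longer see a live order pinning the secret; o2 is untouched because s2 is still in use.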

I've filed a bug for that https://storyboard.openstack.org/#!/story/2010625 and proposed a patch for option 2 (cascade delete), but would like to ask what you see as the most appropriate way, or maybe there's something else that I've missed.

Btw, the problem is probably even more pronounced with keypairs: when castellan is used to create those, under the hood both an order and a container are created besides the actual secrets, and again only the secret IDs are returned to the caller. When the time comes to delete things, the caller only knows about the secret IDs and can only delete them, leaving both the container and the order behind.
Luckily, I did not find any place across OpenStack that actually creates keypairs using castellan... but the problem is definitely there.

Best regards,
--
Dr. Pavlo Shchelokovskyy
Principal Software Engineer
Mirantis Inc

