[openstack-dev] [all][db][performance] Proposal: Get rid of soft deletion (step by step)

Tim Bell Tim.Bell at cern.ch
Fri Mar 14 20:54:32 UTC 2014


I think we need to split the scenarios and focus on the end user experience with the cloud.

A few come to mind from the CERN experience (this may not be all of them):

1. Accidental deletion of an object (including metadata)
2. Multi-level consistency (such as between Cell API and child instances)
3. Auditing

CERN sees scenario 1 reasonably often. Ultimately, it is due to error by:
A - the openstack administrators themselves
B - the delegated project administrators
C - users with a non-optimised scope for administrative action
D - users who make mistakes

It seems that we should handle these as different cases:

3 - make sure there is a log entry (ideally off the box) for all operations
2 - up to the component implementers but with the aim to expire deleted entries as soon as reasonable consistency is achieved
1[A-D] - how can we recover from operator/project admin/user error ?

I understand that perspectives differ between cloud and server consolidation, but my cloud users have expectations. If they create a one-off virtual desktop running Windows for software testing and install a set of software, they do not expect me to tell them that it was accidentally deleted due to operator error (1A or 1B) and that they need to re-create it.

Tim

> -----Original Message-----
> From: Jay Pipes [mailto:jaypipes at gmail.com]
> Sent: 14 March 2014 16:55
> To: openstack-dev at lists.openstack.org
> Subject: Re: [openstack-dev] [all][db][performance] Proposal: Get rid of soft deletion (step by step)
> 
> On Fri, 2014-03-14 at 08:37 +0100, Radomir Dopieralski wrote:
> > Hello,
> >
> > I also think that this thread is going in the wrong direction, but I
> > don't think the direction Boris wants is the correct one either.
> > Frankly I'm a little surprised that nobody mentioned another advantage
> > that soft delete gives us, the one that I think it was actually used for originally.
> >
> > You see, soft delete is an optimization. It's there to make the system
> > work faster as a whole, have less code and be simpler to maintain and debug.
> >
> > How does it do it, when, as clearly shown in the first post in this
> > thread, it makes the queries slower, requires additional indices in
> > the database and more logic in the queries?
> 
> I feel it isn't an optimization if:
> 
> * It slows down the code base
> * Makes the code harder to read and understand
> * Deliberately obscures the actions of removing and restoring resources
> * Encourages the idea that everything in the system is "undoable", like the cloud is a Word doc.
> 
> >  The answer is, by doing more
> > with those queries, by making you write less code, execute fewer
> > queries to the databases and avoid duplicating the same data in multiple places.
> 
> Fewer queries do not always make faster code, nor do they lead to inherently race-free code.
> 
> > OpenStack is a big, distributed system of multiple databases that
> > sometimes rely on each other and cross-reference their records. It's
> > not uncommon to have some long-running operation started, that uses
> > some data, and then, in the middle of its execution, have that data deleted.
> > With soft delete, that's not a problem -- the operation can continue
> > safely and proceed as scheduled, with the data it was started with in
> > the first place -- it still has access to the deleted records as if
> > nothing happened.
> 
> I believe a better solution would be to use Boris' solution and implement safeguards around the delete operation. For instance, not
> being able to delete an instance that has tasks still running against it. Either that, or implement true task abortion logic that can
> notify distributed components about the need to stop a running task because either the user wants to delete a resource or simply
> cancel the operation they began.
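
As a hedged illustration of the safeguard idea above (refusing to delete an instance while tasks are still running against it), here is a minimal Python sketch. The names InstanceRegistry, ResourceBusy, start_task and finish_task are hypothetical and are not Nova or oslo APIs; an in-memory dict stands in for the database.

```python
import threading


class ResourceBusy(Exception):
    """Raised when a delete is refused because tasks are still running."""


class InstanceRegistry:
    """Hypothetical in-memory registry; a real service would use its database."""

    def __init__(self):
        self._lock = threading.Lock()
        self._instances = {}  # instance_id -> set of running task ids

    def create(self, instance_id):
        with self._lock:
            self._instances[instance_id] = set()

    def start_task(self, instance_id, task_id):
        with self._lock:
            # A KeyError here means the instance has already been deleted.
            self._instances[instance_id].add(task_id)

    def finish_task(self, instance_id, task_id):
        with self._lock:
            self._instances[instance_id].discard(task_id)

    def delete(self, instance_id):
        with self._lock:
            running = self._instances[instance_id]
            if running:
                # Safeguard: refuse the delete instead of soft-deleting.
                raise ResourceBusy(
                    "instance %s has running tasks: %s"
                    % (instance_id, sorted(running)))
            # Hard delete: the record is removed, not flagged.
            del self._instances[instance_id]

    def exists(self, instance_id):
        with self._lock:
            return instance_id in self._instances
```

In this sketch the caller of delete() has to handle ResourceBusy explicitly, for example by retrying later or by asking the running task to abort first.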
> 
> >  You simply won't be able to schedule another operation like that with
> > the same data, because it has been soft-deleted and won't pass the
> > validation at the beginning (or even won't appear in the UI or CLI).
> > This solves a lot of race conditions, error handling, additional
> > checks to make sure the record still exists, etc.
> 
> Sorry, I disagree here. Components that rely on the soft-delete behavior to get the resource data from the database should instead
> respond to a NotFound that gets raised by aborting their running task.
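
A minimal sketch of what "respond to a NotFound by aborting the running task" could look like, under stated assumptions: NotFound, TaskAborted, get_instance and run_task are hypothetical names for illustration, and a plain dict stands in for the database table.

```python
class NotFound(Exception):
    """Raised when a record has been hard-deleted."""


class TaskAborted(Exception):
    """Raised to stop a task whose resource disappeared."""


def get_instance(db, instance_id):
    # `db` is a plain dict standing in for a database table.
    try:
        return db[instance_id]
    except KeyError:
        raise NotFound(instance_id)


def run_task(db, instance_id, steps):
    """Run each step, re-reading the record first; abort on NotFound."""
    completed = []
    for step in steps:
        try:
            instance = get_instance(db, instance_id)
        except NotFound:
            # Explicit code path: the resource is gone, so stop and report
            # how far the task got instead of reading a soft-deleted row.
            raise TaskAborted(
                "instance %s deleted after steps %s"
                % (instance_id, completed))
        step(db, instance)
        completed.append(step.__name__)
    return completed
```

The trade-off discussed in the thread is visible here: the task needs an explicit abort path, but the database no longer has to keep deleted rows around for it.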
> 
> > Without soft delete, you need to write custom code every time to
> > handle the case of a record being deleted mid-operation, including all
> > the possible combinations of which record and when.
> 
> Not custom code. Explicit code paths for explicit actions.
> 
> >  Or you need to copy all
> > the relevant data in advance over to whatever is executing that
> > operation.
> 
> This is already happening.
> 
> > This cannot be abstracted away entirely (although tools like TaskFlow
> > help), as this is specific to the case you are handling. And it's not
> > easy to find all the places where you can have a race condition like
> > that -- especially when you are modifying existing code that has been
> > relying on soft delete before. You can have bugs undetected for years,
> > that only appear in production, on very large deployments, and are
> > impossible to reproduce reliably.
> >
> > There are more similar cases like that, including cascading deletes
> > and more advanced stuff, but I think this single case already shows
> > that the advantages of soft delete outweigh its disadvantages.
> 
> I respectfully disagree :) I think the benefits of explicit code paths and increased performance of the database outweigh the costs of
> changing existing code.
> 
> Best,
> -jay
> 
> > On 13/03/14 19:52, Boris Pavlovic wrote:
> > > Hi all,
> > >
> > >
> > > I would like to fix the direction of this thread, because it is going in
> > > the wrong direction.
> > >
> > > Let us assume:
> > > 1) Yes, restoring already deleted resources could be useful.
> > > 2) The current approach with soft deletion is broken by design and we
> > > should get rid of it.
> > >
> > > More about why I think that it is broken:
> > > 1) When you are restoring some resource you should restore N records
> > > from N tables (e.g. VM)
> > > 2) Restoring sometimes means not only restoring DB records.
> > > 3) Not all resources should be restorable (e.g. why would I need to
> > > restore a fixed_ip or key-pairs?)
> > >
> > >
> > > So what we should think about is:
> > > 1) How to implement restore functionality in a common way (e.g. a
> > > framework that will be in oslo)
> > > 2) Split the work of getting rid of soft deletion into steps (which I
> > > already mentioned):
> > > a) remove soft deletion from places where we are not using it
> > > b) replace internal code where we are using soft deletion with that
> > > framework
> > > c) replace API stuff using ceilometer (for logs) or this framework
> > > (for restorable stuff)
> > >
> > >
> > > To put it in a nutshell: restoring deleted resources / delayed deletion
> > > != soft deletion.
> > >
> > >
> > > Best regards,
> > > Boris Pavlovic
> > >
> > >
> > >
> > > On Thu, Mar 13, 2014 at 9:21 PM, Mike Wilson <geekinutah at gmail.com
> > > <mailto:geekinutah at gmail.com>> wrote:
> > >
> > >     For some guests we use the LVM imagebackend and there are times when
> > >     the guest is deleted by accident. Humans, being what they are, don't
> > >     back up their files and don't take care of important data, so it is
> > >     not uncommon to use lvrestore and "undelete" an instance so that
> > >     people can get their data. Of course, this is not always possible if
> > >     the data has been subsequently overwritten. But it is common enough
> > >     that I imagine most of our operators are familiar with how to do it.
> > >     So I guess my saying that we do it on a regular basis is not quite
> > >     accurate. Probably would be better to say that it is not uncommon to
> > >     do this, but definitely not a daily task or something of that ilk.
> > >
> > >     I have personally "undeleted" an instance a few times after
> > >     accidental deletion also. I can't remember the specifics, but I do
> > >     remember doing it :-).
> > >
> > >     -Mike
> > >
> > >
> > >     On Tue, Mar 11, 2014 at 12:46 PM, Johannes Erdfelt
> > >     <johannes at erdfelt.com <mailto:johannes at erdfelt.com>> wrote:
> > >
> > >         On Tue, Mar 11, 2014, Mike Wilson <geekinutah at gmail.com
> > >         <mailto:geekinutah at gmail.com>> wrote:
> > >         > Undeleting things is an important use case in my opinion. We do this in our
> > >         > environment on a regular basis. In that light I'm not sure that it would be
> > >         > appropriate just to log the deletion and get rid of the row. I would like
> > >         > to see it go to an archival table where it is easily restored.
> > >
> > >         I'm curious, what are you undeleting and why?
> > >
> > >         JE
> > >
> > >
> > >         _______________________________________________
> > >         OpenStack-dev mailing list
> > >         OpenStack-dev at lists.openstack.org
> > >         <mailto:OpenStack-dev at lists.openstack.org>
> > >
> > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


