[Openstack] suggestion for data backup/recovery in swift

Rostyslav Slipetskyy rslipetskyy at yahoo.com
Tue Jun 7 23:06:16 UTC 2011


> What if a container could be marked as "versioning enabled" 
> and therefore the proxy server would rewrite uploads to 
> distinct versioned objects and translate plain gets to the latest version?
>
> Example:
>
> Upload sample.txt, becomes sample.txt-1306882418.68949
> Upload sample.txt again, becomes sample.txt-1306882422.82929
>
> Ask for sample.txt, get latest sample.txt-1306882422.82929
> Ask for specific sample.txt-1306882418.68949 for older version.
>
> There are some annoyances such as only retaining x old copies, etc. 
> but those could be handled by a smarter proxy, 
> as long as it's okay if they're not /guaranteed/ to be perfect.

The idea of appending a timestamp (or maybe a version number) to the filename,
so that different versions of the same file map to different storage nodes,
looks nice. In fact, the file metadata could keep track of the previous
timestamps/versions of a file. The location of the file metadata can be
determined from the Ring using the "account/container/object" path. The
location of an actual version of the file can be determined from the Ring
using the "account/container/object-version" (or
"account/container/object-timestamp") path.
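
A minimal sketch of the two lookups described above. It is only a simplified
stand-in for the Ring, not Swift's actual code: the partition count
(PART_POWER) and the helper names partition_for and versioned_name are
assumptions made for illustration.

```python
import hashlib

# Assumed for illustration: the cluster has 2**16 partitions.
PART_POWER = 16


def partition_for(account, container, obj):
    """Map an account/container/object path to a partition number.

    Hashes the path and keeps the top PART_POWER bits of the digest,
    so different object names generally land in different partitions.
    """
    path = "/%s/%s/%s" % (account, container, obj)
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - PART_POWER)


def versioned_name(obj, timestamp):
    """Build the per-version name, e.g. sample.txt-1306882418.68949."""
    return "%s-%s" % (obj, timestamp)


# Metadata is looked up under the plain name, data under the versioned name.
meta_part = partition_for("acct", "cont", "sample.txt")
data_part = partition_for(
    "acct", "cont", versioned_name("sample.txt", "1306882418.68949"))
```

Because the versioned name is just another input to the hash, each version is
placed independently of the others and of the metadata entry.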

The problem with such an approach is that it adds transactional processing:
both the metadata has to be updated AND the file has to be stored
successfully. Deletion of previous versions of a file can be done in a
separate process to shorten the transaction.
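
The ordering described above can be sketched with an in-memory model. The
class VersionedStore and its methods are hypothetical and stand in for real
storage nodes; the sketch assumes versions are uploaded in timestamp order.
The data is stored first and the metadata is updated second, so a failed
upload never becomes the latest version, and pruning of old versions is a
separate step that can run later.

```python
class VersionedStore:
    """Toy model: versioned objects plus per-name metadata."""

    def __init__(self):
        self.objects = {}   # versioned name -> data
        self.metadata = {}  # plain name -> versioned names, oldest first

    def put(self, name, data, timestamp):
        versioned = "%s-%s" % (name, timestamp)
        # Step 1: store the data under its versioned name.
        self.objects[versioned] = data
        # Step 2: only then record it in the metadata as the latest version.
        self.metadata.setdefault(name, []).append(versioned)
        return versioned

    def get(self, name):
        # A plain GET is translated to the latest recorded version.
        return self.objects[self.metadata[name][-1]]

    def prune(self, name, keep):
        """Delete all but the newest `keep` versions (keep must be >= 1).

        Meant to run outside the upload path, e.g. in a background process.
        Returns the names of the versions that were removed.
        """
        if keep < 1:
            raise ValueError("keep must be at least 1")
        versions = self.metadata.get(name, [])
        stale = versions[:-keep]
        self.metadata[name] = versions[-keep:]
        for versioned in stale:
            del self.objects[versioned]
        return stale


store = VersionedStore()
store.put("sample.txt", b"first", "1306882418.68949")
store.put("sample.txt", b"second", "1306882422.82929")
latest = store.get("sample.txt")
removed = store.prune("sample.txt", keep=1)
```

In a real cluster the two steps in put() would hit different nodes, which is
exactly where the transactional concern comes from.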

Besides, the user-provided name of the file does not necessarily have to
become the name of the file in the cluster. It is only needed as an input to
the Ring so that it returns different storage nodes. If the user-provided name
becomes part of the physical name, it impacts privacy a bit. Imagine an
administrator of a storage node who amuses himself by running a search with a
query like "find all jpg files where 'naked' is part of the filename". (The
filename is also stored in the container database, but I don't know why that
is really needed.)
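
One way to keep the user-provided name out of the physical name is to store
the file under a keyed hash of its path. This is only a sketch of that idea:
the function opaque_name and the CLUSTER_SECRET value are hypothetical, and
nothing here is taken from Swift's code.

```python
import hashlib
import hmac

# Hypothetical per-cluster secret; without it the names cannot be recomputed
# by guessing likely filenames.
CLUSTER_SECRET = b"change-me"


def opaque_name(account, container, obj):
    """Derive an on-disk name that does not reveal the user's filename."""
    path = "/%s/%s/%s" % (account, container, obj)
    return hmac.new(
        CLUSTER_SECRET, path.encode("utf-8"), hashlib.sha256).hexdigest()


# The result is a 64-character hex string with no trace of the original name.
physical = opaque_name("acct", "cont", "naked-photo.jpg")
```

A search over filenames on the storage node then sees only hex digests, while
the proxy can still recompute the name from the request path.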


- Rostik



