[openstack-dev] Announcing Ekko -- Scalable block-based backup for OpenStack

Sam Yaple samuel at yaple.net
Wed Feb 3 15:52:23 UTC 2016

On Wed, Feb 3, 2016 at 3:36 PM, Duncan Thomas <duncan.thomas at gmail.com>

> On 3 February 2016 at 17:27, Sam Yaple <samuel at yaple.net> wrote:
>> And here we get to the meat of the matter. Squashing backups is awful in
>> object storage. It requires you to pull both backups, merge them, then
>> reupload. This also has the downside of casting doubt on a backup since you
>> are now modifying data after it has been backed up (though that doubt is
>> lessened with proper checksumming/hashing, which Cinder appears to do).
>> This is the issue Ekko can solve (and has solved over the past 2 years).
>> Ekko can do this "squashing" in a non-traditional way, without ever
>> modifying content or merging anything. With deletions only. This means we
>> do not have to pull two backups, merge, and reupload to delete a backup
>> from the chain.
> I'm sure we've lost most of the audience by this point, but I might as
> well reply here as anywhere else...

That's ok. We are talking, and that's important for feature sets that people
don't even know they want!

> In the cinder backup case, since the backup is chunked in object store,
> all that is required is to reference count the chunks that are required for
> the backups you want to keep, get rid of the rest, and re-upload the (very
> small) json mapping file. You can either upload over the old json, or
> create a new one. Either way, the bulk data does not need to be touched.

This is very similar to what Ekko is doing. The equivalent of the JSON mapping
in Ekko is a manifest file, which is a sqlite database. The major difference I
see is that Ekko does backup trees. If you launch 1000 instances from the
same glance image, you don't need 1000 fulls; you need 1 full and 1000
incrementals. That saves a ton of space, time, bandwidth, and IO, but it also
means any number of backups can reference the same chunk of data, which makes
deleting that data much harder than what you describe in Cinder. Likewise,
after restoring a backup you don't _need_ a new full; you can base your next
backups on the last restore point, and the same savings apply. It also means
that Ekko's backups can scale with OpenStack: your backups will only ever be
your changed data.
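
To make the shared-chunk deletion problem concrete, here is a minimal Python
sketch of reference counting across backups. It is an illustration only, not
Ekko's or Cinder's actual code: the names ChunkStore, add_backup, and
delete_backup are hypothetical, and each manifest is modelled as a plain list
of the chunk IDs needed to restore that backup rather than a JSON file or a
sqlite database.

```python
from collections import Counter


class ChunkStore:
    """Toy model: each backup is a manifest listing the chunk IDs it needs."""

    def __init__(self):
        self.manifests = {}        # backup name -> list of chunk IDs
        self.refcount = Counter()  # chunk ID -> number of manifests using it
        self.chunks = set()        # chunk IDs currently held in the object store

    def add_backup(self, name, chunk_ids):
        # Only chunks that are not already stored would need uploading.
        uploaded = [c for c in dict.fromkeys(chunk_ids) if c not in self.chunks]
        self.chunks.update(uploaded)
        self.manifests[name] = list(chunk_ids)
        self.refcount.update(set(chunk_ids))
        return uploaded

    def delete_backup(self, name):
        # Deletion only: drop the manifest, then remove unreferenced chunks.
        removed = []
        for c in set(self.manifests.pop(name)):
            self.refcount[c] -= 1
            if self.refcount[c] == 0:
                del self.refcount[c]
                self.chunks.discard(c)
                removed.append(c)
        return sorted(removed)


store = ChunkStore()
store.add_backup("image-full", ["a", "b", "c"])
store.add_backup("vm1-incr", ["a", "b", "d"])  # shares a and b with the full
store.add_backup("vm2-incr", ["a", "c", "e"])  # shares a and c with the full
removed_first = store.delete_backup("image-full")
removed_second = store.delete_backup("vm1-incr")
```

In this toy run, deleting "image-full" removes no bulk data because other
backups still reference its chunks; chunks only disappear once the last
manifest referencing them is gone. No chunk is ever rewritten or merged.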

I recognize that probably isn't a huge concern for Cinder, with volumes
typically holding unique data rather than duplicate data, but with nova I
would argue _most_ instances in an OpenStack deployment will be based on
the same small subset of images, and that's a lot of duplicate data to
consider backing up, especially at scale.
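
As a back-of-the-envelope sketch of why that duplication matters, the Python
below compares one full per instance against one shared full plus
per-instance incrementals. The function name and the sizes (10 GB image,
1 GB changed per instance) are made-up assumptions for illustration, not
measurements, and it assumes changes overwrite image data in place.

```python
def backup_storage_gb(instances, image_gb, changed_gb_per_instance):
    """Return (independent fulls, shared full + incrementals) in GB."""
    # Independent fulls: every instance stores its whole disk again.
    independent_fulls = instances * image_gb
    # Backup tree: the image is stored once, plus each instance's changes.
    shared_tree = image_gb + instances * changed_gb_per_instance
    return independent_fulls, shared_tree


# Hypothetical figures: 1000 instances from one 10 GB image, 1 GB changed each.
fulls_gb, tree_gb = backup_storage_gb(1000, 10, 1)
print(fulls_gb, tree_gb)
```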

I will have to understand a bit more about cinder-backup before I approach
that subject with Ekko (which right now is on the Newton roadmap). What you
have told me absolutely justifies the cinder-backup name (rather than
cinder-snapshot) so thank you for correcting me on that point!

> --
> Duncan Thomas