[openstack-dev] Proposal for Raksha, a Data Protection As a Service project
John Griffith
john.griffith at solidfire.com
Fri Aug 30 04:00:45 UTC 2013
On Thu, Aug 29, 2013 at 6:36 PM, Murali Balcha <Murali.Balcha at triliodata.com> wrote:
>
> >>> My question is, would it make sense to add to the current mechanisms in
> >>> Nova and Cinder than add the complexity of a new project?
> >
> > I think the answer is yes :)
>
>
> I meant there is a clear need for Raksha project. :)
>
> Thanks,
> Murali Balcha
>
> On Aug 29, 2013, at 7:45 PM, "Murali Balcha" <Murali.Balcha at triliodata.com>
> wrote:
>
> >
> > ________________________________________
> >>> From: Ronen Kat <RONENKAT at il.ibm.com>
> >>> Sent: Thursday, August 29, 2013 2:55 PM
> >>> To: openstack-dev at lists.openstack.org;
> openstack-dev at lists.launchpad.net
> >>> Subject: Re: [openstack-dev] Proposal for Raksha, a Data Protection As
> a Service project
> >
> >>> Hi Murali,
> >
> >>> I think the idea to provide enhanced data protection in OpenStack is a
> >>> great idea, and I have been thinking about backup in OpenStack for a
> while
> >>> now.
> >>> I am just not sure a new project is the only way to do it.
> >
> >>> (as disclosure, I contributed code to enable IBM TSM as a Cinder backup
> >>> driver)
> >
> > Hi Kat,
> > Consider the following use cases that Raksha will address. I will
> > discuss them from simple to complex and then address your specific
> > questions with inline comments.
> > 1. VM1 is created on the local file system and has a Cinder volume
> > attached.
> > 2. VM2 is booted from a Cinder volume and has a couple of Cinder
> > volumes attached.
> > 3. VM1 and VM2 are both booted from Cinder volumes and have a couple
> > of volumes attached. They also share a private network for internal
> > communication.
> > 4.
> > In all these cases Raksha will take a consistent snapshot of the VMs,
> > walk through each VM's resources, and back up the resources to a Swift
> > endpoint.
> > In case 1, that means backing up the VM image and the Cinder volume
> > image to Swift.
> > Case 2 is an extension of case 1.
> > In case 3, Raksha not only backs up VM1 and VM2 and their associated
> > resources, it also backs up the network configuration.
> >
> > Now let's consider the restore case. The restore operation walks through
> > the backed-up resources and calls into the respective OpenStack services
> > to restore those objects. In case 1, it first calls the Nova API to
> > restore the VM, then calls into Cinder to restore the volume and attach
> > it to the newly restored VM instance. In case 3, it also calls into the
> > Neutron API to restore the networking. Hence my argument is that no
> > single OpenStack project has the global view of a VM and all its
> > resources needed to implement effective backup and restore services.
> >
> >
> >>> I wonder what is the added-value of a project approach versus
> enhancements
> >>> to the current Nova and Cinder implementations of backup. Let me
> elaborate.
> >
> >>> Nova has a "nova backup" feature that performs a backup of a VM to
> Glance,
> >>> the backup is managed by tenants in the same way that you propose.
> >>> While today it provides only point-in-time full backup, it seems
> >>> reasonable that it can be extended to support incremental and
> >>> consistent backup as well, as the actual work is done either by the
> >>> storage or the hypervisor in any case.
> >
> > Though Nova has an API to upload a snapshot of the VM to Glance, it does
> > not snapshot any volumes associated with the VM. When a snapshot is
> > uploaded to Glance, Nova creates an image by collapsing the qemu image
> > with the delta file and uploads the larger file to Glance. If we were to
> > perform periodic backups of VMs, this would be a very inefficient way to
> > do it. Having to manage two endpoints, one for Nova and one for Cinder,
> > is also inefficient. These are the gaps I called out in the Raksha wiki
> > page.
> >
> >
> >>> Cinder has a cinder backup command that performs a volume backup to
> Swift,
> >>> Ceph or TSM. The Ceph implementation also support incremental backup
> (Ceph
> >>> to Ceph).
> >>> I envision that Cinder could be expanded to support incremental backup
> (for
> >>> persistent storage) by adding drivers/plug-ins that will leverage
> >>> incremental backup features of either the storage or Hypervisors.
> >>> Independently, in Havana the ability to do consistent volume snapshots
> was
> >>> added to GlusterFS. I assume that this consistency support could be
> >>> generalized to support other volume drivers, and be utilized as part
> of a
> >>> backup code.
> >
> > I think we are talking about specific implementations here. Yes, I am
> > aware of the Ceph blueprint to support incremental backup, but the Cinder
> > backup APIs are volume specific. That means if a VM has multiple volumes
> > mapped, as in case 2 above, the tenant needs to call the backup API three
> > times. Also, if you look at the Swift layout used by Cinder, it is very
> > difficult to tie the Swift images back to a particular VM. Imagine a
> > tenant wanting to restore a VM and all its resources from a backup copy
> > taken a week ago. The restore operation is not straightforward.
> > It is my understanding that consistency should be maintained at the VM
> > level, not at the individual volume level. It is very difficult to make
> > assumptions about how the application data inside a VM is laid out.
> >
> >>> Looking at the key features in Raksha, it seems that the main features
> >>> (2,3,4,7) could be addressed by improving the current mechanisms in
> Nova
> >>> and Cinder. I didn't include 1 as a feature as it is more a statement
> of
> >>> intent (or goal) than a feature.
> >>> Features 5 (dedup) and 6 (scheduler) are indeed new in your proposal.
> >
> >>> Looking at the source de-duplication feature, and taking Swift as an
> >>> example, it seems reasonable that if Swift will implement
> de-duplication,
> >>> then doing backup to Swift will give us de-duplication for free.
> >>> In fact it would make sense to do the de-duplication at the Swift level
> >>> instead of just the backup layer to gain more duplication
> opportunities.
> >
> > I agree; however, Swift is not the only object store that would need to
> > support dedupe. Ceph is another popular object store. GlusterFS supports
> > a Swift endpoint, and there are other commercially available object
> > stores too. So your argument becomes very product specific. Moreover,
> > source-level dedupe is different from dedupe at rest. Source-level dedupe
> > reduces the backup window and also reduces the amount of data that needs
> > to be pumped to a backup endpoint like Swift.
> >
> >>> Following the above, and assuming it all come true (at times I am
> known to
> >>> be an optimistic), then we are left with backup job scheduling, and I
> >>> wonder if that is enough for a new project.
> >
> > I hope I have convinced you that Raksha has more to offer than a simple
> > cron job. Please take a look at the backup APIs, the database schema, and
> > the use cases it addresses on its wiki page.
> >
> >
> > The bottom line is that, irrespective of how OpenStack is deployed, here
> > is what the Raksha workflow looks like:
> > * Create-backupjob VM1, VM2
> > --> Returns backup job id, id1
> > * Run-backupjob id1
> > --> Returns run id, rid1
> > * Run-backupjob id1
> > --> Returns run id, rid2
> >
> > * Restore rid1
> > --> Restores PiT of VM1 and VM2 and their associated volumes
> >
> >
> >>> My question is, would it make sense to add to the current mechanisms in
> >>> Nova and Cinder than add the complexity of a new project?
> >
> > I think the answer is yes :)
> >
> > Regards,
> > Murali Balcha
> >>> __________________________________________
> >>> Ronen I. Kat
> >>> Storage Research
> > IBM Research - Haifa
> > Phone: +972.3.7689493
> > Email: ronenkat at il.ibm.com
> >
> > From: Murali Balcha <Murali.Balcha at triliodata.com>
> > To: "openstack-dev at lists.openstack.org"
> > <openstack-dev at lists.openstack.org>,
> > "openstack at list.openstack.org" <openstack at list.openstack.org
> >,
> > Date: 29/08/2013 01:18 AM
> > Subject: [openstack-dev] Proposal for Raksha, a Data Protection
> As a
> > Service project
> >
> >
> >
> > Hello Stackers,
> > We would like to introduce a new project Raksha, a Data Protection As a
> > Service (DPaaS) for OpenStack Cloud.
> > Raksha’s primary goal is to provide a comprehensive Data Protection for
> > OpenStack by leveraging Nova, Swift, Glance and Cinder. Raksha has
> > following key features:
> > 1. Provide an enterprise grade data protection for OpenStack
> > based clouds
> > 2. Tenant administered backups and restores
> > 3. Application consistent backups
> > 4. Point In Time(PiT) full and incremental backups and
> restores
> > 5. Dedupe at source for efficient backups
> > 6. A job scheduler for periodic backups
> > 7. Noninvasive backup solution that does not require service
> > interruption during backup window
> >
> > You will find the rationale behind the need for Raksha in OpenStack in
> its
> > Wiki. The wiki also has the preliminary design and the API description.
> > Some of the Raksha functionality may overlap with the Nova and Cinder
> > projects, and as a community let's work together to coordinate the
> > features among these projects. We would like to seek out early feedback
> > so we can address as many issues as we can in the first code drop. We are
> > hoping to enlist the OpenStack community's help in making Raksha a part
> > of OpenStack.
> > Raksha’s project resources:
> > Wiki: https://wiki.openstack.org/wiki/Raksha
> > Launchpad: https://launchpad.net/raksha
> > Github: https://github.com/DPaaS-Raksha/Raksha (We will upload a
> prototype
> > code in few days)
> > If you want to talk to us, send an email to
> > openstack-dev at lists.launchpad.net with "[raksha]" in the subject or use
> > #openstack-raksha irc channel.
> >
> > Best Regards,
> > Murali Balcha
> > _______________________________________________
> > OpenStack-dev mailing list
> > OpenStack-dev at lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
Hi Murali,
This sounds pretty neat, but in my opinion most of the items in your list
are already covered by the Cinder backup service. As for backing up
instances, I'm personally not sure it makes sense to back up ephemeral
objects. We already have the ability to create and upload an image, which
is "kind of a backup". Also, if you want a persistent instance, wouldn't it
be better to have it reside on persistent storage and back that up?
Anyway, my personal thought is that it might be more efficient to see how
things develop with the backup service in Cinder.
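
For illustration only, here is a rough Python sketch of what "keep the
instance on persistent storage and back that up" could look like when driven
per instance. Everything named here (Instance, backup_instance_volumes,
fake_backup_volume) is made up for the example and is not a Nova or Cinder
API; the placeholder function stands in for whatever per-volume backup
request a deployment actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Instance:
    """Hypothetical stand-in for an instance and the volumes attached to it."""
    name: str
    volume_ids: List[str] = field(default_factory=list)


def backup_instance_volumes(
    instance: Instance, backup_volume: Callable[[str], str]
) -> Dict[str, str]:
    """Issue one backup request per attached volume.

    Returns a mapping of volume id to backup id so the backups can later be
    tied back to the instance they came from.
    """
    return {vol_id: backup_volume(vol_id) for vol_id in instance.volume_ids}


def fake_backup_volume(volume_id: str) -> str:
    # Placeholder for a per-volume backup request; not a real Cinder call.
    return "backup-of-" + volume_id


# An instance booted from a volume with two data volumes attached.
vm = Instance(name="VM2", volume_ids=["boot-vol", "data-vol-1", "data-vol-2"])
backups = backup_instance_volumes(vm, fake_backup_volume)
```

The returned mapping is the only thing that links the three backups to the
instance, so a caller would have to store it somewhere to restore them
together.
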
As for the deduplication idea, I really think that's much better done on
the target rather than on the source. Processing this in the backup service
is pretty expensive, and there are a lot of trouble spots, not the least of
which is a pretty big hit on performance.
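
To make the cost concrete, here is a minimal sketch of source-side
deduplication using fixed-size chunks and SHA-256 digests. It is an
assumption-laden toy, not how Raksha or Cinder do it: the chunk size, the
in-memory dict standing in for the backup target, and the function names are
all invented for the example. Note that every chunk has to be read and hashed
on the source side even when nothing new ends up being uploaded.

```python
import hashlib
from typing import Dict, List, Tuple

CHUNK_SIZE = 4  # tiny on purpose; a real system would use far larger chunks


def dedupe_at_source(
    data: bytes, store: Dict[str, bytes], chunk_size: int = CHUNK_SIZE
) -> Tuple[List[str], int]:
    """Split data into chunks and upload only chunks the store lacks.

    Returns the ordered list of chunk digests (the recipe needed to rebuild
    the data) and the number of bytes actually uploaded.
    """
    recipe = []
    uploaded = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()  # paid for every chunk
        if digest not in store:
            store[digest] = chunk
            uploaded += len(chunk)
        recipe.append(digest)
    return recipe, uploaded


def restore(recipe: List[str], store: Dict[str, bytes]) -> bytes:
    """Rebuild the original bytes from the recipe and the chunk store."""
    return b"".join(store[digest] for digest in recipe)


store: Dict[str, bytes] = {}  # hypothetical stand-in for the backup target
data = b"AAAABBBBAAAACCCC"
recipe, uploaded = dedupe_at_source(data, store)
_, uploaded_again = dedupe_at_source(data, store)
```

The second run uploads nothing, which is the saving on the wire, but it still
hashes all of the data on the source, which is the performance hit I'm
worried about.
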
Also, as was pointed out, there are quite a few efficiencies and
optimizations that can be realized by leaving the work closer to the
backend storage itself. Several cases with good optimizations have already
been pointed out, and a number of back-ends in Cinder already have plans
for further enhancements and optimizations as well.
Anyway, that's just my opinion. I'd be really interested in talking with
you more (maybe at the summit) regarding some of the work you're doing and
some of the ideas that you have. It would be interesting to see what we
could do to improve the backup service already in Cinder.
Thanks,
John