[openstack-dev] [TROVE] Point in time recovery design

Denis Makogon dmakogon at mirantis.com
Wed Mar 5 11:11:37 UTC 2014


Let me elaborate a bit.

Point in time recovery means that:

I, as the database server administrator, want to restore data from a
backup (whose "created" time equals X) to an already running server. I
don't want to spin up a new server; I want this one.

Point-in-time here is nothing other than the point in time when the backup
was created.

Currently, Trove is only able to create a new instance from a backup. I
suggested a feature that allows restoring/recovering data on a server that
is already running.

The goal is to provide a developer with an API call to /recover the already
running instance from the backup that has {"created": "2014-03-04 00:00:00.00Z"}.
This is clearly written in the initial mail.

The first iteration will cover point-in-time recovery from a full backup.
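
To make "the backup whose created time equals X" concrete, here is a minimal
sketch of the selection step. The dictionary layout and the "status" and "id"
keys are assumptions made for illustration only; they are not taken from the
Trove code.

```python
from datetime import datetime, timezone


def select_backup(backups, created):
    """Return the COMPLETED backup whose "created" time equals `created`.

    `backups` is a list of plain dicts; this layout is hypothetical and only
    mirrors the fields mentioned in this thread.
    """
    for backup in backups:
        if backup["status"] == "COMPLETED" and backup["created"] == created:
            return backup
    return None


# Illustrative data, not real Trove output.
backups = [
    {"id": "b1", "status": "COMPLETED",
     "created": datetime(2014, 3, 3, tzinfo=timezone.utc)},
    {"id": "b2", "status": "COMPLETED",
     "created": datetime(2014, 3, 4, tzinfo=timezone.utc)},
    {"id": "b3", "status": "FAILED",
     "created": datetime(2014, 3, 5, tzinfo=timezone.utc)},
]

chosen = select_backup(backups, datetime(2014, 3, 4, tzinfo=timezone.utc))
```

A backup that did not complete is never selected, even if its timestamp
matches, since only a COMPLETED backup can be restored from.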

A simple use case: my quota for instances equals 1 (only one instance is
allowed to be provisioned). Somehow, after a long period of time, my
database gets corrupted, so I need to recover it. Fortunately, I have a
backup that was COMPLETED not so long ago, and I want to recover data from
it. Unfortunately, Trove is not able to do that, because there is no such
feature. If Trove were able to do that, users could recover their own
servers in place, without provisioning emergency ones from scratch that
are restored from the given backup.

Best regards,
Denis Makogon
dmakogon at mirantis.com
www.mirantis.com

On Wed, Mar 5, 2014 at 3:22 AM, Doug Shelley <doug at tesora.com> wrote:

>  Kevin - I agree with your "What I'm worried about" comment. I think we
> need to move further down the path with the base backup/restore and the
> replication/clustering features across more datastores before pushing mysql
> too far ahead. Given the growth of the team over the last 6 months or so I
> definitely agree that more bench depth on datastores would be of benefit to
> the project and community.
>
>
>
> Doug
>
>
>
>
>
> *From:* Kevin Conway [mailto:kevinjacobconway at gmail.com]
> *Sent:* March-04-14 1:20 PM
>
> *To:* OpenStack Development Mailing List (not for usage questions)
> *Subject:* Re: [openstack-dev] [TROVE] Point in time recovery design
>
>
>
> Have we discussed what level of granularity will be supported by point in
> time recovery?
>
>
>
> For example, there is already an initial implementation of incremental
> backups/restores for the MySQL/InnoDB family of data stores. Is that not
> already a form of point in time recovery?
>
>
>
> Is the goal to provide a developer with an API call for /recover {"at":
> "2014-03-04 00:00:00.00Z"}? Supporting a true point in time recovery like
> that is going to be very difficult. In a former life I worked with a group
> of DBAs that managed a large Oracle deployment and dealt with point in time
> recovery quite a bit. In fact, the ability to recover to a particular
> transaction ID from the bin logs rather than just a timestamp was critical
> to support any kind of fault tolerance with our replication setup (just a
> heads up that these things are likely going to be problems when we combine
> backup/restore with clustering/replication).
>
>
>
> Incremental backups and point in time recovery are great features that I
> would love to see a part of Trove. What I'm worried about is that we're
> pushing forward on these features without enough datastore experts weighing
> in on what realistic expectations are for these features across data stores
> and how these features might impact replication/clustering. Don't get me
> wrong, Xtrabackup is a neat tool, but I don't want to see us build an
> entire API spec around it.
>
>
>
> As it is, Trove only supports backups for MySQL. We don't even support
> incremental backups for MySQL; only the subset of MySQL stores running
> InnoDB. I'd like to see more backup operations implemented by the
> requested/merged data stores before we start talking about point in time
> recovery.
>
>
>
> *From: *Daniel Morris <daniel.morris at RACKSPACE.COM>
> *Reply-To: *"OpenStack Development Mailing List (not for usage
> questions)" <openstack-dev at lists.openstack.org>
> *Date: *Tuesday, March 4, 2014 10:36 AM
> *To: *"OpenStack Development Mailing List (not for usage questions)" <
> openstack-dev at lists.openstack.org>
> *Subject: *Re: [openstack-dev] [TROVE] Point in time recovery design
>
>
>
> Nice write-up Denis.  Generally, I think this should merge with the work
> going on for scheduled tasks and scheduled backups.
>
>
>
> https://wiki.openstack.org/wiki/Trove/scheduled-tasks
>
>
>
> Point in time recovery was one of the original goals of the scheduled /
> automated backup work, but had not been fully worked out.  Currently this
> work is sitting idle - https://review.openstack.org/#/c/73702/
>
>
>
> I believe that at the time this was originally discussed, the idea was
> that this would be handled on a new instance creation (not an active
> instance), and would be accomplished via a new instance creation as follows:
>
>
>
> POST /instances
>
> {
>     "instance": {
>         "flavorRef": "https://service/v1.0/1234/flavors/1",
>         "name": "myinstance",
>         "volume": {
>             "size": 2
>         },
>         "restorePoint": {
>             "point_in_time": "2012-03-28T22:00Z",
>             "instanceRef": "https://service/v1.0/1234/instances/2450c73f-7805-4afe-a42c-4094ab42666b"
>         }
>     }
> }
>
>
>
> Regardless of the API design (up for debate), we need this capability
> integrated and just need to work out the best way to do it.
>
>
>
> Daniel
>
>
>
> *From: *Denis Makogon <dmakogon at mirantis.com>
> *Reply-To: *"OpenStack Development Mailing List (not for usage
> questions)" <openstack-dev at lists.openstack.org>
> *Date: *Tuesday, March 4, 2014 5:15 AM
> *To: *OpenStack Development Mailing List <
> openstack-dev at lists.openstack.org>
> *Subject: *[openstack-dev] [TROVE] Point in time recovery design
>
>
>
>     Trove. Point-in-Time recovery.
>
>
>
>
>  1.    Introduction.
>  2.    What is a point in time recovery?
>  3.    What does it take to do a point in time recovery?
>  4.    What to consider once you know your database is corrupted?
>  5.    Trove and Point-in-time recovery.
>  6.    Trove core ReST API and Point-in-Time Recovery/Restore flow.
>        1.    ReST routes.
>        2.    Request body.
>        3.    Response object.
>  7.    Trove taskmanager RPC API and Point-in-Time Recovery/Restore flow.
>        1.    RPC message.
>        2.    RPC message type.
>  8.    Trove guestagent RPC API and Point-in-Time Recovery/Restore flow.
>        1.    RPC message.
>        2.    RPC message type.
>        3.    Method implementation.
>  9.    Proposed implementation for Trove and for Python-troveclient.
>  10.   Useful links.
>
>
>  Introduction
>
>
>
> Every once in a while, an event might happen that corrupts a database. We
> have all made a stupid mistake at least once that trashed a database.
> When this happens, what do you do? If you do not have a database backup,
> then you had better own up to the problem you caused and tell your boss
> that you screwed up. If you do have at least a complete database backup,
> then you most likely will be able to recover the corrupted database up to
> the point at which you corrupted the data. This article will discuss how
> to use a point in time restore to recover your databases.
>
> If you google "Point in time recovery" you may also find "Point in time
> restore". So, let's decide what to call it. Historically, databases have
> had a feature called *Point in time recovery*.
>
>
>  What is a point-in-time recovery?
>
> So what is a point in time recovery? A point in time recovery is restoring
> a database to a specified date and time. When you have completed a point in
> time recovery, your database will be in the state it was at the specific
> date and time you identified when restoring your database. A point in time
> recovery is a method to recover your database to any point in time since
> the last database backup.
>
>
>  What does it take to do a point-in-time recovery?
>
> In order to perform a point in time recovery you will need to have an
> entire series of backups (complete, differential, and transaction log
> backups) up to and/or beyond the point in time in which you want to
> recover. If you are missing any backups, or have truncated the transaction
> log without first performing a transaction log backup, then you will not be
> able to perform a point in time recovery. At a minimum, you will need a
> complete backup and all the transaction log backups taken following the
> complete backup. Optionally if you are taking differential backups, then
> you will need the complete backup, the last differential backup prior to
> the corruption, then all the transaction log backups taken following the
> differential backup.
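
The selection rule described above can be sketched as a small function. This
is only an illustration under assumptions: the backup records, the "kind" and
"taken" keys, and the use of plain integers as timestamps are all
hypothetical, and the sketch cannot detect a gap in the middle of the
transaction log chain.

```python
def restore_chain(backups, target):
    """Pick the backups needed to recover to `target`.

    Each backup is a dict with 'kind' ('full', 'differential' or 'log') and
    'taken' (a comparable timestamp). Returns the backups in restore order.
    """
    ordered = sorted(backups, key=lambda b: b["taken"])

    # Start from the most recent complete backup at or before the target.
    fulls = [b for b in ordered
             if b["kind"] == "full" and b["taken"] <= target]
    if not fulls:
        raise ValueError("no complete backup at or before the target time")
    base = fulls[-1]
    chain = [base]
    start = base["taken"]

    # Optionally apply the last differential taken after that full backup.
    diffs = [b for b in ordered
             if b["kind"] == "differential"
             and base["taken"] < b["taken"] <= target]
    if diffs:
        chain.append(diffs[-1])
        start = diffs[-1]["taken"]

    # Then apply log backups until one reaches the target time.
    if start < target:
        covered = False
        for b in ordered:
            if b["kind"] != "log" or b["taken"] <= start:
                continue
            chain.append(b)
            if b["taken"] >= target:
                covered = True
                break
        if not covered:
            raise ValueError("log backups do not reach the target time")
    return chain


history = [
    {"kind": "full", "taken": 0},
    {"kind": "differential", "taken": 10},
    {"kind": "log", "taken": 12},
    {"kind": "differential", "taken": 20},
    {"kind": "log", "taken": 22},
    {"kind": "log", "taken": 30},
    {"kind": "log", "taken": 40},
]
chain = restore_chain(history, target=35)
```

With this history, recovering to time 35 needs the full backup, the
differential taken at 20, and every log backup after it up to the one that
covers 35. If the last log backup were missing, the function raises instead
of returning a chain that stops short of the target.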
>
>
>  Trove and Point-in-time recovery
>
>
>
> OpenStack DBaaS Trove is able to perform instance restoration (a whole new
> instance, from scratch) from a backup previously stored in remote storage
> (OpenStack Swift, Amazon AWS S3, etc.). From the administrator/regular user
> perspective, Trove should also be able to perform point in time recovery.
> Mechanically it is almost the same as restoring a new instance, but the
> difference between restore (in Trove terms) and recovery is huge.
>
> *Restore* gives the ability to spin up a *new* instance from a backup (as
> mentioned earlier), while *Recovery* gives the ability to restore an
> already running instance from a backup. To begin with, Trove would be able
> to recover/restore a running instance from a full backup.
>
>
>  Trove core ReST API and Point-in-Time Recovery/Restore flow
>
>
>  *ReST routes*
>
>
>
> HTTP method    Routes
>
> *POST*         {*tenant_id*}/instances/{*instance_id*}/recover
>
>                or
>
>                {*tenant_id*}/instances/{*instance_id*}/restore
>
>
>  *Request body*
>
> *"recovery": {*
> *    "instance": UUID,*
> *    "backup": UUID,*
> *}*
>
>
>  *Response object*
>
> *"recovery": {*
> *    "id": "instance_id",*
> *    "name": "instance_name",*
> *    "status": "BUILDING",*
> *    "datastore": "mysql",*
> *    "recovered_from_backup": "backup_id",*
> *    "point_in_time": "2011-01-22T13:25:27-06:00",*
> *}*
>
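
As a rough illustration of the route and request body above, a client could
assemble the call like this. The helper name, the leading slash, the absence
of an API version prefix and the sample UUIDs are assumptions for the
example; this is not the python-troveclient implementation.

```python
import json
import uuid


def build_recover_request(tenant_id, instance_id, backup_id):
    """Return (method, path, body) for the proposed /recover call.

    Both IDs are validated as UUIDs; uuid.UUID raises ValueError otherwise.
    """
    instance = str(uuid.UUID(instance_id))
    backup = str(uuid.UUID(backup_id))
    path = "/{0}/instances/{1}/recover".format(tenant_id, instance)
    body = {"recovery": {"instance": instance, "backup": backup}}
    return "POST", path, json.dumps(body, sort_keys=True)


method, path, body = build_recover_request(
    "1234",
    "2450c73f-7805-4afe-a42c-4094ab42666b",
    "11111111-2222-3333-4444-555555555555",  # made-up backup ID
)
```

Validating the IDs on the client side is a design choice of this sketch; the
proposal itself only says that both fields are UUIDs.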
>
>  Trove taskmanager RPC API and Point-in-Time Recovery/Restore flow
>
>
>  *RPC message*
>
> RPC method                Method parameters
>
> *do_instance_recovery*    *instance_id*, *backup_id*
>
>
>  *RPC message type*
>
> *    CAST with poll until the instance reaches ACTIVE status.*
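
The "poll until ACTIVE" part could look roughly like the loop below. It is a
sketch under assumptions: `get_status` stands in for whatever call returns
the instance status, and treating "ERROR" as a terminal failure is my
assumption, not something stated in the proposal.

```python
import time


def wait_for_active(get_status, timeout=60.0, interval=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns "ACTIVE".

    Raises RuntimeError on "ERROR" (assumed failure status) and
    TimeoutError if the deadline passes first.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "ACTIVE":
            return status
        if status == "ERROR":
            raise RuntimeError("instance recovery failed")
        if clock() >= deadline:
            raise TimeoutError("instance did not become ACTIVE in time")
        sleep(interval)


# Simulated status sequence: BUILDING twice, then ACTIVE.
statuses = iter(["BUILDING", "BUILDING", "ACTIVE"])
result = wait_for_active(lambda: next(statuses), sleep=lambda seconds: None)
```

Because the cast itself returns nothing, the caller only learns the outcome
from the status it observes while polling, so a timeout is needed to avoid
waiting forever on an instance that never leaves BUILDING.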
>  Trove guestagent RPC API and Point-in-Time Recovery/Restore flow
>
>
>  *RPC message*
>
> RPC method       Method parameters
>
> *do_recovery*    *backup_info: {*
>                  *    'id': backup_id,*
>                  *    'location': location,*
>                  *    'type': backup_type,*
>                  *    'checksum': checksum,*
>                  *}*
>
>
>  *RPC message type*
>
> *    CAST*
>
>
>  *Method implementation*
>
>
>
> Re-uses the existing restore functionality (restore from a full backup).
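
A guest-side handler for the message above might be shaped like this. It is a
hypothetical sketch, not the guestagent code: `fetch` and `restore` are
stand-in callables, and checking the payload against an MD5 hex digest is an
assumption about what the 'checksum' field holds.

```python
import hashlib

REQUIRED_KEYS = ("id", "location", "type", "checksum")


def do_recovery(backup_info, fetch, restore):
    """Validate backup_info, verify the payload, then hand it to restore().

    fetch(location) returns the backup bytes; restore(data, backup_type)
    applies them. Both are placeholders for the real storage and restore
    code.
    """
    missing = [key for key in REQUIRED_KEYS if key not in backup_info]
    if missing:
        raise ValueError("backup_info is missing: " + ", ".join(missing))

    data = fetch(backup_info["location"])
    # Assumption: the checksum is an MD5 hex digest of the stored object.
    if hashlib.md5(data).hexdigest() != backup_info["checksum"]:
        raise ValueError("checksum mismatch for backup %s" % backup_info["id"])

    restore(data, backup_info["type"])


payload = b"example backup stream"
info = {
    "id": "backup-1",
    "location": "swift://container/backup-1",  # made-up location
    "type": "full",
    "checksum": hashlib.md5(payload).hexdigest(),
}
restored = []
do_recovery(info, lambda location: payload,
            lambda data, kind: restored.append((data, kind)))
```

Verifying the checksum before touching the running datastore matters more
here than for a fresh instance, because a bad backup would otherwise
overwrite the only copy of the data.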
>
>
>  Proposed implementation for Trove and for Python-troveclient
>
>
>
> 1.    Trove: [1]
>
> 2.    Python-troveclient: [2]
>
>
>  Useful links
>
>      [1] https://review.openstack.org/#/c/77222/
>
>      [2] https://review.openstack.org/#/c/77223/
>
>  Best Regards,
>
> Denis Makogon
>
> dmakogon at mirantis.com
>
> www.mirantis.com
>
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev