[openstack-dev] [magnetodb] Backup procedure for Cassandra backend

Denis Makogon dmakogon at mirantis.com
Fri Aug 29 14:33:59 UTC 2014


On Fri, Aug 29, 2014 at 4:29 PM, Dmitriy Ukhlov <dukhlov at mirantis.com>
wrote:

> Hello Denis,
> Thank you for very useful knowledge sharing.
>
> But I have one more question. As far as I understood if we have
> replication factor 3 it means that our backup may contain three copies of
> the same data. Also it may contain some not compacted sstables set. Do we
> have any ability to compact collected backup data before moving it to
> backup storage?
>

Thanks for the fast response, Dmitriy.

With replication factor 3, yes: this looks like a feature that lets us back
up only one node instead of all three. In other cases we would need to
iterate over each node, as you know.
Correct, the backup may contain SSTables that have not been compacted. To
compact them we could use the compaction mechanism provided by nodetool, see
http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsCompact.html.
We just need to take into account that the SSTables may already have been
compacted, in which case forcing compaction would bring little benefit.
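
A minimal sketch of how the compaction call could be built, assuming it is
run on the live node before the snapshot is taken (my assumption, the
collected backup itself is not compacted this way). The function names and
the injectable `run` parameter are hypothetical; only `nodetool compact`
with an optional keyspace and column family names comes from the nodetool
documentation.

```python
import subprocess


def build_compact_command(keyspace=None, column_families=()):
    """Return the argv for a forced compaction via nodetool.

    Without a keyspace, nodetool compacts all keyspaces; column family
    names are only meaningful together with a keyspace.
    """
    cmd = ["nodetool", "compact"]
    if keyspace:
        cmd.append(keyspace)
        cmd.extend(column_families)
    return cmd


def compact_before_snapshot(keyspace=None, column_families=(),
                            run=subprocess.check_call):
    """Run the compaction command; `run` is injectable for dry runs."""
    cmd = build_compact_command(keyspace, column_families)
    run(cmd)
    return cmd
```

Passing `run=print` gives a dry run that only shows the command.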


Best regards,
Denis Makogon


>
> On Fri, Aug 29, 2014 at 2:01 PM, Denis Makogon <dmakogon at mirantis.com>
> wrote:
>
>> Hello, stackers. I'd like to start a thread about the backup procedure
>> for MagnetoDB, to be precise, for the Cassandra backend.
>>
>> To implement a backup procedure for Cassandra, we need to understand how
>> backups work.
>>
>> To perform a backup:
>>
>>    1. SSH into each node.
>>    2. Call ‘nodetool snapshot’ with appropriate parameters.
>>    3. Collect the backup.
>>    4. Send the backup to remote storage.
>>    5. Remove the initial snapshot.
>>
>>
>> Let's take a look at how ‘nodetool snapshot’ works. Cassandra backs up
>> data by taking a snapshot of all on-disk data files (SSTable files) stored
>> in the data directory. Each flushed SSTable included in a snapshot is
>> represented by a hard link to the original SSTable file, pinned to a
>> specific timestamp.
>>
>> Snapshots are taken per keyspace or per column family (CF), while the
>> system is online. However, nodes must be taken offline in order to
>> restore a snapshot.
>>
>> Using a parallel ssh tool (such as pssh), you can flush and then snapshot
>> an entire cluster. This provides an eventually consistent backup.
>> Although no one node is guaranteed to be consistent with its replica nodes
>> at the time a snapshot is taken, a restored snapshot can resume consistency
>> using Cassandra's built-in consistency mechanisms.
>>
>> After a system-wide snapshot has been taken, you can enable incremental
>> backups on each node (disabled by default) to back up data that has
>> changed since the last snapshot was taken. Each time an SSTable is
>> flushed, a hard link to it is placed into a /backups subdirectory of the
>> data directory.
>>
>> Now let's see how we can deal with a snapshot once it's taken. Below is
>> the list of commands that need to be executed to prepare a snapshot:
>>
>>     Flush SSTables for consistency:
>>
>>     'nodetool flush'
>>
>>     Create snapshots (for example, of all keyspaces):
>>
>>     "nodetool snapshot -t %(backup_name)s 1>/dev/null"
>>
>> where
>>
>>    - backup_name is the name of the snapshot
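
The two commands above could be driven from Python roughly like this. It is
only a sketch: `prepare_snapshot` and the `run` parameter are hypothetical
names, and the default runner discards stdout the same way the `1>/dev/null`
redirect does.

```python
import subprocess


def _quiet_run(cmd):
    """Run a command, discarding stdout and raising on a non-zero exit."""
    subprocess.check_call(cmd, stdout=subprocess.DEVNULL)


def prepare_snapshot(backup_name, run=_quiet_run):
    """Flush memtables to disk, then snapshot all keyspaces.

    The order matters: flushing first makes sure recent writes are in
    SSTables before the hard links are created.
    """
    commands = [
        ["nodetool", "flush"],
        ["nodetool", "snapshot", "-t", backup_name],
    ]
    for cmd in commands:
        run(cmd)
    return commands
```

Passing the arguments as a list avoids shell quoting problems if the
snapshot name ever contains unusual characters.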
>>
>>
>> Once that's done, we need to collect all hard links into a common
>> archive (keeping the initial file hierarchy):
>>
>> sudo tar cpzfP /tmp/all_ks.tar.gz \
>> $(sudo find %(datadir)s -type d -name %(backup_name)s)
>>
>> where
>>
>>    - backup_name is the name of the snapshot,
>>    - datadir is the storage location (/var/lib/cassandra/data by default)
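
The same collection step can be sketched with the Python standard library
instead of find and tar. The function names here are hypothetical, and
unlike `tar -P` the `tarfile` module drops the leading "/" from member
names, so the hierarchy is kept relative to the filesystem root.

```python
import os
import tarfile


def find_snapshot_dirs(datadir, backup_name):
    """Yield every directory under datadir whose name is backup_name."""
    for root, dirs, _files in os.walk(datadir):
        for name in dirs:
            if name == backup_name:
                yield os.path.join(root, name)


def archive_snapshot(datadir, backup_name, archive_path):
    """Pack all matching snapshot directories into one gzipped tarball."""
    found = sorted(find_snapshot_dirs(datadir, backup_name))
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in found:
            # Adds the directory recursively, keeping its full path
            # (minus the leading slash) as the member name.
            tar.add(path)
    return found
```

The archive should be written outside the data directory, as with
/tmp/all_ks.tar.gz in the command above.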
>>
>>
>> Note that this operation can be extended:
>>
>>    - if Cassandra was launched with more than one data directory (see
>>      cassandra.yaml
>>      <http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html>)
>>    - if we want to back up only:
>>       - certain keyspaces at the same time
>>       - one keyspace
>>       - a list of CFs for a given keyspace
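
A rough sketch of how the snapshot command could be narrowed to the cases
listed above. It assumes the `-cf` option of nodetool 2.0, which as far as I
know accepts a single column family for a single keyspace, so a list of CFs
becomes one command per CF. `build_snapshot_commands` is a hypothetical
helper name.

```python
def build_snapshot_commands(backup_name, keyspaces=(), column_families=()):
    """Return a list of nodetool snapshot argv lists.

    - no keyspaces: one command snapshotting all keyspaces
    - some keyspaces: one command naming them
    - column families: one command per CF, for exactly one keyspace
    """
    base = ["nodetool", "snapshot", "-t", backup_name]
    if column_families:
        if len(keyspaces) != 1:
            raise ValueError("column families require exactly one keyspace")
        return [base + ["-cf", cf, keyspaces[0]] for cf in column_families]
    return [base + list(keyspaces)]
```

All commands share the same snapshot name, so the later find step still
picks up every resulting directory.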
>>
>>
>> Useful links
>>
>>
>> http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsNodetool_r.html
>>
>> Best regards,
>> Denis Makogon
>>
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
>
> --
> Best regards,
> Dmitriy Ukhlov
> Mirantis Inc.