<div dir="ltr"><div><div>Hello Denis,<br></div>Thank you for sharing this very useful information.<br><br></div>But I have one more question. As far as I understand, with a replication factor of 3 our backup may contain three copies of the same data. It may also contain a set of SSTables that have not been compacted yet. Is there any way to compact the collected backup data before moving it to backup storage?<br>
</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Aug 29, 2014 at 2:01 PM, Denis Makogon <span dir="ltr"><<a href="mailto:dmakogon@mirantis.com" target="_blank">dmakogon@mirantis.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><span><p style="margin-top:0pt;margin-bottom:0pt;text-indent:36pt;text-align:justify"><font color="#000000" face="Times New Roman"><span style="font-size:19px;line-height:21.850000381469727px;white-space:pre-wrap">Hello, stackers. I'd like to start a thread on the backup procedure for MagnetoDB or, to be precise, for its Cassandra backend.</span></font></p>

<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-indent:36pt;text-align:justify"><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">To implement a backup procedure for Cassandra, we first need to understand how its backups work.</span></p>

<br><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-indent:36pt;text-align:justify"><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">To perform a backup:</span></p>

<ol style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:decimal;font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;background-color:transparent"><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify">

<span style="vertical-align:baseline;white-space:pre-wrap;background-color:transparent">We need to SSH into each node</span></p></li><li dir="ltr" style="list-style-type:decimal;font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;background-color:transparent">

<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify"><span style="vertical-align:baseline;white-space:pre-wrap;background-color:transparent">Call ‘nodetool snapshot’ with appropriate parameters</span></p>

</li><li dir="ltr" style="list-style-type:decimal;font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;background-color:transparent"><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify">

<span style="vertical-align:baseline;white-space:pre-wrap;background-color:transparent">Collect backup.</span></p></li><li dir="ltr" style="list-style-type:decimal;font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;background-color:transparent">

<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify"><span style="vertical-align:baseline;white-space:pre-wrap;background-color:transparent">Send backup to remote storage.</span></p></li>

<li dir="ltr" style="list-style-type:decimal;font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;background-color:transparent"><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify">

<span style="vertical-align:baseline;white-space:pre-wrap;background-color:transparent">Remove initial snapshot</span></p></li></ol><br><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify">

<span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent"><span style="white-space:pre-wrap">       </span></span><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">Let's take a look at how ‘</span><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);font-style:italic;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">nodetool snapshot</span><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">’ works. Cassandra backs up data by taking a snapshot of all on-disk data files (SSTable files) stored in the data directory. A snapshot does not copy the data: for every flushed SSTable it creates a hard link to the original file under the snapshot's name, which is a timestamp unless a name is given.</span></p>
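<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify"><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">To illustrate the hard-link idea, here is a minimal Python sketch. It does not call Cassandra at all: the file name, the directory layout and the function name are made up for the demonstration. It only shows that a hard link shares the inode of the original file and keeps the data readable after the original name is deleted.</span></p>
<pre>
```python
import os
import tempfile


def demo_snapshot_survives_delete():
    """Show that a hard link keeps file data alive after the original name is removed."""
    with tempfile.TemporaryDirectory() as table_dir:
        # Stand-in for an SSTable data file (hypothetical name and contents).
        sstable = os.path.join(table_dir, "demo-Data.db")
        with open(sstable, "wb") as f:
            f.write(b"row data")

        # "Snapshot": a second directory entry for the same inode, no bytes copied.
        snapshot_dir = os.path.join(table_dir, "snapshots", "backup1")
        os.makedirs(snapshot_dir)
        link = os.path.join(snapshot_dir, os.path.basename(sstable))
        os.link(sstable, link)
        same_inode = os.stat(sstable).st_ino == os.stat(link).st_ino

        # Simulate compaction removing the live file; the link still holds the data.
        os.remove(sstable)
        with open(link, "rb") as f:
            surviving = f.read()
    return same_inode, surviving


if __name__ == "__main__":
    print(demo_snapshot_survives_delete())
```
</pre>
<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify"><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">This is also why the snapshot should be removed after it has been shipped: as long as the links exist, the disk space of the old SSTables cannot be reclaimed.</span></p>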

<br><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-indent:36pt;text-align:justify"><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">Snapshots are taken per keyspace or per column family (CF), while the system is online. However, a node must be taken offline in order to restore a snapshot.</span></p>

<br><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-indent:36pt;text-align:justify"><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">Using a parallel ssh tool (such as pssh), you can flush and then snapshot an entire cluster. This provides an </span><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);font-style:italic;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">eventually consistent</span><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent"> backup. Although no one node is guaranteed to be consistent with its replica nodes at the time a snapshot is taken, a restored snapshot can resume consistency using Cassandra's built-in consistency mechanisms.</span></p>

<br><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-indent:36pt;text-align:justify"><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">After a system-wide snapshot has been taken, you can enable incremental backups on each node (disabled by default) to back up data that has changed since the last snapshot was taken. Each time an SSTable is flushed, a hard link to it is created in a /backups subdirectory of the data directory.</span></p>

<br><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify"><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent"><span style="white-space:pre-wrap"> </span></span><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">Now let's see how we can deal with a snapshot once it's taken. Below is the list of commands that need to be executed to prepare a snapshot:</span></p>

<br><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify"><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">    </span><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent"><span style="white-space:pre-wrap">       </span></span><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">Flushing memtables to SSTables for consistency</span></p>

<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify"><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">    </span><span style="font-size:16px;font-family:'Courier New';color:rgb(0,0,0);font-style:italic;vertical-align:baseline;white-space:pre-wrap;background-color:transparent"><span style="white-space:pre-wrap">   </span></span><span style="font-size:16px;font-family:'Courier New';color:rgb(0,0,0);font-style:italic;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">nodetool flush</span></p>

<br><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify"><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">    </span><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent"><span style="white-space:pre-wrap">       </span></span><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">Creating snapshots (for example, of all keyspaces)</span></p>

<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify"><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">    </span><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent"><span style="white-space:pre-wrap"> </span></span><span style="font-size:16px;font-family:'Courier New';color:rgb(0,0,0);font-style:italic;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">nodetool snapshot -t %(backup_name)s 1>/dev/null</span></p>

<br><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-indent:36pt;text-align:justify"><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">where</span></p>

<ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:disc;font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;background-color:transparent"><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify">

<span style="vertical-align:baseline;white-space:pre-wrap;background-color:transparent">backup_name is the name of the snapshot</span></p></li></ul><br><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-indent:36pt;text-align:justify">

<span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">Once that's done, we need to collect all the hard links into a single archive (keeping the initial file hierarchy):</span></p>

<br><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify"><span style="font-size:16px;font-family:'Courier New';color:rgb(0,0,0);font-style:italic;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">sudo tar cpzfP /tmp/all_ks.tar.gz \</span></p>

<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify"><span style="font-size:16px;font-family:'Courier New';color:rgb(0,0,0);font-style:italic;vertical-align:baseline;white-space:pre-wrap;background-color:transparent"> $(sudo find %(datadir)s -type d -name %(backup_name)s)</span></p>

<br><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;margin-left:36pt;text-align:justify"><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">where</span></p>

<ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:disc;font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;background-color:transparent"><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify">

<span style="vertical-align:baseline;white-space:pre-wrap;background-color:transparent">backup_name is the name of the snapshot,</span></p></li><li dir="ltr" style="list-style-type:disc;font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;background-color:transparent">

<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify"><span style="vertical-align:baseline;white-space:pre-wrap;background-color:transparent">datadir is the storage location (/var/lib/cassandra/data by default)</span></p>

</li></ul><br><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify"><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent"><span style="white-space:pre-wrap">   </span></span></p>
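<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify"><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">The command templates above can be filled in from Python roughly as follows. This is only a sketch: the function name and the archive parameter are my own, and the final 'nodetool clearsnapshot' command for the "remove initial snapshot" step is an assumption on my side, since that command is not spelled out in this message. The function only builds the command strings, it does not run anything.</span></p>
<pre>
```python
import shlex

# Templates taken from the message, with the archive path made configurable.
FLUSH_CMD = "nodetool flush"
SNAPSHOT_CMD = "nodetool snapshot -t %(backup_name)s 1>/dev/null"
ARCHIVE_CMD = ("sudo tar cpzfP %(archive)s "
               "$(sudo find %(datadir)s -type d -name %(backup_name)s)")
# Assumption: cleanup command for the last step, not given in the message.
CLEAR_CMD = "nodetool clearsnapshot -t %(backup_name)s"


def build_backup_commands(backup_name,
                          datadir="/var/lib/cassandra/data",
                          archive="/tmp/all_ks.tar.gz"):
    """Return the shell commands for one node, in execution order."""
    params = {
        # Quote values so unusual names cannot break the shell command line.
        "backup_name": shlex.quote(backup_name),
        "datadir": shlex.quote(datadir),
        "archive": shlex.quote(archive),
    }
    return [
        FLUSH_CMD,
        SNAPSHOT_CMD % params,
        ARCHIVE_CMD % params,
        CLEAR_CMD % params,
    ]


if __name__ == "__main__":
    for command in build_backup_commands("backup1"):
        print(command)
```
</pre>
<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify"><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">Sending the archive to remote storage would go between the tar and the cleanup commands; that part depends on the storage we choose, so it is left out here.</span></p>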

<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-indent:36pt;text-align:justify"><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">Note that this operation can be extended:</span></p>

<ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:disc;font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;background-color:transparent"><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify">

<span style="vertical-align:baseline;white-space:pre-wrap;background-color:transparent">if Cassandra was launched with more than one data directory (see </span><a href="http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html" style="text-decoration:none" target="_blank"><span style="text-decoration:underline;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">cassandra.yaml</span></a><span style="vertical-align:baseline;white-space:pre-wrap;background-color:transparent">)</span></p>

</li><li dir="ltr" style="list-style-type:disc;font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;background-color:transparent"><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify">

<span style="vertical-align:baseline;white-space:pre-wrap;background-color:transparent">if we want to back up only:</span></p></li><ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:circle;font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;background-color:transparent">

<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify"><span style="vertical-align:baseline;white-space:pre-wrap;background-color:transparent">several keyspaces at the same time</span></p>

</li><li dir="ltr" style="list-style-type:circle;font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;background-color:transparent"><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify">

<span style="vertical-align:baseline;white-space:pre-wrap;background-color:transparent">one keyspace</span></p></li><li dir="ltr" style="list-style-type:circle;font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;background-color:transparent">

<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify"><span style="vertical-align:baseline;white-space:pre-wrap;background-color:transparent">a list of CFs for a given keyspace</span></p>

</li></ul></ul><br><p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify"><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);text-decoration:underline;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">Useful links</span></p>

<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify"><a href="http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsNodetool_r.html" style="text-decoration:none" target="_blank"><span style="font-size:13px;font-family:'Times New Roman';text-decoration:underline;vertical-align:baseline;white-space:pre-wrap;background-color:transparent">http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsNodetool_r.html</span></a></p>
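<p dir="ltr" style="line-height:1.15;margin-top:0pt;margin-bottom:0pt;text-align:justify"><span style="font-size:19px;font-family:'Times New Roman';color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:transparent">Coming back to the possible extensions listed earlier, two of them (several data directories, and a snapshot of selected keyspaces only) could be sketched like this. The function names are hypothetical, and I assume that 'nodetool snapshot' accepts keyspace names as trailing arguments and that 'find' is simply given every data directory as a starting point; please check both against the nodetool documentation linked above. Per-CF snapshots are not covered here.</span></p>
<pre>
```python
import shlex


def snapshot_command(backup_name, keyspaces=()):
    """Build a 'nodetool snapshot' command for all keyspaces or only the given ones."""
    parts = ["nodetool", "snapshot", "-t", shlex.quote(backup_name)]
    # Assumption: keyspace names follow the options; none means "all keyspaces".
    parts.extend(shlex.quote(keyspace) for keyspace in keyspaces)
    return " ".join(parts)


def find_snapshot_dirs_command(backup_name, datadirs):
    """Build a 'find' command that searches every configured data directory."""
    if not datadirs:
        raise ValueError("at least one data directory is required")
    roots = " ".join(shlex.quote(datadir) for datadir in datadirs)
    return "sudo find %s -type d -name %s" % (roots, shlex.quote(backup_name))


if __name__ == "__main__":
    print(snapshot_command("backup1", ["ks1", "ks2"]))
    print(find_snapshot_dirs_command("backup1", ["/data1", "/data2"]))
```
</pre>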

<br>Best regards,</span><div><span>Denis Makogon</span></div></div>
<br>_______________________________________________<br>
OpenStack-dev mailing list<br>
<a href="mailto:OpenStack-dev@lists.openstack.org">OpenStack-dev@lists.openstack.org</a><br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>
<br></blockquote></div><br><br clear="all"><br>-- <br><div dir="ltr"><div><div>Best regards,<br></div>Dmitriy Ukhlov<br></div>Mirantis Inc.<br></div>
</div>