<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=us-ascii"><meta name=Generator content="Microsoft Word 14 (filtered medium)"><!--[if !mso]><style>v\:* {behavior:url(#default#VML);}

o\:* {behavior:url(#default#VML);}

w\:* {behavior:url(#default#VML);}

.shape {behavior:url(#default#VML);}

</style><![endif]--><style><!--

/* Font Definitions */

@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}

@font-face

        {font-family:Tahoma;

        panose-1:2 11 6 4 3 5 4 4 2 4;}

/* Style Definitions */

p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0in;

        margin-bottom:.0001pt;

        font-size:12.0pt;

        font-family:"Times New Roman","serif";}

a:link, span.MsoHyperlink

        {mso-style-priority:99;

        color:blue;

        text-decoration:underline;}

a:visited, span.MsoHyperlinkFollowed

        {mso-style-priority:99;

        color:purple;

        text-decoration:underline;}

tt

        {mso-style-priority:99;

        font-family:"Courier New";}

p.MsoAcetate, li.MsoAcetate, div.MsoAcetate

        {mso-style-priority:99;

        mso-style-link:"Balloon Text Char";

        margin:0in;

        margin-bottom:.0001pt;

        font-size:8.0pt;

        font-family:"Tahoma","sans-serif";}

p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph

        {mso-style-priority:34;

        margin-top:0in;

        margin-right:0in;

        margin-bottom:0in;

        margin-left:.5in;

        margin-bottom:.0001pt;

        font-size:12.0pt;

        font-family:"Times New Roman","serif";}

span.BalloonTextChar

        {mso-style-name:"Balloon Text Char";

        mso-style-priority:99;

        mso-style-link:"Balloon Text";

        font-family:"Tahoma","sans-serif";}

span.EmailStyle20

        {mso-style-type:personal-reply;

        font-family:"Calibri","sans-serif";

        color:#1F497D;}

.MsoChpDefault

        {mso-style-type:export-only;

        font-family:"Calibri","sans-serif";}

@page WordSection1

        {size:8.5in 11.0in;

        margin:1.0in 1.0in 1.0in 1.0in;}

div.WordSection1

        {page:WordSection1;}

/* List Definitions */

@list l0

        {mso-list-id:1919898538;

        mso-list-type:hybrid;

        mso-list-template-ids:-30247760 67698705 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}

@list l0:level1

        {mso-level-text:"%1\)";

        mso-level-tab-stop:none;

        mso-level-number-position:left;

        text-indent:-.25in;}

@list l0:level2

        {mso-level-number-format:alpha-lower;

        mso-level-tab-stop:none;

        mso-level-number-position:left;

        text-indent:-.25in;}

@list l0:level3

        {mso-level-number-format:roman-lower;

        mso-level-tab-stop:none;

        mso-level-number-position:right;

        text-indent:-9.0pt;}

@list l0:level4

        {mso-level-tab-stop:none;

        mso-level-number-position:left;

        text-indent:-.25in;}

@list l0:level5

        {mso-level-number-format:alpha-lower;

        mso-level-tab-stop:none;

        mso-level-number-position:left;

        text-indent:-.25in;}

@list l0:level6

        {mso-level-number-format:roman-lower;

        mso-level-tab-stop:none;

        mso-level-number-position:right;

        text-indent:-9.0pt;}

@list l0:level7

        {mso-level-tab-stop:none;

        mso-level-number-position:left;

        text-indent:-.25in;}

@list l0:level8

        {mso-level-number-format:alpha-lower;

        mso-level-tab-stop:none;

        mso-level-number-position:left;

        text-indent:-.25in;}

@list l0:level9

        {mso-level-number-format:roman-lower;

        mso-level-tab-stop:none;

        mso-level-number-position:right;

        text-indent:-9.0pt;}

ol

        {margin-bottom:0in;}

ul

        {margin-bottom:0in;}

--></style><!--[if gte mso 9]><xml>

<o:shapedefaults v:ext="edit" spidmax="1026" />

</xml><![endif]--><!--[if gte mso 9]><xml>

<o:shapelayout v:ext="edit">

<o:idmap v:ext="edit" data="1" />

</o:shapelayout></xml><![endif]--></head><body lang=EN-US link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Hi, Deepak. With the caveat that both the etherpad and Ron’s presentation are pretty high-level, my guess is:<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoListParagraph style='text-indent:-.25in;mso-list:l0 level1 lfo1'><![if !supportLists]><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><span style='mso-list:Ignore'>1)<span style='font:7.0pt "Times New Roman"'>      </span></span></span><![endif]><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>“DR middleware” refers to the orchestration engine managing the entire DR process between the primary and secondary sites. (Something like two Heat workflows interacting or a workflow that works across multiple OpenStack deployments.) The replication agent does what resembles continually cloning a volume from the primary to the secondary, with snapshots appearing on the secondary at times when the volumes’ contents are application-consistent and consistent with each other (for all the volumes of a VM or a multi-tier app). These secondary-site snapshots appear at specified rates (so you know how recent your oldest snapshots there will be). For instance, the replication agent might take some sort of snapshot(s) on the primary and then update the corresponding volume(s) on the secondary using the primary snapshot(s). This resembles (maybe it could even be) something like DRBD or NBD. 
Many SAN vendors provide some form of replication agent between SANs.<o:p></o:p></span></p><p class=MsoListParagraph><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoListParagraph style='text-indent:-.25in;mso-list:l0 level1 lfo1'><![if !supportLists]><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><span style='mso-list:Ignore'>2)<span style='font:7.0pt "Times New Roman"'>      </span></span></span><![endif]><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Regarding metadata, the replication agent might only be replicating the volumes of some tenant VMs. It might not be replicating any volumes containing OpenStack metadata. (This is for the smaller tenant use-case, not complete OpenStack deployment mirroring or somesuch. If complete mirroring were done, maybe you wouldn’t have to sync metadata if you designed the system just for that.) DR is often something that a tenant might apply only to a set of core servers (key pets). In this use-case the two (or more) DR sites might not be symmetrical. The secondary site needs to know it is in the secondary role. Things like IP addresses, maybe security and firewall rules, might have to change for the workload to run at the secondary site. Applying this metadata to VMs on the secondary site (what needs to change in the personality), when they boot, is probably something Heat can do. 
<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>-bruce<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> <o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> Deepak Shetty [mailto:dpkshetty@gmail.com] <br><b>Sent:</b> Wednesday, March 19, 2014 11:54 PM<br><b>To:</b> OpenStack Development Mailing List (not for usage questions)<br><b>Subject:</b> Re: [openstack-dev] Disaster Recovery for OpenStack - call for stakeholder - discussion reminder<o:p></o:p></span></p><p class=MsoNormal><o:p> </o:p></p><div><div><div><div><p class=MsoNormal>Hi List,<o:p></o:p></p></div><p class=MsoNormal style='margin-bottom:12.0pt'>    I was looking at the etherpad and March 19 notes and have a few Qs<o:p></o:p></p></div><p class=MsoNormal style='margin-bottom:12.0pt'>1) How is the "DR middleware" (depicted in Ron's YouTube video) different from the "replication agent" (noted in the March 19 etherpad notes)? Are they the same? If not, how/why are they different?<o:p></o:p></p></div><p class=MsoNormal style='margin-bottom:12.0pt'>2) Maybe a dumb Q.. but still.. Why do we need to worry about syncing metadata differently? If all the storage that is used across OpenStack services (and in the typical case it might be just 1 backend, say GlusterFS) is being replicated during DR, wouldn't the metadata be replicated too.. 
why do we need to be concerned abt it as a separate entity ?<br><br>thanx,<br>deepak<o:p></o:p></p></div><div><p class=MsoNormal style='margin-bottom:12.0pt'><o:p> </o:p></p><div><p class=MsoNormal>On Wed, Mar 19, 2014 at 2:11 PM, Ronen Kat <<a href="mailto:RONENKAT@il.ibm.com" target="_blank">RONENKAT@il.ibm.com</a>> wrote:<o:p></o:p></p><p class=MsoNormal><span style='font-family:"Arial","sans-serif"'>For those who are interested we will discuss the disaster recovery use-cases and how to proceed toward the Juno summit on March 19 at 17:00 UTC (invitation below)</span> <br><br><br><br><span style='font-family:"Arial","sans-serif"'>Call-in: </span><a href="https://www.teleconference.att.com/servlet/glbAccess?process=1&accessCode=6406941&accessNumber=1809417783#C2" target="_blank">https://www.teleconference.att.com/servlet/glbAccess?process=1&accessCode=6406941&accessNumber=1809417783#C2</a> <br><span style='font-family:"Arial","sans-serif"'>Passcode: 6406941</span> <br><br>Etherpad: <a href="https://etherpad.openstack.org/p/juno-disaster-recovery-call-for-stakeholders" target="_blank">https://etherpad.openstack.org/p/juno-disaster-recovery-call-for-stakeholders</a> <br>Wiki: <a href="https://wiki.openstack.org/wiki/DisasterRecovery" target="_blank">https://wiki.openstack.org/wiki/DisasterRecovery</a> <br><br>Regards, <br>__________________________________________ <br><span style='font-family:"Arial","sans-serif";color:#8F8F8F'>Ronen I. 
Kat, PhD</span> <br><span style='font-family:"Arial","sans-serif"'>Storage Research</span> <br><b><span style='font-family:"Arial","sans-serif";color:#4181C0'>IBM Research - Haifa</span></b> <br><span style='font-family:"Arial","sans-serif";color:#4181C0'>Phone:</span><span style='font-family:"Arial","sans-serif";color:#5F5F5F'> +972.3.7689493</span> <br><span style='font-family:"Arial","sans-serif";color:#4181C0'>Email</span>: <span style='font-family:"Arial","sans-serif";color:#5F5F5F'><a href="mailto:ronenkat@il.ibm.com" target="_blank">ronenkat@il.ibm.com</a></span> <br><br><br><br><br><span style='font-size:7.5pt;font-family:"Arial","sans-serif";color:#5F5F5F'>From:        </span><span style='font-size:7.5pt;font-family:"Arial","sans-serif"'>"Luohao (brian)" <<a href="mailto:brian.luohao@huawei.com" target="_blank">brian.luohao@huawei.com</a>></span> <br><span style='font-size:7.5pt;font-family:"Arial","sans-serif";color:#5F5F5F'>To:        </span><span style='font-size:7.5pt;font-family:"Arial","sans-serif"'>"OpenStack Development Mailing List (not for usage questions)" <<a href="mailto:openstack-dev@lists.openstack.org" target="_blank">openstack-dev@lists.openstack.org</a>>, </span><br><span style='font-size:7.5pt;font-family:"Arial","sans-serif";color:#5F5F5F'>Date:        </span><span style='font-size:7.5pt;font-family:"Arial","sans-serif"'>14/03/2014 03:59 AM</span> <br><span style='font-size:7.5pt;font-family:"Arial","sans-serif";color:#5F5F5F'>Subject:        </span><span style='font-size:7.5pt;font-family:"Arial","sans-serif"'>Re: [openstack-dev] Disaster Recovery for OpenStack - call for stakeholder</span> <o:p></o:p></p><div class=MsoNormal align=center style='text-align:center'><hr size=2 width="100%" noshade style='color:#A0A0A0' align=center></div><p class=MsoNormal style='margin-bottom:12.0pt'><br><br><br><tt><span style='font-size:10.0pt'>1.  
fsfreeze with VSS has been added to qemu upstream, see </span></tt><a href="http://lists.gnu.org/archive/html/qemu-devel/2013-02/msg01963.html" target="_blank"><tt><span style='font-size:10.0pt'>http://lists.gnu.org/archive/html/qemu-devel/2013-02/msg01963.html</span></tt></a><tt><span style='font-size:10.0pt'> for usage.</span></tt><span style='font-size:10.0pt;font-family:"Courier New"'><br><tt>2.  libvirt allows a client to send any commands to qemu-ga, see </tt></span><a href="http://wiki.libvirt.org/page/Qemu_guest_agent" target="_blank"><tt><span style='font-size:10.0pt'>http://wiki.libvirt.org/page/Qemu_guest_agent</span></tt></a><span style='font-size:10.0pt;font-family:"Courier New"'><br><tt>3.  Linux fsfreeze is not equivalent to Windows fsfreeze+VSS. Linux fsfreeze offers filesystem consistency only, while Windows VSS allows agents like SQL Server to register their plugins to flush their cache to disk when a snapshot occurs.</tt><br><tt>4.  My understanding is that XenServer does not support fsfreeze+VSS now, because XenServer normally does not use the block backend in qemu.</tt><br><br><tt>-----Original Message-----</tt><br><tt>From: Bruce Montague [</tt></span><a href="mailto:Bruce_Montague@symantec.com" target="_blank"><tt><span style='font-size:10.0pt'>mailto:Bruce_Montague@symantec.com</span></tt></a><tt><span style='font-size:10.0pt'>] </span></tt><span style='font-size:10.0pt;font-family:"Courier New"'><br><tt>Sent: Thursday, March 13, 2014 10:35 PM</tt><br><tt>To: OpenStack Development Mailing List (not for usage questions)</tt><br><tt>Subject: Re: [openstack-dev] Disaster Recovery for OpenStack - call for stakeholder</tt><br><br><tt>Hi, about OpenStack and VSS. Does anyone have experience with the qemu project's implementation of VSS support? They appear to have a within-guest agent, qemu-ga, that perhaps can work as a VSS requestor. Does it also work with KVM? Does qemu-ga work with libvirt (can VSS quiesce be triggered via libvirt)? 
I think there was an effort for qemu-ga to use fsfreeze as an equivalent to VSS on Linux systems; was that done? If so, could an OpenStack API provide a generic quiesce request that would then get passed to libvirt? (Also, the XenServer VSS support seems different from qemu/KVM's; is this true? Can it also be accessed through libvirt?)</tt><br><br><tt>Thanks,</tt><br><br><tt>-bruce</tt><br><br><tt>-----Original Message-----</tt><br><tt>From: Alessandro Pilotti [</tt></span><a href="mailto:apilotti@cloudbasesolutions.com" target="_blank"><tt><span style='font-size:10.0pt'>mailto:apilotti@cloudbasesolutions.com</span></tt></a><tt><span style='font-size:10.0pt'>]</span></tt><span style='font-size:10.0pt;font-family:"Courier New"'><br><tt>Sent: Thursday, March 13, 2014 6:49 AM</tt><br><tt>To: <a href="mailto:openstack-dev@lists.openstack.org" target="_blank">openstack-dev@lists.openstack.org</a></tt><br><tt>Subject: Re: [openstack-dev] Disaster Recovery for OpenStack - call for stakeholder</tt><br><br><tt>Those use cases are very important in enterprise scenarios, but there's an important missing piece in the current OpenStack APIs: support for application-consistent backups via Volume Shadow Copy (or other solutions) at the instance level, including differential / incremental backups.</tt><br><br><tt>VSS can be seamlessly added to the Nova Hyper-V driver (it's included with the free Hyper-V Server) with e.g. vSphere and XenServer supporting it as well (quiescing) and with the option for third-party vendors to add drivers for their solutions.</tt><br><br><tt>A generic Nova backup / restore API supporting those features is quite straightforward to design. The main question at this stage is whether the OpenStack community wants to support those use cases or not. 
Cinder backup/restore support [1] and volume replication [2] are surely a great starting point in this direction.</tt><br><br><tt>Alessandro</tt><br><br><tt>[1] </tt></span><a href="https://review.openstack.org/#/c/69351/" target="_blank"><tt><span style='font-size:10.0pt'>https://review.openstack.org/#/c/69351/</span></tt></a><span style='font-size:10.0pt;font-family:"Courier New"'><br><tt>[2] </tt></span><a href="https://review.openstack.org/#/c/64026/" target="_blank"><tt><span style='font-size:10.0pt'>https://review.openstack.org/#/c/64026/</span></tt></a><span style='font-size:10.0pt;font-family:"Courier New"'><br><br><br><tt>> On 12/mar/2014, at 20:45, "Bruce Montague" <<a href="mailto:Bruce_Montague@symantec.com" target="_blank">Bruce_Montague@symantec.com</a>> wrote:</tt><br><tt>></tt><br><tt>></tt><br><tt>> Hi, regarding the call to create a list of disaster recovery (DR) use cases ( </tt></span><a href="http://lists.openstack.org/pipermail/openstack-dev/2014-March/028859.html" target="_blank"><tt><span style='font-size:10.0pt'>http://lists.openstack.org/pipermail/openstack-dev/2014-March/028859.html</span></tt></a><tt><span style='font-size:10.0pt'> ), the following list sketches some speculative OpenStack DR use cases. These use cases do not reflect any specific product behavior and span a wide spectrum. This list is not a proposal, it is intended primarily to solicit additional discussion. The first basic use case, (1), is described in a bit more detail than the others; many of the others are elaborations on this basic theme.</span></tt><span style='font-size:10.0pt;font-family:"Courier New"'><br><tt>></tt><br><tt>></tt><br><tt>></tt><br><tt>> * (1) [Single VM]</tt><br><tt>></tt><br><tt>> A single Windows VM with 4 volumes and VSS (Microsoft's Volume Shadowcopy Services) installed runs a key application and integral database. 
VSS can quiesce the app, database, filesystem, and I/O on demand and can be invoked external to the guest.</tt><br><tt>></tt><br><tt>>   a. The VM's volumes, including the boot volume, are replicated to a remote DR site (another OpenStack deployment).</tt><br><tt>></tt><br><tt>>   b. Some form of replicated VM or VM metadata exists at the remote site. This VM/description includes the replicated volumes. Some systems might use cold migration or some form of wide-area live VM migration to establish this remote site VM/description.</tt><br><tt>></tt><br><tt>>   c. When specified by an SLA or policy, VSS is invoked, putting the VM's volumes in an application-consistent state. This state is flushed all the way through to the remote volumes. As each remote volume reaches its application-consistent state, this is recognized in some fashion, perhaps by an in-band signal, and a snapshot of the volume is made at the remote site. Volume replication is re-enabled immediately following the snapshot. A backup is then made of the snapshot on the remote site. At the completion of this cycle, application-consistent volume snapshots and backups exist on the remote site.</tt><br><tt>></tt><br><tt>>   d.  When a disaster or firedrill happens, the replication network </tt><br><tt>> connection is cut. The remote site VM pre-created or defined so as to use the replicated volumes is then booted, using the latest application-consistent state of the replicated volumes. The entire VM environment (management accounts, networking, external firewalling, console access, etc..), similar to that of the primary, either needs to pre-exist in some fashion on the secondary or be created dynamically by the DR system. The booting VM either needs to attach to a virtual network environment similar to at the primary site or the VM needs to have boot code that can alter its network personality. Networking configuration may occur in conjunction with an update to DNS and other networking infrastructure. 
It is necessary for all required networking configuration to be pre-specified or done automatically. No manual admin activity should be required. Environment requirements may be stored in a DR configuration or database associated with the replication.</tt><br><tt>></tt><br><tt>>   e. In a firedrill or test, the virtual network environment at the remote site may be a "test bubble" isolated from the real network, with some provision for protected access (such as NAT). Automatic testing is necessary to verify that replication succeeded. These tests need to be configurable by the end-user and admin and integrated with DR orchestration.</tt><br><tt>></tt><br><tt>>   f. After the VM has booted and been operational, the network </tt><br><tt>> connection between the two sites is re-established. A replication </tt><br><tt>> connection between the replicated volumes is re-established, and the </tt><br><tt>> replicated volumes are re-synced, with the roles of primary and </tt><br><tt>> secondary reversed. (Ongoing replication in this configuration may </tt><br><tt>> occur, driven from the new primary.)</tt><br><tt>></tt><br><tt>>   g. A planned failback of the VM to the old primary proceeds similarly to the failover from the old primary to the old replica, but with roles reversed and the process minimizing offline time and data loss.</tt><br><tt>></tt><br><tt>></tt><br><tt>></tt><br><tt>> * (2) [Core tenant/project infrastructure VMs]</tt><br><tt>></tt><br><tt>> Twenty VMs power the core infrastructure of a group using a private cloud (OpenStack in their own datacenter). Not all VMs run Windows with VSS; some run Linux with some equivalent mechanism, such as qemu-ga, driving fsfreeze and signal scripts. These VMs are replicated to a remote OpenStack deployment, in a fashion similar to (1). 
Orchestration occurring at the remote site on failover is more complex (correct VM boot order is orchestrated, DHCP service is configured as expected, all IPs are made available and verified). An equivalent virtual network topology consisting of multiple networks or subnets might be pre-created or dynamically created at failover time.</tt><br><tt>></tt><br><tt>>   a. Storage for all volumes of all VMs might be on a single storage backend (logically a single large volume containing many smaller sub-volumes, examples being a VMware datastore or Hyper-V CSV). This entire large volume might be replicated between similar storage backends at the primary and secondary site. A single replicated large volume thus replicates all the tenant VM's volumes. The DR system must trigger quiesce of all volumes to application-consistent state.</tt><br><tt>></tt><br><tt>>   b. This environment needs to deal with failures of the primary datacenter (as when a trenching tool cuts its connection to the internet), routine firedrill tests that perform failover and failback, and planned migration.</tt><br><tt>></tt><br><tt>>   c. VSS or fsfreeze may be expected to fail for some VMs and policies and SLAs need to contend with this and alert admins for manual follow-up.</tt><br><tt>></tt><br><tt>>   d. Network bandwidth used for replication needs to be throttled so as not to overly disrupt the private cloud's gateway capacity.</tt><br><tt>></tt><br><tt>>   e. DR replication needs to deal with intermittent network replication failure and recover gracefully. In case of a known network issue, such as maintenance, it needs to be possible for the admin to explicitly suspend network replication. Replication I/O is then logged locally at the primary site in some fashion. The remote site needs to stay replication ready, but failover does not occur. When the network issue is over, replication resumes, perhaps recovering via a log, a map of updated blocks, or an equivalent technique. 
In this example the RPO window is deliberately ignored and allowed to grow until replication is resumed by the admin.</tt><br><tt>></tt><br><tt>>   f. This tenant requires encryption of network replication traffic.</tt><br><tt>></tt><br><tt>>   g. Cost accounting and chargeback is required.</tt><br><tt>></tt><br><tt>></tt><br><tt>></tt><br><tt>> * (3) [Multi-tier app infrastructure]</tt><br><tt>></tt><br><tt>> A tenant has a service consisting of 8 multi-tier apps that each consist of 3 to 5 VMs, with each VM having 2 to 4 disks. Replication snapshots need to be made of the volumes in an application-consistent way across all the volumes of all the VMs in all the multi-tier apps. Again, these volumes may exist on a single large volume or datastore, perhaps simplifying creation of the cross-VM application consistency snapshot. Not all of the VMs in a multi-tier app may need to be quiesced, some may be stateless and simply need to be recovered to a running state.</tt><br><tt>></tt><br><tt>> a. This tenant requires that 3 of the multi-tier apps failover to one remote OpenStack site and the other 5 multi-tier apps failover to a different remote site than the first.</tt><br><tt>></tt><br><tt>> b. This tenant weekly performs a non-disruptive test-bubble failover test. Real failover is not triggered. Instead, all the multi-tier app VMs that would boot upon failure are booted (from their latest snapshots on the secondary), but the VM's virtual network environment on the secondary is isolated from external networking. Test bubbles at the two OpenStack remote sites may need to be connected via some VPN/tunnel or equivalent without manual admin activity.</tt><br><tt>></tt><br><tt>></tt><br><tt>></tt><br><tt>> * (4) [Tenant failover]</tt><br><tt>></tt><br><tt>> An OpenStack tenant has 40 VMs, relatively lightly loaded, used for development. 
The VMs do not contain VSS, qemu-ga, or standard tools (they may be running any Linux distro, some may be running Plan9, the tenant may be doing Linux kernel development (that is, the VMs can be anything)). A remote OpenStack deployment needs to exist so that in the event of loss of the primary OpenStack site, the tenant can continue development. In addition to volume replication as in (1), subject to policies and SLAs, cold migration may be performed on a VM's volumes upon shutdown (or dismount) and tenant end-users can explicitly request replication of a volume that is in an application-consistent state (when they have quiesced it by VSS, dismount, or equivalent).</tt><br><tt>></tt><br><tt>> a. Being down for a short period may be acceptable to this tenant. If all the hosts on the primary site are rebooted, for instance, due to power failure, it is the operator's choice to fail over or not. If the operator chooses not to fail over, upon reboot of the VMs at the primary site, any established replication should automatically be continued.</tt><br><tt>></tt><br><tt>></tt><br><tt>></tt><br><tt>> * (5) [Scale-out workload]</tt><br><tt>></tt><br><tt>> A tenant has a Cassandra cluster (or Hadoop or a similar type of system) consisting of 75 VMs. Use is bursty. The system is used by a pharmaceutical company for design work. A lost week's work can be repeated, but weekly replication is mandatory. The application itself may provide some form of built-in geo-replication. Some controller-type VMs may need to be replicated as in (1). Other VMs may partner with replica VMs for explicit application data replication. For weekly replication of Cassandra data, Cassandra user-level snapshots are made into replicated volumes attached to each Cassandra VM. Replication is periodic with respect to the last replication event; that is, only data changed since the last replication event is sent.</tt><br><tt>></tt><br><tt>>   a. 
The tenant requires use of a particular aggregated network link for replication.</tt><br><tt>></tt><br><tt>>   b. The tenant requires custom integration with the DR replication workflow to quiesce Cassandra via user-level commands and scripts developed by the end-user.</tt><br><tt>></tt><br><tt>>   c. Initial synchronization of replicated primary and secondary volume need not be over a network link. Secondary volumes can be created initially from physical disks or backups physically moved to the secondary site.</tt><br><tt>></tt><br><tt>></tt><br><tt>></tt><br><tt>> * (6) [Degraded-mode Mission-critical single VM]</tt><br><tt>></tt><br><tt>> This single VM use case is similar to (1), but when a network </tt><br><tt>> partition occurs between the primary and secondary OpenStack sites, </tt><br><tt>> with both sites remaining up, the primary VM remains operational while </tt><br><tt>> the secondary replica VM also comes online. Both VMs operate in a mode </tt><br><tt>> that resembles replication with a momentary network fault, logging </tt><br><tt>> their would-be replication traffic for continuation when the network </tt><br><tt>> comes back. When network connectivity is reestablished, one site again </tt><br><tt>> becomes the primary and differences in the VM's volumes can optionally </tt><br><tt>> (as controlled by policy) be reconciled. (In a simple case, each site </tt><br><tt>> might have its own dedicated volume partition or attached volume with </tt><br><tt>> its latest state.)</tt><br><tt>></tt><br><tt>></tt><br><tt>></tt><br><tt>> * (7) [Self-contained application volume]</tt><br><tt>></tt><br><tt>> A cinder volume contains a complete database application, including the database and all binaries and configuration files. Replication of the entire VM to which this volume is attached is not needed. The VM and  its configuration can be recreated on demand at the remote site and attached to the replicated application volume. 
The DR system still needs to orchestrate the process and create or manage the required network environment. A simple DR strategy can be used in which the volume is quiesced on the primary, a volume snapshot taken, the volume unquiesced (enabling the VM to continue running), and a backup is then made of the snapshot. Backups can be transported by whatever means to the DR site, where the volume can be restored to its state at time of snapshot.</tt><br><tt>></tt><br><tt>></tt><br><tt>></tt><br><tt>> * (8) [Stateless]</tt><br><tt>></tt><br><tt>> No volumes and VMs need to be replicated, as VMs and their configuration can be recreated on demand, using configuration tools, and application data is accessed over the wide-area network (NFS or object store). The DR process still has to orchestrate creating the VMs, running configuration tools to populate them, creating the network environment, and booting VMs in required order.</tt><br><tt>></tt><br><tt>></tt><br><tt>></tt><br><tt>> * (9) [Site Evacuation]</tt><br><tt>></tt><br><tt>> The holy grail, automatic planned migration of the workload and data from one cloud-scale datacenter to another (or a set of others). In practice, likely to include admins in-the-loop. At both tenant-scale and entire datacenter scale. 
The entire cloud datacenter is expected to go offline for an extended period (the hurricane scenario).</tt><br><tt>></tt><br><tt>></tt><br><tt>></tt><br><tt>> -bruce</tt><br><tt>></tt><br><tt>></tt><br><tt>> _______________________________________________</tt><br><tt>> OpenStack-dev mailing list</tt><br><tt>> <a href="mailto:OpenStack-dev@lists.openstack.org" target="_blank">OpenStack-dev@lists.openstack.org</a></tt><br><tt>> </tt></span><a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target="_blank"><tt><span style='font-size:10.0pt'>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</span></tt></a><span style='font-size:10.0pt;font-family:"Courier New"'><br><br></span><o:p></o:p></p></div><p class=MsoNormal><o:p> </o:p></p></div></div></body></html>