List,

I am in the process of setting up DC and DR sites with OpenStack (Ussuri, QEMU/KVM, with Debian as the base OS).

I would like to know what options are available for high availability at the DC and DR sites.

What architecture should we follow at the DC and DR sites so that no services, VMs, or applications go down if a VM crashes on one physical machine at the DC (for example, with 3 controllers in the DC), if a host machine crashes (RAM or other hardware failure), or if a machine's power cable is accidentally detached?

How can we achieve this? Please share the best practices.

Also, if I take snapshots of running VMs, what will the user experience be? Will the VMs freeze, or will users be logged out of the applications they are currently logged in to? Can we avoid service unavailability for users while taking snapshots?


For DR, we take a snapshot of each VM, convert the image to qcow2, and then rsync it to the DR site. All of this is performed on a controller node. For example, a 100 GB VM typically produces a 16 GB qcow2 image. Snapshot creation takes about 2 minutes and the qcow2 conversion takes 20 to 30 minutes.

How many snapshots and qcow2 conversions (qemu-img convert) can be performed at the same time on the controller machine where we run this operation? Can we apply parallel processing here, and if so, how? The specs of the controller where we perform this are below.

Controller spec: 20-core CPUs (160 processors in total) with 1.5 TB RAM.
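
To make the parallel-processing question concrete, here is a rough sketch of what I have in mind, not something we run today. It assumes the snapshot images are in raw format and uses hypothetical file paths; the helper names convert_cmd and convert_all, and the worker count of 4, are only placeholders. Whether storage I/O or CPU becomes the bottleneck first is exactly what I am unsure about.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def convert_cmd(src, dst, src_format="raw"):
    """Build the qemu-img command that converts one image to qcow2.

    src_format="raw" is an assumption about our snapshot format.
    """
    return ["qemu-img", "convert", "-f", src_format, "-O", "qcow2", src, dst]


def convert_all(jobs, max_workers=4, run=subprocess.run):
    """Run several conversions at once; jobs is a list of (src, dst) pairs.

    Threads are enough here because each worker only waits on an
    external qemu-img process. Returns (dst, returncode) pairs in the
    same order as jobs.
    """
    def one(job):
        src, dst = job
        result = run(convert_cmd(src, dst))
        return dst, result.returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(one, jobs))


# Example call (hypothetical paths):
# convert_all([("/var/tmp/vm1.raw", "/var/tmp/vm1.qcow2"),
#              ("/var/tmp/vm2.raw", "/var/tmp/vm2.qcow2")], max_workers=2)
```

The max_workers value is the number I would like guidance on for a machine of this size.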

Copying the image to DR (which is 200 km away from our DC) then takes 4 minutes with rsync.
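
Putting the timings above together, a back-of-the-envelope estimate of the end-to-end time looks like this. It assumes the three stages run one after another for each VM, and that parallel jobs do not slow each other down, which may well be optimistic if disk or network is the limit. The batch of 10 VMs and the 4 workers in the example are made-up numbers.

```python
import math

SNAPSHOT_MIN = 2          # snapshot creation, minutes
CONVERT_MIN = (20, 30)    # qcow2 conversion, low and high, minutes
RSYNC_MIN = 4             # rsync to DR, minutes


def per_vm_minutes():
    """Low and high end-to-end time for one VM, stages run in sequence."""
    low = SNAPSHOT_MIN + CONVERT_MIN[0] + RSYNC_MIN
    high = SNAPSHOT_MIN + CONVERT_MIN[1] + RSYNC_MIN
    return low, high


def batch_minutes(num_vms, workers):
    """Low and high time for a batch, assuming perfect scaling per worker."""
    waves = math.ceil(num_vms / workers)
    low, high = per_vm_minutes()
    return waves * low, waves * high


print(per_vm_minutes())       # (26, 36)
print(batch_minutes(10, 1))   # (260, 360)
print(batch_minutes(10, 4))   # (78, 108)
```

So each VM currently takes roughly 26 to 36 minutes to reach DR, and the conversion step dominates.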

We then create each VM at DR from the copied qcow2 image and attach the network IPs and NAT there. Is this the usual practice, or is there a better way to do this for a robust DC and DR setup?
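
For clarity, the restore step at DR amounts to roughly the following sequence of openstack CLI calls per VM. This is a simplified sketch rather than our exact procedure: the helper name restore_cmds, the "-img" image naming, and all argument values are placeholders, and I am assuming here that the NAT part is done with a floating IP.

```python
def restore_cmds(name, qcow2_path, flavor, net_id, floating_ip):
    """Return the CLI commands to recreate one VM at DR from a qcow2 file.

    All argument values are placeholders supplied by the caller.
    """
    image = name + "-img"  # hypothetical naming convention
    return [
        # Upload the copied qcow2 file as an image.
        ["openstack", "image", "create", "--disk-format", "qcow2",
         "--container-format", "bare", "--file", qcow2_path, image],
        # Boot a server from that image on the given network.
        ["openstack", "server", "create", "--image", image,
         "--flavor", flavor, "--nic", "net-id=" + net_id, "--wait", name],
        # Attach the floating IP (assumed to be how NAT is provided).
        ["openstack", "server", "add", "floating", "ip", name, floating_ip],
    ]


# Print the commands for one hypothetical VM.
for cmd in restore_cmds("vm1", "/var/tmp/vm1.qcow2", "m1.large",
                        "NET_ID", "203.0.113.10"):
    print(" ".join(cmd))
```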

Kindly share your thoughts. 

Regards,
Kris