List,

We have an old cloud setup running OpenStack Ussuri on Debian (QEMU/KVM). I know it's very old, but we can't upgrade to a newer version right now.

The deployment is as follows:

A.    3 controller nodes in HA mode (they also act as compute nodes, so VMs run on the controllers too)

B.    6 separate compute nodes

C.    3 separate storage nodes with Ceph RBD

My questions are:

1.  In case of a sudden hardware failure of one or more controller, compute, or storage nodes, what redundancy and recovery setup needs to be in place for immediate recovery?

2.  In case of a hardware failure, our recovery needs to happen as soon as possible, for example in less than 30 minutes after the first failure occurs.

3.  Are there setup options such as a hot standby or something similar, or what else do we need to employ?

4.  The setup must meet both the RTO (< 30 minutes of downtime) and the RPO (all applications and data must be consistent as of the exact point of the crash).

5.  Please share your thoughts on reliable crash/fault-resistant configuration options in the DC.


We currently have a DR setup at a remote location. I would also like to know whether there is a recommended way to bring the remote DR site up and running automatically, or how to automate bringing up services at the DR site so that we meet the exact RTO and RPO.

Any thoughts are most welcome.

Regards,
Krish