Data Center Survival in Case of Disaster / Hardware Failure

Thu May 5 09:16:11 UTC 2022


We have an old cloud setup running OpenStack Ussuri on Debian (QEMU/KVM).
I know it's very old, but we can't upgrade to newer versions right now.

The deployment is as follows:

A. 3 controller nodes in HA mode (they also act as compute nodes; VMs run
on the controllers too)

B. 6 separate compute nodes

C. 3 separate storage nodes with Ceph RBD
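For a deployment like this, one way to reason about sudden node failures is to count how many failures each tier can absorb. The Python sketch below is only an illustration and rests on assumptions not stated above: that the controller HA stack relies on majority quorum (as Pacemaker/Corosync, Galera and RabbitMQ clusters typically do), and that the Ceph pools are replicated with Ceph's default size=3 / min_size=2 and host as the failure domain. The function names are hypothetical.

```python
# Hypothetical sketch: how many simultaneous node failures each tier of the
# deployment can absorb, under the assumptions stated in the lead-in.


def quorum_size(members: int) -> int:
    """Smallest strict majority of `members` voting nodes."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Voting nodes that can fail while a majority is still reachable."""
    return members - quorum_size(members)


def ceph_io_survives(failed_hosts: int, size: int = 3, min_size: int = 2) -> bool:
    """Worst case for a replicated pool with failure domain = host:
    each failed host may hold one replica of the same placement group.
    The PG keeps serving I/O only while >= min_size replicas remain."""
    surviving = size - min(failed_hosts, size)
    return surviving >= min_size


if __name__ == "__main__":
    print(f"3 controllers (majority quorum) tolerate {tolerated_failures(3)} failure(s)")
    for failed in range(4):
        state = "continues" if ceph_io_survives(failed) else "blocks"
        print(f"Ceph, {failed} of 3 storage nodes down: client I/O {state}")
```

Under these assumptions, losing one controller or one storage node keeps the cloud running, while losing two of either stops the control plane or blocks Ceph I/O. Compute nodes do not vote, so losing one does not stop the cloud, but its VMs (including those on a failed controller) stay down until they are restarted on another host.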

My questions are:

1. In case of a sudden hardware failure of one or more controller, compute,
or storage nodes, what redundant recovery setup needs to be in place so that
recovery can start immediately?

2. In case of a hardware failure, our recovery needs to be as fast as
possible, for example less than 30 minutes after the first failure occurs.

3. Are there setup options such as a hot standby or something similar, or
what else do we need to employ?

4. How do we meet both the RTO (< 30 minutes of downtime) and the RPO (all
applications and data must be consistent as of the exact point of the crash)?

5. Please share your thoughts on reliable crash- and fault-tolerant
configuration options in the DC.
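The RTO and RPO targets can be checked with simple arithmetic once each recovery step has been timed. The sketch below uses hypothetical names and treats all times as minutes you would measure in your own failover drills. It adds detection, decision and failover time for the RTO, and treats the replication interval as the worst-case data loss. That is why an RPO of zero ("from the exact point of crash") requires every acknowledged write to already exist on the surviving copy, i.e. synchronous replication. Ceph replication inside one cluster is synchronous, whereas RBD mirroring to a remote site is asynchronous.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    """Hypothetical timing model of one failover path; minutes throughout."""
    detection: float             # until monitoring notices the failure
    decision: float              # until someone (or something) triggers failover
    failover: float              # until services answer again on the target
    replication_interval: float  # 0 models synchronous replication

    def worst_case_downtime(self) -> float:
        return self.detection + self.decision + self.failover

    def worst_case_data_loss(self) -> float:
        # Writes made since the last completed copy are lost. This ignores
        # the time needed to ship each copy, so reality is somewhat worse.
        return self.replication_interval


def meets_targets(plan: RecoveryPlan, rto: float, rpo: float) -> bool:
    """True if downtime is strictly below `rto` and data loss is at most `rpo`."""
    return plan.worst_case_downtime() < rto and plan.worst_case_data_loss() <= rpo


if __name__ == "__main__":
    async_dr = RecoveryPlan(detection=5, decision=5, failover=15, replication_interval=15)
    print(async_dr.worst_case_downtime(), meets_targets(async_dr, rto=30, rpo=0))
```

With these made-up numbers the plan recovers in 25 minutes, inside the RTO, but still fails a zero-RPO target because of its 15-minute replication interval.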

We currently have a DR setup at a remote location. I would also like to know
whether there is a recommended way to bring the remote DR site up and running
automatically, or how to automate failover of the services to the DR site so
that it meets the exact RTO and RPO.
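Automating DR failover largely comes down to a watchdog that decides when the primary site is really gone. Below is a minimal Python sketch of one common rule, assuming a hypothetical probe that reports the primary as healthy or unhealthy at a fixed interval: fail over only after several consecutive failed probes. The threshold trades detection time, which counts against the 30-minute RTO, against false failovers. A real setup would also need fencing or an independent witness, so that a primary that is merely unreachable cannot keep writing after the DR site is promoted (split-brain).

```python
from typing import Iterable, Optional


def failover_trigger_time(probes: Iterable[bool], interval_s: float,
                          threshold: int) -> Optional[float]:
    """Seconds (counted from the first probe) at which a hypothetical
    watchdog would promote the DR site: the moment `threshold`
    consecutive health probes of the primary have failed.
    Returns None if that never happens."""
    consecutive_failures = 0
    for i, healthy in enumerate(probes):
        # Any successful probe resets the counter, so a single dropped
        # probe does not trigger a site failover.
        consecutive_failures = 0 if healthy else consecutive_failures + 1
        if consecutive_failures >= threshold:
            return i * interval_s
    return None


if __name__ == "__main__":
    history = [True, False, True, False, False, False]
    print(failover_trigger_time(history, interval_s=30, threshold=3))
```

Here the isolated failure at the second probe is ignored, and failover fires at the third consecutive failure, 150 seconds after the first probe.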

Any thoughts are most welcome.


More information about the openstack-discuss mailing list