[Openstack] Hardware HA

Caitlin Bestler Caitlin.Bestler at nexenta.com
Thu Nov 10 20:59:10 UTC 2011


Ryan Lane wrote, in response to Soren Hansen:

>> That's the whole point. For most interesting applications, "fast"
>> automatic migration isn't anywhere near fast enough. Don't try to 
>> avoid failure. Expect it and design around it.
>>

>This assumes all application designers are doing this. Most web applications do
>this fairly well, but most enterprise applications do this very poorly.

>Hardware HA is useful for more than just poorly designed applications though.
>I have a cloud instance that runs my personal website. I don't want to pay for
>two (or more, realistically) instances just to ensure that if my host dies my
>site will continue to run. My provider should automatically detect the hardware
>failure and re-launch my instance on another piece of hardware; it should also
>notify me that it happened, but that's a different story ;).

There are techniques to migrate VMs between non-HA hosts, and there are
techniques that allow applications to be written so that any instance of the server
can be lost without impairing the application (you just start a new instance of the
server, rather than migrating the server).

But neither of those solves the problem as well as hardware High Availability.
Whether Hardware HA is a cost-effective solution is something that customers
will ultimately have to determine.

A successful proposal would need to include a way to identify when a VM wants or
needs to be hosted on a Hardware-HA-enhanced host, a method of identifying the
Hardware-HA-enhanced hosts, and the ability to track when a Hardware-HA
host is in degraded mode (i.e., it is currently one resource failure away from
complete failure).
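
To make those three requirements concrete, here is a minimal sketch of the state
a scheduler would need. All names (HostStatus, Host, VmRequest, hardware_ha,
redundant_units, needs_hardware_ha) are hypothetical and not part of any existing
OpenStack API. The assumption is that a host counts as "degraded" when exactly
one of its redundant hardware units is still working.

```python
from dataclasses import dataclass
from enum import Enum


class HostStatus(Enum):
    UP = "up"
    DEGRADED = "degraded"  # one more failure away from going down
    DOWN = "down"


@dataclass
class Host:
    name: str
    hardware_ha: bool      # hypothetical flag advertised by HA-enhanced hosts
    redundant_units: int   # hypothetical count, e.g. nodes in an HA pair
    failed_units: int = 0

    def status(self) -> HostStatus:
        working = self.redundant_units - self.failed_units
        if working <= 0:
            return HostStatus.DOWN
        # A non-HA host has a single unit, so it is never "degraded":
        # it is simply up or down.
        if working == 1 and self.redundant_units > 1:
            return HostStatus.DEGRADED
        return HostStatus.UP


@dataclass
class VmRequest:
    name: str
    needs_hardware_ha: bool  # hypothetical per-VM placement hint


if __name__ == "__main__":
    pair = Host("ha-01", hardware_ha=True, redundant_units=2)
    print(pair.status().value)  # up
    pair.failed_units = 1
    print(pair.status().value)  # degraded
```

The point of the sketch is that "degraded" only has meaning for hosts with
redundancy; an ordinary host keeps the familiar two-state model.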

I think those features can be designed in a way that does not impose too heavy
a burden on the core scheduling algorithm, as long as it isn't required to
evaluate a long list of "Hardware HA QoS metrics" to make optimal guest-to-host
assignments.
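
As a rough illustration of keeping that burden small, the sketch below treats
Hardware HA as one boolean capability check plus one status bit per host, applied
before whatever weighing the scheduler already does. HostState, candidate_hosts
and the "most free RAM first" ordering are hypothetical stand-ins, not the
actual Nova scheduler interface. Refusing new guests on degraded hosts is also
an assumption on my part.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    name: str
    hardware_ha: bool   # hypothetical capability flag
    degraded: bool      # hypothetical status bit kept current by monitoring
    free_ram_mb: int


def candidate_hosts(hosts, needs_hardware_ha, ram_mb):
    """Return hosts eligible for a guest, best first.

    The HA requirement is a single boolean test per host, so it adds O(1)
    work for each host the scheduler already examines instead of a
    multi-metric scoring pass.
    """
    eligible = [
        h for h in hosts
        if h.free_ram_mb >= ram_mb
        and not h.degraded  # do not add load to a host we will drain anyway
        and (h.hardware_ha or not needs_hardware_ha)
    ]
    # Existing weighing (here simply: most free RAM first) is unchanged.
    return sorted(eligible, key=lambda h: h.free_ram_mb, reverse=True)


if __name__ == "__main__":
    hosts = [
        HostState("plain-1", hardware_ha=False, degraded=False, free_ram_mb=8192),
        HostState("ha-1", hardware_ha=True, degraded=False, free_ram_mb=4096),
        HostState("ha-2", hardware_ha=True, degraded=True, free_ram_mb=16384),
    ]
    print([h.name for h in candidate_hosts(hosts, True, 2048)])   # ['ha-1']
    print([h.name for h in candidate_hosts(hosts, False, 2048)])  # ['plain-1', 'ha-1']
```

In this sketch guests that do not ask for Hardware HA may still land on HA hosts;
a provider that wants to reserve HA capacity could add a second boolean test
without changing the shape of the filter.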

This is essentially the same issue as Object Storage support for self-healing
mirroring (via ZFS), which we have proposed for Swift. Each defines an enhanced
capability for specific servers that can be characterized in a way that the generic
control and management plane algorithms can understand. In both cases, the hardest
part of that understanding is the addition of a "degraded" status for a server.

Without Hardware HA or self-healing mirroring, a host/data server is either "up" or "down".
With Hardware HA and self-healing mirroring, it can also be "degraded". The Hardware HA
host can be down to a single hardware node; the self-healing mirror can be down to a
single working storage device. In either case the remaining copy is still functional, but you
probably want to begin migrating the VMs/Swift partitions elsewhere (unless your mean
time to repair is really good).
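
One way to make "unless your mean time to repair is really good" quantitative is
sketched below. It assumes, purely for illustration, that the surviving node or
disk fails according to an exponential model with a constant rate of 1/MTBF, so
the chance of losing the last copy before repair finishes is
1 - exp(-MTTR/MTBF). The function names and the default 1e-3 risk tolerance are
hypothetical.

```python
import math


def failure_risk_during_repair(mttr_hours: float, mtbf_hours: float) -> float:
    """Probability the last working unit fails before the repair completes.

    Assumes an exponential failure model with a constant failure rate of
    1 / mtbf_hours for the surviving unit.
    """
    return 1.0 - math.exp(-mttr_hours / mtbf_hours)


def should_evacuate(degraded: bool, mttr_hours: float, mtbf_hours: float,
                    max_risk: float = 1e-3) -> bool:
    """Start migrating VMs / Swift partitions off a degraded server unless
    the expected repair is fast enough that the residual risk is acceptable."""
    if not degraded:
        return False
    return failure_risk_during_repair(mttr_hours, mtbf_hours) > max_risk


if __name__ == "__main__":
    print(should_evacuate(True, mttr_hours=4, mtbf_hours=50_000))   # False
    print(should_evacuate(True, mttr_hours=72, mtbf_hours=50_000))  # True
```

With a 50,000-hour MTBF, a 4-hour repair leaves roughly an 8e-5 chance of losing
the last copy, while a 72-hour repair pushes it above 1e-3. The same test works
for a degraded HA host and for a degraded ZFS mirror.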
