[openstack-dev] [heat] Re: deliver the vm-level HA to improve the business continuity with openstack
Qiming Teng
tengqim at linux.vnet.ibm.com
Tue Apr 15 10:16:16 UTC 2014
What I see in this thread are several topics:
1) Is VM HA really relevant (in a cloud)?
This is the most difficult question to answer, because it really depends
on who you are talking to and which user community you are facing.
IMHO, for most web-based applications that are born to run on cloud,
a certain level of business resiliency has probably been built into
the code, so the application or service can live happily when VMs come
and go.
For traditional business applications, the scenario may be quite
different. These apps are migrated to cloud for reasons like cost
savings, server consolidation, etc. Quite a few companies are
evaluating OpenStack for their "private cloud" -- which is a weird term,
IMHO.
In addition to this, while we are looking into the 'utility' vision of
cloud, we can still ask ourselves: a) can we survive one month of power
outage or water outage, even though there is abundant supply elsewhere
on this planet? b) what are the costs we need to pay if we eventually
make it? c) do we want to pay for this?
My personal experience is that our customers really want this feature
(VM HA) for their private clouds. The question they asked us was:
"
Does OpenStack support VM HA? Maybe not for all VMs...
We know we can have that using vSphere, Azure, or CloudStack...
"
2) Where is the best location to provide VM HA?
Suppose that we do feel the need to support VM HA, then the questions
following this would be 'where' and 'how'.
Considering that a VM is not merely a bundle of compute processes, it is
actually a virtual execution environment that consumes resources like
storage and network bandwidth besides processor cycles, Nova may NOT be
the ideal location to deal with this cross-cutting concern.
High availability involves redundant resource provisioning, effective
failure detection and appropriate fail-over policies, including fencing.
Imposing all these requirements on Nova is impractical. We may need to
consider whether VM HA, if ever implemented/supported, should be part of
the orchestration service, aka Heat.
3) Can/should we do the VM HA orchestration in Heat?
My perception is that it can be done in Heat, based on my limited
understanding of how Heat works. It may imply some requirements on other
projects (e.g. nova, cinder, neutron ...) as well, though Heat should be
the orchestrator.
What do we need then?
- A resource type for VM groups/clusters, for the redundant
provisioning. VMs in the group can be identical instances, managed
by a Pacemaker setup among the VMs, just like a WatchRule in Heat can
be controlled by Ceilometer.
Another way to do this is to have the VMs monitored via heartbeat
messages sent by Nova (if possible/needed), or some services injected
into the VMs (consider what cfn-hup, cfn-signal does today).
However, the VM group/cluster can decide how to react to a VM
online/offline signal. It may choose to a) restart the VM in-place; b)
remote-restart (aka evacuate) the VM somewhere else; c) live/cold
migrate the VM to other nodes.
The policies can be outsourced to other plugins that take global
load-balancing or power management requirements into account. But that
is an advanced feature that warrants another blueprint.
- Some fencing support from nova, cinder, neutron to shoot the bad VMs
in the head so a VM that cannot be reached is guaranteed to be cleanly
killed.
- VM failure detectors that can reliably tell whether a VM has failed.
Sometimes a VM that failed the expected performance goal should be
treated as failed as well, if we really want to be strict on this.
A failure detector can reside inside Nova, as has been done for
the 'service groups' there. It can reside inside a VM, as a service
installed there, sending out heartbeat messages (before the battery runs
out, :))
- A generic signaling mechanism that allows a secure message delivery
back to Heat indicating that a VM is alive or dead.
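
To make the signaling item a bit more concrete, here is a minimal
sketch of a signed alive/dead message. This is NOT how Heat or any
existing OpenStack component does it; the shared secret, the message
fields and the function names (sign_signal, verify_signal) are all
assumptions made up for illustration. It only shows that a receiver can
reject tampered or stale reports using an HMAC and a timestamp.

```python
import hashlib
import hmac
import json
import time

# Hypothetical shared secret; a real deployment would provision one per VM.
SECRET = b"example-shared-secret"


def sign_signal(vm_id, status, secret=SECRET, now=None):
    """Build a signed status message ("alive" or "dead") for one VM."""
    payload = {
        "vm_id": vm_id,
        "status": status,
        "timestamp": time.time() if now is None else now,
    }
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    digest = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return {"body": body.decode("utf-8"), "signature": digest}


def verify_signal(message, secret=SECRET, max_age=30.0, now=None):
    """Return the payload if signature and age check out, else None."""
    body = message["body"].encode("utf-8")
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, message["signature"]):
        return None  # tampered, or signed with a different key
    payload = json.loads(body)
    current = time.time() if now is None else now
    if current - payload["timestamp"] > max_age:
        return None  # stale message, possibly replayed
    return payload
```

The sender (Nova, or an agent inside the VM) would call sign_signal()
and the orchestrator would call verify_signal() before trusting the
report. Key distribution and transport are left out on purpose.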
My current understanding is that we may be able to avoid a complicated
task-flow here.
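
As a rough illustration of the VM group and failure detector items
above, the following sketch tracks heartbeats per VM, flags VMs whose
heartbeat is overdue, and picks one of the three reactions mentioned
(restart in-place, evacuate, migrate). The class and function names, the
timeout value and the toy policy are all hypothetical; nothing here is
an existing Heat resource type or Nova API, and fencing is only noted in
a comment.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    RESTART_IN_PLACE = "restart"
    EVACUATE = "evacuate"
    MIGRATE = "migrate"


@dataclass
class VMGroup:
    """Hypothetical VM group that detects failures by missed heartbeats."""
    timeout: float = 10.0  # seconds without a heartbeat before a VM is "failed"
    last_seen: dict = field(default_factory=dict)  # vm_id -> last heartbeat time

    def heartbeat(self, vm_id, now):
        """Record a heartbeat received from vm_id at time now."""
        self.last_seen[vm_id] = now

    def failed_vms(self, now):
        """Return the sorted ids of VMs whose heartbeat is overdue."""
        return sorted(
            vm for vm, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


def choose_action(host_alive, host_overloaded=False):
    """Toy policy; a real one would fence the VM before acting on it."""
    if not host_alive:
        return Action.EVACUATE          # host is gone: restart elsewhere
    if host_overloaded:
        return Action.MIGRATE           # host is up but a poor place to stay
    return Action.RESTART_IN_PLACE      # host is healthy: restart where it is


group = VMGroup(timeout=10.0)
group.heartbeat("vm-a", now=0.0)
group.heartbeat("vm-b", now=8.0)
for vm in group.failed_vms(now=12.0):
    print(vm, choose_action(host_alive=True).value)
```

The policy is deliberately a plain function so that it could be swapped
for something smarter (load-balancing, power management) later.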
Regards,
- Qiming
> >>For the most part we've been trying to encourage projects that want to
> >>control VMs to add such functionality to the Orchestration program, aka
> >>"Heat".
> >Yes, exactly.
> >
> >-jay
> >
> Hey folks,
>
> Just as a note for HA for VMs, our current heat-core thinking is our
> HARestarter resource functionality is a workflow (Restarter is a
> verb, rather than a Noun - Heat orchestrates Nouns) and would be
> better suited to a workflow service like Mistral. Clearly we don't
> know how to get from where we are today to the proper separation of
> concerns as pointed out by Zane Bitter in recent threads on the ml
> but just throwing this out there so folks are aware.
>
> Regards
> -steve
>