[openstack-dev] [heat][nova] VM restarting on host, failure in convergence

Jastrzebski, Michal michal.jastrzebski at intel.com
Fri Sep 19 08:48:46 UTC 2014


 > > All,
 > >
 > > Currently OpenStack does not have a built-in HA mechanism for tenant
 > > instances which could restore virtual machines in case of a host
 > > failure. OpenStack assumes every app is designed for failure and can
 > > handle instance failure and will self-remediate, but that is rarely
 > > the case for the very large Enterprise application ecosystem.
 > > Many existing enterprise applications are stateful, and assume that
 > > the physical infrastructure is always on.
 > >
 >
 > There is a fundamental debate that OpenStack's vendors need to work out
 > here. Existing applications are well served by existing virtualization
 > platforms. Turning OpenStack into a work-alike to oVirt is not the end
 > goal here. It's a happy accident that traditional apps can sometimes be
 > bent onto the cloud without much modification.
 >
 > The thing that clouds do is they give development teams a _limited_
 > infrastructure that lets IT do what they're good at (keep the
 > infrastructure up) and lets development teams do what they're good at
 > (run their app). By putting HA into the _app_, and not the _infrastructure_,
 > the dev teams get agility and scalability. No more waiting weeks for
 > allocating specialized servers with hardware fencing setups and fibre
 > channel controllers to house a shared disk system so the super reliable
 > virtualization can hide HA from the user.
 >
 > Spin up vms. Spin up volumes.  Run some replication between regions,
 > and be resilient.

I don't dispute that this is the way to go, but reality is somewhat different.
In a world of early design failures, low budgets, and deadlines, some good
practices may be skipped early on and may be hard to implement later.

From a technical point of view, the cloud can still help to improve such
apps, and I think OpenStack should address that part of the market as well.

 > So, as long as it is understood that whatever is being proposed should
 > be an application centric feature, and not an infrastructure centric
 > feature, this argument remains interesting in the "cloud" context.
 > Otherwise, it is just an invitation for OpenStack to open up direct
 > competition with behemoths like vCenter.
 >
 > > Even the OpenStack controller services themselves do not gracefully
 > > handle failure.
 > >
 >
 > Which ones?

Heat has issues, Horizon has issues, and Neutron L3 only works in an
active-passive setup.

 > > When these applications were virtualized, they were virtualized on
 > > platforms that enabled very high SLAs for each virtual machine,
 > > allowing the application to not be rewritten as the IT team moved them
 > > from physical to virtual. Now while these apps cannot benefit from
 > > methods like automatic scaleout, the application owners will greatly
 > > benefit from the self-service capabilities they will receive as they
 > > utilize the OpenStack control plane.
 > >
 >
 > These apps were virtualized for IT's benefit. But the application authors
 > and users are now stuck in high-cost virtualization. The cloud is best
 > utilized when IT can control that cost and shift the burden of uptime
 > to the users by offering them more overall capacity and flexibility with
 > the caveat that the individual resources will not be as reliable.
 >
 > So what I'm most interested in is helping authors change their apps to
 > be resilient on their own, not in putting more burden on IT.

This can be very costly, and is therefore not always possible.

 > > I'd like to suggest expanding the Heat convergence mechanism to enable
 > > self-remediation of virtual machines and other heat resources.
 > >
 >
 > Convergence is still nascent. I don't know if I'd pile on to what might
 > take another 12 - 18 months to get done anyway. We're just now figuring
 > out how to get started where we thought we might already be 1/3 of the
 > way through. Just something to consider.

We don't need to complete convergence to start working on that.
However long this might take, the sooner we start, the sooner we deliver.


Thanks,
Michał


