[openstack-dev] [Nova] Automatic evacuate

Jastrzebski, Michal michal.jastrzebski at intel.com
Thu Oct 16 07:25:05 UTC 2014



> -----Original Message-----
> From: Russell Bryant [mailto:rbryant at redhat.com]
> Sent: Thursday, October 16, 2014 5:04 AM
> To: openstack-dev at lists.openstack.org
> Subject: Re: [openstack-dev] [Nova] Automatic evacuate
> 
> On 10/15/2014 05:07 PM, Florian Haas wrote:
> > On Wed, Oct 15, 2014 at 10:03 PM, Russell Bryant <rbryant at redhat.com> wrote:
> >>> Am I making sense?
> >>
> >> Yep, the downside is just that you need to provide a new set of
> >> flavors for "ha" vs "non-ha".  A benefit though is that it's a way to
> >> support it today without *any* changes to OpenStack.
> >
> > Users are already very used to defining new flavors. Nova itself
> > wouldn't even need to define those; if the vendor's deployment tools
> > defined them it would be just fine.
> 
> Yes, I know Nova wouldn't need to define it.  I was saying I didn't like that it
> was required at all.
> 
> >> This seems like the kind of thing we should also figure out how to
> >> offer on a per-guest basis without needing a new set of flavors.
> >> That's why I also listed the server tagging functionality as another
> >> possible solution.
> >
> > This still doesn't do away with the requirement to reliably detect
> > node failure, and to fence misbehaving nodes. Detecting that a node
> > has failed, and fencing it if unsure, is a prerequisite for any
> > recovery action. So you need Corosync/Pacemaker anyway.
> 
> Obviously, yes.  My post covered all of that directly ... the tagging bit was just
> additional input into the recovery operation.
> 
> > Note also that when using an approach where you have physically
> > clustered nodes, but you are also running non-HA VMs on those, then
> > the user must understand that the following applies:
> >
> > (1) If your guest is marked HA, then it will automatically recover on
> > node failure, but
> > (2) if your guest is *not* marked HA, then it will go down with the
> > node not only if it fails, but also if it is fenced.
> >
> > So a non-HA guest on an HA node group actually has a slightly
> > *greater* chance of going down than a non-HA guest on a non-HA host.
> > (And let's not get into "don't use fencing then"; we all know why
> > that's a bad idea.)
> >
> > Which is why I think it makes sense to just distinguish between
> > HA-capable and non-HA-capable hosts, and have the user decide whether
> > they want HA or non-HA guests simply by assigning them to the
> > appropriate host aggregates.
> 
> Very good point.  I hadn't considered that.
> 
> --
> Russell Bryant
> 
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

In my opinion, defining flavors is a bit hacky. Sure, it will give us the
functionality fairly quickly, but it will also strip us of the flexibility Heat
would give. Healing can be done in several ways: a simple destroy -> create
(the basic convergence workflow so far), evacuate with or without
shared storage, even rebuilding the VM, and probably a few more once we put
more thought into it.
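
To make the list of healing options concrete, here is a minimal Python sketch
of picking one per guest. Every name in it (HealingAction, choose_action, the
policy strings) is hypothetical and only for illustration; none of this is an
existing Nova or Heat API.

```python
from enum import Enum


class HealingAction(Enum):
    """Healing options mentioned in this thread (names are illustrative)."""
    RECREATE = "destroy -> create"
    EVACUATE_SHARED = "evacuate (shared storage)"
    EVACUATE_LOCAL = "evacuate (no shared storage)"
    REBUILD = "rebuild"


def choose_action(policy, shared_storage):
    """Map a per-guest policy string (hypothetical) to a healing action.

    shared_storage only matters for the "evacuate" policy.
    """
    if policy == "recreate":
        return HealingAction.RECREATE
    if policy == "evacuate":
        if shared_storage:
            return HealingAction.EVACUATE_SHARED
        return HealingAction.EVACUATE_LOCAL
    if policy == "rebuild":
        return HealingAction.REBUILD
    raise ValueError("unknown healing policy: %r" % (policy,))
```

The point of the sketch is only that the choice is a policy decision made per
guest, which is hard to express if HA is reduced to a flavor.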

I'd rather use Nova for the low-level tasks and maybe low-level monitoring (imho
Nova should do that using servicegroup). But I'd use something more
configurable, like Heat, for the actual task triggering. That would give us a
framework rather than a mechanism. Later we might want to apply HA to networks
or volumes; then we'll have the mechanism ready, and only the monitoring hook
and the healing will need to be implemented.
We can use scheduler hints to place a resource on an HA-compatible host
(for whichever healing action we'd like to use). This will be a bit more
complicated, but it will also give us more flexibility.
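
As a rough illustration of "framework rather than mechanism", the sketch below
keeps the triggering logic generic and lets each resource type plug in its own
monitoring hook and healing action. The class and method names are hypothetical
and do not correspond to any Heat or Nova interface; the resources are plain
dicts standing in for real objects.

```python
class HealingFramework:
    """Generic trigger: per resource type, one monitoring hook and one healer."""

    def __init__(self):
        # resource type -> (is_healthy callable, heal callable)
        self._handlers = {}

    def register(self, resource_type, is_healthy, heal):
        """Plug in the monitoring hook and healing action for a resource type."""
        self._handlers[resource_type] = (is_healthy, heal)

    def check(self, resource_type, resource):
        """Run the hook; call the healer only if the resource is unhealthy.

        Returns the healer's result, or None if nothing had to be done.
        """
        is_healthy, heal = self._handlers[resource_type]
        if is_healthy(resource):
            return None
        return heal(resource)


framework = HealingFramework()

# Servers first: the hook and healer here are placeholders.
framework.register(
    "server",
    is_healthy=lambda server: server["host_up"],
    heal=lambda server: "evacuate " + server["name"],
)

# Volumes later: only a new hook and healer are added, the trigger is unchanged.
framework.register(
    "volume",
    is_healthy=lambda volume: volume["status"] != "error",
    heal=lambda volume: "recreate " + volume["name"],
)
```

Adding HA for another resource type then means one more register() call, which
is the flexibility argued for above.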

I agree that we should all meet in Paris and discuss this so we can join
forces. This is one of the bigger gaps to be filled, imho.



