[openstack-dev] blueprint proposal nova-compute fencing for HA ?

Sylvain Bauza sylvain.bauza at digimind.com
Mon Apr 22 13:56:54 UTC 2013


I'm not very familiar with the Heat/Ceilometer parts, but I know there are 
some attempts to combine the two in order to get some HA capabilities based 
on metrics reported by Ceilometer.

-Sylvain


On 22/04/2013 14:58, Alex Glikson wrote:
> I think this is a good idea.
> We already have a framework in Nova to detect and report the failure 
> (service group monitoring APIs, with DB and ZK backends already 
> implemented), as well as APIs to list instances on a host and to 
> evacuate individual instances (soon with destination selected by the 
> scheduler). Indeed, the missing pieces now are the end-to-end 
> orchestration (which is probably not going to happen within Nova, at 
> least at the moment), and the mechanism(s) to isolate the failed host 
> (e.g., to protect against false failure detection events) -- which 
> could potentially happen in several places, as you mentioned. It might 
> be the case that whatever can be done within Nova is already there -- 
> the corresponding nova-compute will be considered down. So, maybe now 
> the question is which additional components might be used (as you 
> mentioned -- bare-metal, quantum, cinder, etc). Once the individual 
> measures are clear (and implemented), the orchestration logic 
> (wherever that would be) can use them.
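
The flow described above (detect the failure, isolate the host, list its instances, evacuate them) could be sketched roughly as follows. This is only an illustration under stated assumptions: `recover_host` and its four callable parameters are hypothetical names, not Nova APIs, and each would have to be backed by whatever component ends up providing that step.

```python
from typing import Callable, List


def recover_host(
    host: str,
    is_up: Callable[[str], bool],               # hypothetical: failure detection
    fence: Callable[[str], bool],               # hypothetical: host isolation
    list_instances: Callable[[str], List[str]],  # hypothetical: instances on host
    evacuate: Callable[[str], None],            # hypothetical: evacuate one instance
) -> List[str]:
    """Evacuate instances from `host` only after it is confirmed fenced.

    Returns the identifiers of the instances that were evacuated.
    """
    if is_up(host):
        # The host is reported healthy, so there is nothing to recover.
        return []
    if not fence(host):
        # Isolation failed or could not be confirmed. Evacuating now could
        # leave the same instance running twice if the detection was false.
        return []
    evacuated = []
    for instance in list_instances(host):
        evacuate(instance)
        evacuated.append(instance)
    return evacuated
```

The point of the ordering is that evacuation is gated on a successful fence, which is what protects against false failure detection events.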
>
> Regards,
> Alex
>
>
>
>
> From: Leen Besselink <ubuntu at consolejunkie.net>
> To: OpenStack Development Mailing List 
> <openstack-dev at lists.openstack.org>,
> Date: 22/04/2013 03:18 PM
> Subject: [openstack-dev] blueprint proposal nova-compute fencing for HA ?
> ------------------------------------------------------------------------
>
>
>
> Hi,
>
> As I was not at the summit, and the technical videos of the 
> Summit are not yet online, I am not aware of what was discussed there.
>
> But I would like to submit a blueprint.
>
> My idea is:
>
> It is a step to support VM High availability.
>
> This part is about handling compute node failure.
>
> My proposal would be to create a framework/API/plugin/agent or 
> whatever is needed for fencing off a nova-compute node.
>
> So when something detects that a nova-compute node isn't functional 
> anymore it can fence off that nova-compute node.
>
> After which it can call 'evacuate' to start the instance(s) that were 
> previously running on the failed compute node on other compute node(s).
>
> The fencing code could be implemented in different ways for 
> different environments:
>
> - The IPMI code that handles bare-metal provisioning could, for example, 
> be used to power off or reboot the node.
>
> - The Quantum networking code could be used to "disconnect" the 
> instance(s) of the failed compute node (or the whole compute node) 
> from their respective networks. If you are using overlays you could 
> configure other machines not to accept tunnel traffic from the failed 
> compute node for the networks of the instance(s)
>
> - You could also have a firewall agent configure the shared storage 
> servers (or a firewall in between) to not accept traffic from the 
> failed compute node
>
> I am sure other people have other ideas.
>
> My request would be to have an API and general framework which can 
> call the different implementations that are configured for that 
> environment.
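
A minimal sketch of what such a framework could look like, assuming a simple driver interface and a name-to-driver registry. Every name here (`FenceDriver`, `DRIVERS`, `fence_host`, and the three driver classes) is hypothetical, and the drivers are placeholders that only print what a real implementation would do.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class FenceDriver(ABC):
    """Hypothetical interface: one way of isolating a failed compute node."""

    @abstractmethod
    def fence(self, host: str) -> bool:
        """Return True only if the host is confirmed isolated."""


class PowerFenceDriver(FenceDriver):
    # Placeholder: a real driver would power off the node, e.g. over IPMI.
    def fence(self, host: str) -> bool:
        print(f"[power] would power off {host}")
        return True


class NetworkFenceDriver(FenceDriver):
    # Placeholder: a real driver would disconnect the node from its networks.
    def fence(self, host: str) -> bool:
        print(f"[network] would disconnect {host}")
        return True


class StorageFenceDriver(FenceDriver):
    # Placeholder: a real driver would block the node at the storage servers.
    def fence(self, host: str) -> bool:
        print(f"[storage] would block storage traffic from {host}")
        return True


# Registry mapping configuration names to driver classes (assumed layout).
DRIVERS: Dict[str, Type[FenceDriver]] = {
    "power": PowerFenceDriver,
    "network": NetworkFenceDriver,
    "storage": StorageFenceDriver,
}


def fence_host(host: str, configured: List[str]) -> bool:
    """Run every configured driver; fenced only if at least one ran and all succeeded."""
    results = [DRIVERS[name]().fence(host) for name in configured]
    return bool(results) and all(results)
```

With this layout, a deployment would list the driver names that apply to it, for example `fence_host("compute-01", ["power", "storage"])`, and the caller would only go on to evacuate if the result is `True`.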
>
> Does that make any sense ?
>
> Or maybe this should be handled by creating clusters with, for example, 
> Pacemaker, as I assume oVirt might be doing with their proposals:
>
> https://blueprints.launchpad.net/nova/+spec/rhev-m-ovirt-clusters-as-compute-resources/
>
> As I am not yet all that familiar with the structure of OpenStack or 
> how it is organized, I may be asking in the wrong place. If this is 
> the wrong place to discuss it, or if it does not fit architecturally, 
> do let me know where I went wrong.
>
> I've looked at the list of existing blueprints, and I see several 
> evacuate, fault-tolerance/HA, and other related blueprints:
>
> https://blueprints.launchpad.net/nova/+spec/evacuate-host
> https://blueprints.launchpad.net/nova/+spec/find-host-and-evacuate-instance
> https://blueprints.launchpad.net/nova/+spec/unify-migrate-and-live-migrate
> https://etherpad.openstack.org/HavanaUnifyMigrateAndLiveMigrate
> https://blueprints.launchpad.net/nova/+spec/live-migration-scheduling
> https://blueprints.launchpad.net/nova/+spec/bare-metal-fault-tolerance
> http://openstacksummitapril2013.sched.org/event/92e3468e458c13616331e75f15685560#.UXUeVXyuiw4
>
> I think it would be a good idea to get an overview of all of the 
> use cases and then split them up into tasks.
>
> Hope this is helpful.
>
> Have a nice day,
> Leen.
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
