[openstack-dev] blueprint proposal nova-compute fencing for HA ?

Leen Besselink ubuntu at consolejunkie.net
Mon Apr 22 12:09:07 UTC 2013


Hi,

As I was not at the summit and the technical videos of the summit are not yet online, I am not aware of what was discussed there.

But I would like to submit a blueprint.

My idea is:

It is a step to support VM High availability.

This part is about handling compute node failure.

My proposal would be to create a framework/API/plugin/agent or whatever is needed for fencing off a nova-compute node.

So when something detects that a nova-compute node isn't functional anymore it can fence off that nova-compute node.

After which it can call 'evacuate' to start the instance(s) that were previously running on the failed compute node on other compute node(s).
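
To make the ordering explicit, here is a minimal Python sketch. It assumes two hypothetical callables, `fence` and `evacuate`, which are placeholders for illustration and not existing Nova APIs. The point is only that evacuation is attempted after fencing has been confirmed, so the same instance never runs on two hosts at once.

```python
def handle_failed_host(host, fence, evacuate):
    """Fence `host`, then evacuate it; skip evacuation if fencing fails.

    `fence` and `evacuate` are hypothetical callables supplied by the caller.
    `fence(host)` is assumed to return True once the node is known to be
    isolated.
    """
    if not fence(host):
        # Without confirmed fencing the old instances may still be running
        # and writing to shared storage, so do not restart them elsewhere.
        return "fencing-failed"
    evacuate(host)
    return "evacuated"
```

Whatever detects the failure would call this with the fencing and evacuation mechanisms configured for the environment.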

The code that handles the fencing could be implemented in different ways for different environments:

- The IPMI code that handles baremetal provisioning could, for example, be used to power off or reboot the node.

- The Quantum networking code could be used to "disconnect" the instance(s) of the failed compute node (or the whole compute node) from their respective networks. If you are using overlays, you could configure other machines not to accept tunnel traffic from the failed compute node for the networks of the instance(s).

- You could also have a firewall agent configure the shared storage servers (or a firewall in between) to not accept traffic from the failed compute node.

I am sure other people have other ideas.

My request would be to have an API and general framework which can call the different implementations that are configured for that environment.
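
As a rough illustration of what I mean, here is a small Python sketch of such a framework. Everything in it is an assumption made up for this example: the `FenceDriver` interface, the two placeholder drivers, the `DRIVERS` registry and the `fence_host` function do not exist in Nova. The drivers only print what they would do.

```python
import abc


class FenceDriver(abc.ABC):
    """Hypothetical interface every fencing implementation would provide."""

    @abc.abstractmethod
    def fence(self, host):
        """Isolate `host`; return True only if isolation is confirmed."""


class PowerFenceDriver(FenceDriver):
    """Placeholder for an IPMI-style power-off; the real logic is omitted."""

    def fence(self, host):
        print("power: would power off %s" % host)
        return True


class NetworkFenceDriver(FenceDriver):
    """Placeholder for disconnecting the node from its networks."""

    def fence(self, host):
        print("network: would disconnect %s" % host)
        return True


# Hypothetical registry mapping configuration names to driver classes.
DRIVERS = {"power": PowerFenceDriver, "network": NetworkFenceDriver}


def fence_host(host, configured):
    """Try the configured drivers in order until one confirms the fence.

    Returns the name of the driver that succeeded, or None if none did.
    """
    for name in configured:
        if DRIVERS[name]().fence(host):
            return name
    return None
```

For example, `fence_host("compute-1", ["network", "power"])` would try the network driver first and fall back to power. Stopping at the first confirmed fence is just one possible policy; an environment could equally require all configured drivers to succeed.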

Does that make any sense ?

Or should this instead be handled by creating clusters with, for example, Pacemaker, as I assume oVirt might be doing with their proposals:

https://blueprints.launchpad.net/nova/+spec/rhev-m-ovirt-clusters-as-compute-resources/

As I am not yet all that familiar with the structure of OpenStack or how it is organized, I may be asking in the wrong place. If so, or if this does not fit architecturally, do let me know where I went wrong.

I've looked at the list of existing blueprints and I see at least these evacuate, fault-tolerance/HA and other related blueprints:

https://blueprints.launchpad.net/nova/+spec/evacuate-host
https://blueprints.launchpad.net/nova/+spec/find-host-and-evacuate-instance
https://blueprints.launchpad.net/nova/+spec/unify-migrate-and-live-migrate
https://etherpad.openstack.org/HavanaUnifyMigrateAndLiveMigrate
https://blueprints.launchpad.net/nova/+spec/live-migration-scheduling
https://blueprints.launchpad.net/nova/+spec/bare-metal-fault-tolerance
http://openstacksummitapril2013.sched.org/event/92e3468e458c13616331e75f15685560#.UXUeVXyuiw4

I think it would be a good idea to get an overview of all the use cases and then split them up into tasks.

Hope this is helpful.

Have a nice day,
	Leen.


