<font size=2 face="sans-serif">I think this is a good idea.</font>

<br><font size=2 face="sans-serif">We already have a framework in Nova

to detect and report the failure (service group monitoring APIs, with DB

and ZK backends already implemented), as well as APIs to list instances

on a host and to evacuate individual instances (soon with destination selected

by the scheduler). Indeed, the missing pieces now are the end-to-end orchestration

(which is probably not going to happen within Nova, at least at the moment),

and the mechanism(s) to isolate the failed host (e.g., to protect against

false failure detection events) -- which could potentially happen in several

places, as you mentioned. It might be the case that whatever can be done

within Nova is already there -- the corresponding nova-compute will be

considered down. So, maybe now the question is which additional components

might be used (as you mentioned -- bare-metal, quantum, cinder, etc.). Once

the individual measures are clear (and implemented), the orchestration

logic (wherever that would be) can use them.</font>
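<br>
<br><font size=2 face="sans-serif">As a simplified illustration of what "considered down" means with a heartbeat-based backend: a host is treated as down once its last reported heartbeat is older than a configured threshold. This is only a sketch, not Nova's actual service group code; the function names and the 60-second threshold are assumptions made for the example.</font>

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold for this sketch; a real deployment would configure it.
SERVICE_DOWN_TIME = timedelta(seconds=60)


def is_service_up(last_heartbeat, now, down_time=SERVICE_DOWN_TIME):
    """Return True if the last heartbeat is no older than down_time."""
    return down_time >= abs(now - last_heartbeat)


def hosts_considered_down(heartbeats, now, down_time=SERVICE_DOWN_TIME):
    """Return the sorted names of hosts whose heartbeat has expired.

    heartbeats maps a host name to the datetime of its last heartbeat.
    """
    return sorted(
        host for host, seen in heartbeats.items()
        if not is_service_up(seen, now, down_time)
    )
```

<font size=2 face="sans-serif">A missed heartbeat alone cannot tell a dead host from an unreachable one, which is why the isolation step still matters before any evacuation.</font>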

<br>

<br><font size=2 face="sans-serif">Regards,</font>

<br><font size=2 face="sans-serif">Alex</font>

<br>

<br>

<br>

<br>

<br><font size=1 color=#5f5f5f face="sans-serif">From:      

 </font><font size=1 face="sans-serif">Leen Besselink &lt;ubuntu@consolejunkie.net&gt;</font>

<br><font size=1 color=#5f5f5f face="sans-serif">To:      

 </font><font size=1 face="sans-serif">OpenStack Development

Mailing List &lt;openstack-dev@lists.openstack.org&gt;</font>

<br><font size=1 color=#5f5f5f face="sans-serif">Date:      

 </font><font size=1 face="sans-serif">22/04/2013 03:18 PM</font>

<br><font size=1 color=#5f5f5f face="sans-serif">Subject:    

   </font><font size=1 face="sans-serif">[openstack-dev]

blueprint proposal nova-compute fencing for HA ?</font>

<br>

<hr noshade>

<br>

<br>

<br><tt><font size=2>Hi,<br>

<br>

As I was not at the Summit and the technical videos of the Summit

are not yet online, I am not aware of what was discussed there.<br>

<br>

But I would like to submit a blueprint.<br>

<br>

My idea is:<br>

<br>

It is a step to support VM High availability.<br>

<br>

This part is about handling compute node failure.<br>

<br>

My proposal would be to create a framework/API/plugin/agent or whatever

is needed for fencing off a nova-compute node.<br>

<br>

So when something detects that a nova-compute node isn't functional anymore

it can fence off that nova-compute node.<br>

<br>

After which it can call 'evacuate' to start the instance(s) that were previously

running on the failed compute node on other compute node(s).<br>
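<br>
The flow described above (fence first, evacuate only afterwards) could be sketched roughly as follows. This is a minimal sketch under stated assumptions: the function and its three callables are hypothetical placeholders supplied by the caller and do not correspond to any existing Nova API.<br>

```python
def recover_host(host, fence, list_instances, evacuate):
    """Fence the host, then evacuate its instances.

    fence, list_instances and evacuate are hypothetical callables:
    fence(host) returns True once isolation is confirmed,
    list_instances(host) returns the instances that ran on the host,
    evacuate(instance) restarts one instance elsewhere.
    """
    if not fence(host):
        # Without a confirmed fence the host may still be running the
        # instances, so starting them elsewhere risks duplicates.
        return []
    evacuated = []
    for instance in list_instances(host):
        evacuate(instance)
        evacuated.append(instance)
    return evacuated
```

Returning an empty list when fencing fails keeps the unsafe case explicit: nothing is evacuated unless the failed node is known to be isolated.<br>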

<br>

The code that handles the fencing could be implemented

in different ways for different environments:<br>

<br>

- The IPMI code that handles bare-metal provisioning could, for example, be

used to power off or reboot the node.<br>

<br>

- The Quantum networking code could be used to "disconnect" the

instance(s) of the failed compute node (or the whole compute node) from

their respective networks. If you are using overlays you could configure

other machines not to accept tunnel traffic from the failed compute node

for the networks of the instance(s).<br>

<br>

- You could also have a firewall agent configure the shared storage servers

(or a firewall in between) to not accept traffic from the failed compute

node.<br>

<br>

I am sure other people have other ideas.<br>

<br>

My request would be to have an API and general framework which can call

the different implementations that are configured for that environment.<br>
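<br>
One possible shape for such a framework is a common driver interface plus a configured, ordered list of drivers. The sketch below is illustrative only: FenceDriver, RecordingDriver and fence_host are hypothetical names, the stand-in driver merely records calls instead of talking to IPMI or the network, and the "first driver that succeeds wins" policy is an assumption rather than a settled design.<br>

```python
from abc import ABC, abstractmethod


class FenceDriver(ABC):
    """Hypothetical interface each fencing implementation would provide."""

    @abstractmethod
    def fence(self, host):
        """Isolate the host; return True once the isolation is confirmed."""


class RecordingDriver(FenceDriver):
    """Stand-in driver that only records calls.

    A real driver would power off the node via IPMI, cut its networks,
    or block its access to shared storage.
    """

    def __init__(self, name, succeeds=True):
        self.name = name
        self.succeeds = succeeds
        self.fenced = []

    def fence(self, host):
        self.fenced.append(host)
        return self.succeeds


def fence_host(host, registry, configured):
    """Try the configured drivers in order.

    Returns the name of the first driver that reports success,
    or None if every configured driver fails.
    """
    for name in configured:
        if registry[name].fence(host):
            return name
    return None
```

An environment would then list the drivers it wants, for example ["ipmi", "network"], and the caller only needs to check whether fence_host returned a driver name before evacuating.<br>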

<br>

Does that make any sense ?<br>

<br>

Or should this maybe be handled by creating clusters with, for example, Pacemaker,

as I assume oVirt might be doing with their proposals:<br>

<br>

</font></tt><a href="https://blueprints.launchpad.net/nova/+spec/rhev-m-ovirt-clusters-as-compute-resources/"><tt><font size=2>https://blueprints.launchpad.net/nova/+spec/rhev-m-ovirt-clusters-as-compute-resources/</font></tt></a><tt><font size=2><br>

<br>

As I am not yet all that familiar with the structure of OpenStack or how

it is organized, I may be asking in the wrong place. If so, or if this

does not fit architecturally, please let me know where I went

wrong.<br>

<br>

I've looked at the list of existing blueprints and see at least these

evacuate, fault-tolerance/HA, and other related blueprints:<br>

<br>

</font></tt><a href="https://blueprints.launchpad.net/nova/+spec/evacuate-host"><tt><font size=2>https://blueprints.launchpad.net/nova/+spec/evacuate-host</font></tt></a><tt><font size=2><br>

</font></tt><a href="https://blueprints.launchpad.net/nova/+spec/find-host-and-evacuate-instance"><tt><font size=2>https://blueprints.launchpad.net/nova/+spec/find-host-and-evacuate-instance</font></tt></a><tt><font size=2><br>

</font></tt><a href="https://blueprints.launchpad.net/nova/+spec/unify-migrate-and-live-migrate"><tt><font size=2>https://blueprints.launchpad.net/nova/+spec/unify-migrate-and-live-migrate</font></tt></a><tt><font size=2><br>

</font></tt><a href="https://etherpad.openstack.org/HavanaUnifyMigrateAndLiveMigrate"><tt><font size=2>https://etherpad.openstack.org/HavanaUnifyMigrateAndLiveMigrate</font></tt></a><tt><font size=2><br>

</font></tt><a href="https://blueprints.launchpad.net/nova/+spec/live-migration-scheduling"><tt><font size=2>https://blueprints.launchpad.net/nova/+spec/live-migration-scheduling</font></tt></a><tt><font size=2><br>

</font></tt><a href="https://blueprints.launchpad.net/nova/+spec/bare-metal-fault-tolerance"><tt><font size=2>https://blueprints.launchpad.net/nova/+spec/bare-metal-fault-tolerance</font></tt></a><tt><font size=2><br>

</font></tt><a href="http://openstacksummitapril2013.sched.org/event/92e3468e458c13616331e75f15685560#.UXUeVXyuiw4"><tt><font size=2>http://openstacksummitapril2013.sched.org/event/92e3468e458c13616331e75f15685560#.UXUeVXyuiw4</font></tt></a><tt><font size=2><br>

<br>

I think it would be a good idea to get an overview of all the use cases

and then split them up into tasks.<br>

<br>

Hope this is helpful.<br>

<br>

Have a nice day,<br>

                

Leen.<br>

<br>

</font></tt>

<br>