<font size=3 face="Times New Roman"><i>> Hypervisor failure detection

is also a more or less solved problem in Nova [2]. There are other candidates

for that task as well, like Ceilometer's hardware agent [3] (still WIP

to my knowledge).</i></font>

<br>

<br><font size=3 face="Times New Roman">The problem is that in some cases

you want to be *really* sure that the hypervisor is down before running

'evacuate' (otherwise it could lead to an application crash). And you want

to do it at scale. So polling and traditional monitoring might not be

good enough for a fully-automated service (e.g., you may need to do 'fencing'

to ensure that the node will not suddenly come back with all the VMs still

running).</font>
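
<br>

<br><font size=3 face="Times New Roman">To make the ordering concrete, here is a minimal sketch of "fence first, then evacuate". It is only an illustration: is_reported_down, fence_host and evacuate are hypothetical hooks supplied by the caller, not Nova APIs, and the real checks would be deployment-specific.</font>

```python
from typing import Callable, Iterable, List


def evacuate_if_fenced(
    host: str,
    instances: Iterable[str],
    is_reported_down: Callable[[str], bool],
    fence_host: Callable[[str], bool],
    evacuate: Callable[[str], None],
) -> List[str]:
    """Evacuate instances only once the host is confirmed to be fenced.

    All three callables are hypothetical hooks, not part of Nova.
    Returns the list of instances that were handed to `evacuate`.
    """
    # Monitoring is only a hint: the host may merely be unreachable.
    if not is_reported_down(host):
        return []
    # Fencing must succeed so the node cannot come back with VMs still running.
    if not fence_host(host):
        return []
    evacuated = []
    for instance in instances:
        evacuate(instance)
        evacuated.append(instance)
    return evacuated
```

<br><font size=3 face="Times New Roman">If fencing fails, nothing is evacuated, which is the conservative choice when a duplicate VM is worse than a delayed recovery.</font>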

<br>

<br><font size=3 face="Times New Roman">Regards,</font>

<br><font size=3 face="Times New Roman">Alex</font>

<br>

<br>

<br>

<br>

<br><font size=1 color=#5f5f5f face="sans-serif">From:      

 </font><font size=1 face="sans-serif">Oleg Gelbukh &lt;ogelbukh@mirantis.com&gt;</font>

<br><font size=1 color=#5f5f5f face="sans-serif">To:      

 </font><font size=1 face="sans-serif">OpenStack Development

Mailing List &lt;openstack-dev@lists.openstack.org&gt;</font>

<br><font size=1 color=#5f5f5f face="sans-serif">Date:      

 </font><font size=1 face="sans-serif">09/10/2013 02:09 PM</font>

<br><font size=1 color=#5f5f5f face="sans-serif">Subject:    

   </font><font size=1 face="sans-serif">Re: [openstack-dev]

[nova] automatically evacuate instances on compute failure</font>

<br>

<hr noshade>

<br>

<br>

<br><font size=3>Hello,</font>

<br>

<br><font size=3>We are very interested in this discussion (with a focus on

the second scenario outlined by Tim) and are working on a design for it at the moment.

Thanks to everyone for the valuable insights in this thread.</font>

<br>

<br><font size=3>It looks like the external orchestration daemon problem is

already partially solved by Heat with its HARestarter resource [1].</font>

<br>

<br><font size=3>Hypervisor failure detection is also a more or less solved

problem in Nova [2]. There are other candidates for that task as well,

like Ceilometer's hardware agent [3] (still WIP to my knowledge).</font>
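
<br>

<br><font size=3>As a rough illustration of the heartbeat-style detection discussed in this thread (a service is treated as down when it has not reported within the last minute), an outside daemon could scan for stale hosts along these lines. This is a stand-alone sketch, not Nova's code: looks_up, find_stale_hosts and DOWN_AFTER are hypothetical names, and the 60-second threshold is taken from the "last minute" description quoted below.</font>

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Assumed threshold: a host is suspect after one minute without a heartbeat.
DOWN_AFTER = timedelta(seconds=60)


def looks_up(last_heartbeat: datetime, now: datetime,
             down_after: timedelta = DOWN_AFTER) -> bool:
    """Return True if the last heartbeat is recent enough."""
    return (now - last_heartbeat) <= down_after


def find_stale_hosts(heartbeats: Dict[str, datetime], now: datetime,
                     down_after: timedelta = DOWN_AFTER) -> List[str]:
    """Return, sorted by name, the hosts whose heartbeat is too old."""
    return sorted(
        host for host, seen in heartbeats.items()
        if not looks_up(seen, now, down_after)
    )
```

<br><font size=3>A stale heartbeat only marks a host as suspect; it does not prove that the hypervisor is actually down.</font>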

<br>

<br><font size=3>[1] </font><a href=https://github.com/openstack/heat/blob/stable/grizzly/heat/engine/resources/instance.py#L35 target=_blank><font size=3 color=blue><u>https://github.com/openstack/heat/blob/stable/grizzly/heat/engine/resources/instance.py#L35</u></font></a>

<br><font size=3>[2] </font><a href="http://docs.openstack.org/developer/nova/api/nova.api.openstack.compute.contrib.hypervisors.html#module-nova.api.openstack.compute.contrib.hypervisors" target=_blank><font size=3 color=blue><u>http://docs.openstack.org/developer/nova/api/nova.api.openstack.compute.contrib.hypervisors.html#module-nova.api.openstack.compute.contrib.hypervisors</u></font></a>

<br><font size=3>[3] </font><a href="https://blueprints.launchpad.net/ceilometer/+spec/monitoring-physical-devices" target=_blank><font size=3 color=blue><u>https://blueprints.launchpad.net/ceilometer/+spec/monitoring-physical-devices</u></font></a>

<br><font size=3>--</font>

<br><font size=3>Best regards,</font>

<br><font size=3>Oleg Gelbukh</font>

<br><font size=3>Mirantis Labs</font>

<br><font size=3><br>

</font>

<br><font size=3>On Wed, Oct 9, 2013 at 9:26 AM, Tim Bell <</font><a href=mailto:Tim.Bell@cern.ch target=_blank><font size=3 color=blue><u>Tim.Bell@cern.ch</u></font></a><font size=3>>

wrote:</font>

<br><font size=3>I have proposed the summit design session for Hong Kong

(</font><a href=http://summit.openstack.org/cfp/details/103 target=_blank><font size=3 color=blue><u>http://summit.openstack.org/cfp/details/103</u></font></a><font size=3>)

to discuss exactly these sorts of points. We have the low-level Nova commands

but need a service to automate the process.<br>

<br>

I see two scenarios<br>

<br>

- A hardware intervention needs to be scheduled, please rebalance this

workload elsewhere before it fails completely<br>

- A hypervisor has failed, please recover what you can using shared storage

and give me a policy on what to do with the other VMs (restart, leave down

till repair etc.)<br>

<br>

Most OpenStack production sites have some sort of script doing this sort

of thing now. However, each one will be implementing the logic for migration

differently so there is no agreed best practise approach.</font><font size=3 color=#8f8f8f><br>

<br>

Tim</font>

<br><font size=3><br>

> -----Original Message-----<br>

> From: Chris Friesen [mailto:</font><a href=mailto:chris.friesen@windriver.com target=_blank><font size=3 color=blue><u>chris.friesen@windriver.com</u></font></a><font size=3>]<br>

> Sent: 09 October 2013 00:48<br>

> To: </font><a href="mailto:openstack-dev@lists.openstack.org" target=_blank><font size=3 color=blue><u>openstack-dev@lists.openstack.org</u></font></a><font size=3><br>

> Subject: Re: [openstack-dev] [nova] automatically evacuate instances

on compute failure<br>

></font>

<br><font size=3>> On 10/08/2013 03:20 PM, Alex Glikson wrote:<br>

> > Seems that this can be broken into 3 incremental pieces. First,

would<br>

> > be great if the ability to schedule a single 'evacuate' would

be<br>

> > finally merged<br>

> > (_</font><a href="https://blueprints.launchpad.net/nova/+spec/find-host-and-evacuate-instance_" target=_blank><font size=3 color=blue><u>https://blueprints.launchpad.net/nova/+spec/find-host-and-evacuate-instance_</u></font></a><font size=3>).<br>

><br>

> Agreed.<br>

><br>

> > Then, it would make sense to have the logic that evacuates an

entire<br>

> > host<br>

> > (_</font><a href="https://blueprints.launchpad.net/python-novaclient/+spec/find-and-evacuate-host_" target=_blank><font size=3 color=blue><u>https://blueprints.launchpad.net/python-novaclient/+spec/find-and-evacuate-host_</u></font></a><font size=3>).<br>

> > The reasoning behind suggesting that this should not necessarily

be in<br>

> > Nova is, perhaps, that it *can* be implemented outside Nova using

the<br>

> > individual 'evacuate' API.<br>

><br>

> This actually more-or-less exists already in the existing "nova

host-evacuate" command.  One major issue with this however is

that it<br>

> requires the caller to specify whether all the instances are on shared

or local storage, and so it can't handle a mix of local and shared<br>

> storage for the instances.   If any of them boot off block storage

for<br>

> instance you need to move them first and then do the remaining ones

as a group.<br>

><br>

> It would be nice to embed the knowledge of whether or not an instance

is on shared storage in the instance itself at creation time.  I<br>

> envision specifying this in the config file for the compute manager

along with the instance storage location, and the compute manager<br>

> could set the field in the instance at creation time.<br>

><br>

> > Finally, it should be possible to close the loop and invoke the<br>

> > evacuation automatically as a result of a failure detection (not

clear<br>

> > how exactly this would work, though). Hopefully we will have

at least<br>

> > the first part merged soon (not sure if anyone is actively working

on<br>

> > a rebase).<br>

><br>

> My interpretation of the discussion so far is that the nova maintainers

would prefer this to be driven by an outside orchestration daemon.<br>

><br>

> Currently the only way a service is recognized to be "down"

is if someone calls is_up() and it notices that the service hasn't sent

an update<br>

> in the last minute.  There's nothing in nova actively scanning

for compute node failures, which is where the outside daemon comes in.<br>

><br>

> Also, there is some complexity involved in dealing with auto-evacuate:<br>

> What do you do if an evacuate fails?  How do you recover intelligently

if there is no admin involved?<br>

><br>

> Chris<br>

><br>

> _______________________________________________<br>

> OpenStack-dev mailing list<br>

> </font><a href="mailto:OpenStack-dev@lists.openstack.org" target=_blank><font size=3 color=blue><u>OpenStack-dev@lists.openstack.org</u></font></a><font size=3><br>

> </font><a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" target=_blank><font size=3 color=blue><u>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</u></font></a><font size=3><br>

</font>


<br>