[Openstack] Compute Node Down!

Alex Glikson GLIKSON at il.ibm.com
Sun Sep 23 08:22:45 UTC 2012


As stated below, the following patch addresses VM recovery in this and a 
few other scenarios: https://review.openstack.org/#/c/11086/
There is also another patch which can be used to simplify failure 
detection: https://review.openstack.org/#/c/10903/
Unfortunately, neither of the two made it into Folsom; both are planned 
to be merged once a stream targeting Grizzly is available.
Glad to hear that there is a relatively simple workaround in the meantime. 
It would be good to have a broader discussion on this at the summit.

Regards,
Alex






From:   Wolfgang Hennerbichler <wolfgang.hennerbichler at risc-software.at>
To:     Tom Fifield <fifieldt at unimelb.edu.au>, 
Cc:     "openstack at lists.launchpad.net" <openstack at lists.launchpad.net>
Date:   20/09/2012 09:19 AM
Subject:        Re: [Openstack] Compute Node Down!
Sent by:        openstack-bounces+glikson=il.ibm.com at lists.launchpad.net



Thanks, that's what I ended up doing (by intuition rather than knowledge) 
yesterday. I didn't know about nova rescue either. 
I think there is big, big room for improvement here. In the best case this 
should be discovered automatically and the switchover should be done 
without human intervention. 

Wolfgang 

-- 
Sent from my mobile device

On 20.09.2012, at 07:26, "Tom Fifield" <fifieldt at unimelb.edu.au> wrote:

> On 20/09/12 13:50, Vishvananda Ishaya wrote:
>> **
>> 
>> On Wed, Sep 19, 2012 at 4:03 AM, Wolfgang Hennerbichler
>> <wolfgang.hennerbichler at risc-software.at
>> <mailto:wolfgang.hennerbichler at risc-software.at>> wrote:
>> 
>>    Hello Folks,
>> 
>>    Although it seems a pretty straightforward scenario I have a hard
>>    time finding documentation on this.
>>    One of my compute nodes broke down. All the instances are on shared
>>    storage, so no troubles here, but I don't know how to tell openstack
>>    that the VM should be deployed on another compute node. I tried
>>    fiddling around in the mysql-db with no success.
>>    Any help is really appreciated.
>> 
>>    Wolfgang
> 
> 
> 
> 
> == Dead compute host ==
> Working with the host information
> <pre>
> i-000015b9 at3-ui02 running nectarkey (376, np-rcc54) 0 m1.xxlarge 2012-06-19T00:48:11.000Z 115.146.93.60
> </pre>
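As a side note, the `CONV('15b9', 16, 10)` in the query below is simply a hex-to-decimal conversion of the suffix of the EC2-style ID shown above; the same value can be checked from a shell:

```shell
# The decimal id used in the SQL query is the hex suffix of the
# EC2-style instance ID (i-000015b9) converted to base 10.
printf '%d\n' 0x15b9   # prints 5561, matching instances.id in the query output
```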
> 
> Review the status of the instance using the nova database; some of the important fields are shown below.
> <pre>
> SELECT * FROM instances WHERE id = CONV('15b9', 16, 10) \G
> *************************** 1. row ***************************
>              created_at: 2012-06-19 00:48:11
>              updated_at: 2012-07-03 00:35:11
>              deleted_at: NULL
> ...
>                      id: 5561
> ...
>             power_state: 5
>                vm_state: shutoff
> ...
>                hostname: at3-ui02
>                    host: np-rcc54
> ...
>                    uuid: 3f57699a-e773-4650-a443-b4b37eed5a06
> ...
>              task_state: NULL
> ...
> </pre>
> 
> Update the vm's compute host.
> <pre>
> UPDATE instances SET host = 'np-rcc46' WHERE uuid = '3f57699a-e773-4650-a443-b4b37eed5a06';
> </pre>
> 
> Update the libvirt XML for the instance:
> 
> * change the DHCPSERVER value to the new host's IP address;
> * possibly the VNC listen IP as well, if it isn't already 0.0.0.0.
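The two XML edits above can be scripted with sed; a minimal sketch using a stand-in file, since the real path (something like /var/lib/nova/instances/<name>/libvirt.xml) and the exact attribute layout vary by deployment and are assumptions here:

```shell
# Stand-in for the real instance XML; the path and the sample
# contents below are assumptions, adjust for your deployment.
XML=./libvirt.xml
cat > "$XML" <<'EOF'
<parameter name='DHCPSERVER' value='10.0.0.54'/>
<graphics type='vnc' listen='10.0.0.54'/>
EOF

NEW_HOST_IP=10.0.0.46   # the replacement compute host's IP (example value)

# Point DHCPSERVER at the new compute host.
sed -i "/DHCPSERVER/ s/value='[0-9.]*'/value='${NEW_HOST_IP}'/" "$XML"

# Make the VNC server listen on all interfaces, if it doesn't already.
sed -i "s/listen='[0-9.]*'/listen='0.0.0.0'/" "$XML"
```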
> 
> Dump a copy of an existing nwfilter to use as a template for creating the missing nwfilter.
> <pre>
> virsh nwfilter-list
> virsh nwfilter-dumpxml nova-instance-instance-.....
> </pre>
> 
> Example of the template file
> <pre>
> <filter name='nova-instance-instance-00001cc6-fa163e003b43' chain='root'>
>  <uuid>d5f6f610-d0b8-4407-ae00-5dabef80677a</uuid>
>  <filterref filter='nova-base'/>
> </filter>
> </pre>
> The filter name value is available from the instance's libvirt XML file (<filterref filter="nova-instance-instance-00001cc6-fa163e003b43">).
> Note: the filter name must match exactly!
> Generate a new uuid and substitute it for the uuid value.
> 
> Define the new filter (its name must match the one in the instance XML), then redefine the instance and confirm it is listed:
> <pre>
> virsh nwfilter-define /tmp/filter.xml
> virsh define libvirt.xml
> virsh list --all
> </pre>
> 
> Kill all dnsmasq processes and restart the nova services.
> <pre>
> killall dnsmasq; service nova-network restart; service nova-compute restart
> </pre>
> 
> Start the vm
> <pre>
> virsh start instance-00000
> </pre>
> 
> Finally, on the nova DB, mark the instance as active and running:
> <pre>
> UPDATE instances SET vm_state = 'active', power_state = 1 WHERE uuid = '3f57699a-e773-4650-a443-b4b37eed5a06';
> </pre>
> 
> _______________________________________________
> Mailing list: https://launchpad.net/~openstack
> Post to     : openstack at lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp


