[Nova] Instances can't be started after compute nodes unexpectedly shut down because of power outage

Gökhan IŞIK skylightcoder at gmail.com
Fri Jul 12 07:46:31 UTC 2019


Awesome, thanks Donny!
I followed the steps below and rescued my instance.

   1. Find the instance ID and the compute host:

   root at infra1-utility-container-50bcf920:~# openstack server show 1d2e8a39-97ee-4ce7-a612-1b50f90cc51e -c id -c OS-EXT-SRV-ATTR:hypervisor_hostname
   +-------------------------------------+--------------------------------------+
   | Field                               | Value                                |
   +-------------------------------------+--------------------------------------+
   | OS-EXT-SRV-ATTR:hypervisor_hostname | compute06                            |
   | id                                  | 1d2e8a39-97ee-4ce7-a612-1b50f90cc51e |
   +-------------------------------------+--------------------------------------+


   2. Find the image and its backing file on the compute host (the instance
   disk is a qcow2 overlay on top of a shared raw base image in _base):

   root at compute06:~# qemu-img info -U --backing-chain /var/lib/nova/instances/1d2e8a39-97ee-4ce7-a612-1b50f90cc51e/disk
   image: /var/lib/nova/instances/1d2e8a39-97ee-4ce7-a612-1b50f90cc51e/disk
   file format: qcow2
   virtual size: 160G (171798691840 bytes)
   disk size: 32G
   cluster_size: 65536
   backing file: /var/lib/nova/instances/_base/a1960f539532979a591c5f837ad604eedd9c7323
   Format specific information:
       compat: 1.1
       lazy refcounts: false
       refcount bits: 16
       corrupt: false
   image: /var/lib/nova/instances/_base/a1960f539532979a591c5f837ad604eedd9c7323
   file format: raw
   virtual size: 160G (171798691840 bytes)
   disk size: 18G



   3. Copy the image and its backing file:

   root at compute06:~# cp /var/lib/nova/instances/1d2e8a39-97ee-4ce7-a612-1b50f90cc51e/disk master
   root at compute06:~# cp /var/lib/nova/instances/_base/a1960f539532979a591c5f837ad604eedd9c7323 new-master


   4. Rebase the overlay (master) onto the copied base (new-master), then
   commit the overlay's changes into it. After the commit, new-master is a
   standalone image containing the merged data, which qemu-img info should
   confirm by showing no backing file:

   root at compute06:~# qemu-img rebase -b new-master -U master
   root at compute06:~# qemu-img commit master
   root at compute06:~# qemu-img info new-master




   5. Convert the raw image to qcow2:

   root at compute06:~# qemu-img convert -f raw -O qcow2 new-master new-master.qcow2
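
   If upload size matters, the same conversion can also compress the qcow2
   output with qemu-img's -c flag (a variant I didn't run, just for
   reference):

   qemu-img convert -c -f raw -O qcow2 new-master new-master.qcow2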


   6. Time to upload to Glance and then launch an instance from this image :)
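
   For reference, the upload step can be done with the openstack client like
   this (the image name "rescued-master" is just an example):

   openstack image create --disk-format qcow2 --container-format bare --file new-master.qcow2 rescued-master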


Thanks,
Gökhan.

On Fri, Jul 12, 2019 at 00:56, Donny Davis <donny at fortnebula.com> wrote:

> Of course you can also always just pull the disk images from the vm
> folders, merge them back with the base file, upload to glance and then
> relaunch the instances.
>
> You can give this method a spin with the lowest risk to your instances:
>
>
> https://medium.com/@kumar_pravin/qemu-merge-snapshot-and-backing-file-into-standalone-disk-c8d3a2b17c0e
>
>
>
>
>
> On Thu, Jul 11, 2019 at 4:10 PM Donny Davis <donny at fortnebula.com> wrote:
>
>> You surely want to leave locking turned on.
>>
>> You may want to ask qemu-devel about the locking of an image file and how
>> it works. This isn't really an OpenStack issue; it seems to be a layer below.
>>
>> Depending on how mission-critical your VMs are, you could probably work
>> around it by passing --force-share to the command OpenStack is trying to
>> run.
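>>
>> For example, something like this (an untested sketch, using the disk path
>> from your earlier output):
>>
>> qemu-img info --force-share /var/lib/nova/instances/659b5853-d094-4425-85a9-5bcacf88c84e/disk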
>>
>> I cannot recommend this path; the best way is to find out how to remove
>> the lock.
>>
>>
>>
>>
>>
>>
>> On Thu, Jul 11, 2019 at 3:23 PM Gökhan IŞIK <skylightcoder at gmail.com>
>> wrote:
>>
>>> In [1] it says "Image locking is added and enabled by default. Multiple
>>> QEMU processes cannot write to the same image as long as the host supports
>>> OFD or posix locking, unless options are specified otherwise." Maybe
>>> something needs to be done on the nova side.
>>>
>>> I ran this command and got the same error. The output is at
>>> http://paste.openstack.org/show/754311/
>>>
>>> If I run qemu-img info on instance-0000219b with -U, it doesn't give any
>>> errors.
>>>
>>> [1] https://wiki.qemu.org/ChangeLog/2.10
>>>
>>> On Thu, Jul 11, 2019 at 22:11, Donny Davis <donny at fortnebula.com>
>>> wrote:
>>>
>>>> Well, that is interesting. If you look in your libvirt config directory
>>>> (/etc/libvirt on CentOS) you can get a little more info on what is being
>>>> used for locking.
>>>>
>>>> Maybe strace can shed some light on it. Try something like
>>>>
>>>> strace -ttt -f qemu-img info /var/lib/nova/instances/659b5853-d094-4425-85a9-5bcacf88c84e/disk
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Jul 11, 2019 at 2:39 PM Gökhan IŞIK <skylightcoder at gmail.com>
>>>> wrote:
>>>>
>>>>> I ran the virsh list --all command and the output is below:
>>>>>
>>>>> root at compute06:~# virsh list --all
>>>>>  Id    Name                           State
>>>>> ----------------------------------------------------
>>>>>  -     instance-000012f9              shut off
>>>>>  -     instance-000013b6              shut off
>>>>>  -     instance-000016fb              shut off
>>>>>  -     instance-0000190a              shut off
>>>>>  -     instance-00001a8a              shut off
>>>>>  -     instance-00001e05              shut off
>>>>>  -     instance-0000202a              shut off
>>>>>  -     instance-00002135              shut off
>>>>>  -     instance-00002141              shut off
>>>>>  -     instance-000021b6              shut off
>>>>>  -     instance-000021ec              shut off
>>>>>  -     instance-000023db              shut off
>>>>>  -     instance-00002ad7              shut off
>>>>>
>>>>> And when I try to start instances with virsh, the output is below:
>>>>>
>>>>> root at compute06:~# virsh start instance-0000219b
>>>>> error: Failed to start domain instance-000012f9
>>>>> error: internal error: process exited while connecting to monitor: 2019-07-11T18:36:34.229534Z qemu-system-x86_64: -chardev pty,id=charserial0,logfile=/dev/fdset/2,logappend=on: char device redirected to /dev/pts/3 (label charserial0)
>>>>> 2019-07-11T18:36:34.243395Z qemu-system-x86_64: -drive file=/var/lib/nova/instances/659b5853-d094-4425-85a9-5bcacf88c84e/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,discard=ignore: Failed to get "write" lock
>>>>> Is another process using the image?
>>>>>
>>>>> Thanks,
>>>>> Gökhan
>>>>>
>>>>> On Thu, Jul 11, 2019 at 21:06, Donny Davis <donny at fortnebula.com>
>>>>> wrote:
>>>>>
>>>>>> Can you ssh to the hypervisor and run virsh list to make sure your
>>>>>> instances are in fact down?
>>>>>>
>>>>>> On Thu, Jul 11, 2019 at 3:02 AM Gökhan IŞIK <skylightcoder at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Can anyone help me please? I cannot rescue my instances yet :(
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Gökhan
>>>>>>>
>>>>>>> On Tue, Jul 9, 2019 at 15:46, Gökhan IŞIK <skylightcoder at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi folks,
>>>>>>>> Because of a power outage, most of our compute nodes unexpectedly
>>>>>>>> shut down and now I cannot start our instances. The error message is
>>>>>>>> "Failed to get "write" lock. Is another process using the image?".
>>>>>>>> The instances' power state is No State. The full error log is at
>>>>>>>> http://paste.openstack.org/show/754107/. My environment is
>>>>>>>> OpenStack Pike on Ubuntu 16.04 LTS servers and the instances are on
>>>>>>>> NFS shared storage. The nova version is 16.1.6.dev2, the qemu
>>>>>>>> version is 2.10.1 and the libvirt version is 3.6.0. I saw a commit
>>>>>>>> [1], but it doesn't solve this problem.
>>>>>>>> There are important instances in my environment. How can I rescue
>>>>>>>> my instances? What would you suggest?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Gökhan
>>>>>>>>
>>>>>>>> [1] https://review.opendev.org/#/c/509774/
>>>>>>>>
>>>>>>>

