[Nova] Instances can't be started after compute nodes unexpectedly shut down because of a power outage

Donny Davis donny at fortnebula.com
Fri Jul 12 19:37:05 UTC 2019


How is the recovery coming along, Gökhan? I am curious to hear.

On Fri, Jul 12, 2019 at 3:46 AM Gökhan IŞIK <skylightcoder at gmail.com> wrote:

> Awesome, thanks Donny!
> I followed the steps below and rescued my instance.
>
>    1. Find the instance ID and its compute host
>
>    root at infra1-utility-container-50bcf920:~# openstack server show 1d2e8a39-97ee-4ce7-a612-1b50f90cc51e -c id  -c OS-EXT-SRV-ATTR:hypervisor_hostname
>    +-------------------------------------+--------------------------------------+
>    | Field                               | Value                                |
>    +-------------------------------------+--------------------------------------+
>    | OS-EXT-SRV-ATTR:hypervisor_hostname | compute06                            |
>    | id                                  | 1d2e8a39-97ee-4ce7-a612-1b50f90cc51e |
>    +-------------------------------------+--------------------------------------+
>
>
>    2. Find the image and its backing file on the compute host
>
>    root at compute06:~# qemu-img info -U  --backing-chain /var/lib/nova/instances/1d2e8a39-97ee-4ce7-a612-1b50f90cc51e/disk
>    image: /var/lib/nova/instances/1d2e8a39-97ee-4ce7-a612-1b50f90cc51e/disk
>    file format: qcow2
>    virtual size: 160G (171798691840 bytes)
>    disk size: 32G
>    cluster_size: 65536
>    backing file: /var/lib/nova/instances/_base/a1960f539532979a591c5f837ad604eedd9c7323
>    Format specific information:
>        compat: 1.1
>        lazy refcounts: false
>        refcount bits: 16
>        corrupt: false
>    image: /var/lib/nova/instances/_base/a1960f539532979a591c5f837ad604eedd9c7323
>    file format: raw
>    virtual size: 160G (171798691840 bytes)
>    disk size: 18G
>
>
>
>    3. Copy the image and its backing file
>
>
>    root at compute06:~# cp  /var/lib/nova/instances/1d2e8a39-97ee-4ce7-a612-1b50f90cc51e/disk master
>    root at compute06:~# cp /var/lib/nova/instances/_base/a1960f539532979a591c5f837ad604eedd9c7323 new-master
>
>
>    4. Rebase the copied image (master) onto the new backing file
>    (new-master), then commit the changes from master back into new-master
>
>    root at compute06:~# qemu-img rebase  -b new-master  -U master
>
>    root at compute06:~# qemu-img commit master
>
>    root at compute06:~# qemu-img info new-master
>
>
>
>
>    5. Convert the raw image to qcow2
>
>    root at compute06:~# qemu-img convert -f raw -O qcow2 new-master new-master.qcow2
>
>
>    6. Upload the image to Glance and launch a new instance from it :)
>
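For reference, the six steps above can be condensed into one sketch of a script (the instance UUID and base-file hash are the ones from this message; "rescued-image" is just an example name, and DRY_RUN=1 only prints each command instead of running it):

```shell
#!/bin/sh
# Dry-run sketch of steps 1-6 above. Set DRY_RUN=0 on a real compute node.
set -eu
INSTANCE=1d2e8a39-97ee-4ce7-a612-1b50f90cc51e
BASE=a1960f539532979a591c5f837ad604eedd9c7323
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# 3. copy the overlay and its backing file to working names
run cp "/var/lib/nova/instances/$INSTANCE/disk" master
run cp "/var/lib/nova/instances/_base/$BASE" new-master
# 4. rebase the copy onto the new backing file, then commit into it
run qemu-img rebase -U -b new-master master
run qemu-img commit master
# 5. convert the merged raw base image to qcow2
run qemu-img convert -f raw -O qcow2 new-master new-master.qcow2
# 6. upload to Glance ("rescued-image" is an example name)
run openstack image create --disk-format qcow2 --container-format bare \
    --file new-master.qcow2 rescued-image
```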
>
> Thanks,
> Gökhan.
>
> On Fri, Jul 12, 2019 at 00:56, Donny Davis <donny at fortnebula.com> wrote:
>
>> Of course, you can always just pull the disk images from the VM
>> folders, merge them back with the base file, upload to Glance, and then
>> relaunch the instances.
>>
>> You can give this method a spin with the lowest risk to your instances:
>>
>>
>> https://medium.com/@kumar_pravin/qemu-merge-snapshot-and-backing-file-into-standalone-disk-c8d3a2b17c0e
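The "merge them back with the base file" part is essentially collapsing a qcow2 backing chain. A self-contained sketch on throwaway images (on a compute node the overlay would be /var/lib/nova/instances/<uuid>/disk instead):

```shell
# Collapse an overlay and its backing file into one standalone image.
if command -v qemu-img >/dev/null 2>&1; then
    d=$(mktemp -d) && cd "$d"
    qemu-img create -f raw base.raw 1M >/dev/null
    qemu-img create -f qcow2 \
        -o backing_file=base.raw,backing_fmt=raw overlay.qcow2 >/dev/null
    # convert reads the overlay *through* its backing file and writes a
    # self-contained qcow2, leaving both source files untouched
    qemu-img convert -O qcow2 overlay.qcow2 standalone.qcow2
    result=$(qemu-img info standalone.qcow2 | grep -c 'backing file' || true)
else
    result=skipped   # qemu-img not installed on this machine
fi
echo "backing-file references left in standalone image: $result"
```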
>>
>>
>>
>>
>>
>> On Thu, Jul 11, 2019 at 4:10 PM Donny Davis <donny at fortnebula.com> wrote:
>>
>>> You surely want to leave locking turned on.
>>>
>>> You may want to ask qemu-devel about the locking of an image file and how
>>> it works. This isn't really an OpenStack issue; it seems to be a layer below.
>>>
>>> Depending on how mission-critical your VMs are, you could probably work
>>> around it by passing --force-share into the command OpenStack is
>>> trying to run.
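A rough sketch of that workaround, on a throwaway image rather than a real instance disk (note that --force-share only relaxes the read-only open used for the metadata query; it does not remove anyone's lock):

```shell
# Query image metadata without requesting the write lock. On a compute node
# the argument would be the instance disk path instead of this temp image.
if command -v qemu-img >/dev/null 2>&1; then
    img="$(mktemp -u).qcow2"
    qemu-img create -f qcow2 "$img" 1M >/dev/null
    qemu-img info --force-share "$img"
    rm -f "$img"
    status=ok
else
    status=skipped   # qemu-img not installed on this machine
fi
echo "$status"
```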
>>>
>>> I can't recommend this path, though; the best way is to find out how to
>>> remove the lock.
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Jul 11, 2019 at 3:23 PM Gökhan IŞIK <skylightcoder at gmail.com>
>>> wrote:
>>>
>>>> In [1] it says "Image locking is added and enabled by default.
>>>> Multiple QEMU processes cannot write to the same image as long as the host
>>>> supports OFD or posix locking, unless options are specified otherwise."
>>>> Maybe something needs to be done on the nova side.
>>>>
>>>> I ran this command and got the same error. The output is at
>>>> http://paste.openstack.org/show/754311/
>>>>
>>>> If I run qemu-img info on instance-0000219b with -U, it doesn't give any
>>>> errors.
>>>>
>>>> [1] https://wiki.qemu.org/ChangeLog/2.10
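The refusal described in that changelog entry can be reproduced in miniature with flock(1) as a stand-in (QEMU actually uses fcntl-based OFD/posix locks internally, so this is only an illustration of the behaviour, not nova's mechanism):

```shell
# While one process holds an exclusive lock on the "image", a second
# non-blocking locker is refused -- analogous to the second QEMU process
# failing to get the "write" lock on the instance disk.
img=$(mktemp)
flock -n "$img" -c "
    if flock -n '$img' -c true; then
        echo 'second writer admitted'
    else
        echo 'second writer refused'
    fi
"
rm -f "$img"
```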
>>>>
>>>> On Thu, Jul 11, 2019 at 22:11, Donny Davis <donny at fortnebula.com> wrote:
>>>>
>>>>> Well that is interesting. If you look in your libvirt config directory
>>>>> (/etc/libvirt on CentOS) you can get a little more info on what is being
>>>>> used for locking.
>>>>>
>>>>> Maybe strace can shed some light on it. Try something like
>>>>>
>>>>> strace -ttt -f qemu-img info
>>>>> /var/lib/nova/instances/659b5853-d094-4425-85a9-5bcacf88c84e/disk
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 11, 2019 at 2:39 PM Gökhan IŞIK <skylightcoder at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I ran the virsh list --all command; the output is below:
>>>>>>
>>>>>> root at compute06:~# virsh list --all
>>>>>>  Id    Name                           State
>>>>>> ----------------------------------------------------
>>>>>>  -     instance-000012f9              shut off
>>>>>>  -     instance-000013b6              shut off
>>>>>>  -     instance-000016fb              shut off
>>>>>>  -     instance-0000190a              shut off
>>>>>>  -     instance-00001a8a              shut off
>>>>>>  -     instance-00001e05              shut off
>>>>>>  -     instance-0000202a              shut off
>>>>>>  -     instance-00002135              shut off
>>>>>>  -     instance-00002141              shut off
>>>>>>  -     instance-000021b6              shut off
>>>>>>  -     instance-000021ec              shut off
>>>>>>  -     instance-000023db              shut off
>>>>>>  -     instance-00002ad7              shut off
>>>>>>
>>>>>> Also, when I try to start instances with virsh, the output is below:
>>>>>>
>>>>>> root at compute06:~# virsh start instance-0000219b
>>>>>> error: Failed to start domain instance-000012f9
>>>>>> error: internal error: process exited while connecting to monitor:
>>>>>>  2019-07-11T18:36:34.229534Z qemu-system-x86_64: -chardev
>>>>>> pty,id=charserial0,logfile=/dev/fdset/2,logappend=on: char device
>>>>>> redirected to /dev/pts/3 (label charserial0)
>>>>>> 2019-07-11T18:36:34.243395Z qemu-system-x86_64: -drive
>>>>>> file=/var/lib/nova/instances/659b5853-d094-4425-85a9-5bcacf88c84e/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,discard=ignore:
>>>>>> Failed to get "write" lock
>>>>>> Is another process using the image?
>>>>>>
>>>>>> Thanks,
>>>>>> Gökhan
>>>>>>
>>>>>> On Thu, Jul 11, 2019 at 21:06, Donny Davis <donny at fortnebula.com> wrote:
>>>>>>
>>>>>>> Can you ssh to the hypervisor and run virsh list to make sure your
>>>>>>> instances are in fact down?
>>>>>>>
>>>>>>> On Thu, Jul 11, 2019 at 3:02 AM Gökhan IŞIK <skylightcoder at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Can anyone help me, please? I can't rescue my instances yet :(
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Gökhan
>>>>>>>>
>>>>>>>> On Tue, Jul 9, 2019 at 15:46, Gökhan IŞIK <skylightcoder at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi folks,
>>>>>>>>> Because of a power outage, most of our compute nodes unexpectedly
>>>>>>>>> shut down, and now I cannot start our instances. The error message is
>>>>>>>>> "Failed to get "write" lock. Is another process using the image?" The
>>>>>>>>> instances' power status is No State. The full error log is at
>>>>>>>>> http://paste.openstack.org/show/754107/. My environment is
>>>>>>>>> OpenStack Pike on Ubuntu 16.04 LTS servers, and the instances are on
>>>>>>>>> NFS shared storage. The nova version is 16.1.6.dev2, qemu is 2.10.1,
>>>>>>>>> and libvirt is 3.6.0. I saw a commit [1], but it doesn't solve this
>>>>>>>>> problem.
>>>>>>>>> There are important instances in my environment. How can I rescue
>>>>>>>>> them? What would you suggest?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Gökhan
>>>>>>>>>
>>>>>>>>> [1] https://review.opendev.org/#/c/509774/
>>>>>>>>>
>>>>>>>>

