[Nova] Instances can't be started after compute nodes unexpectedly shut down because of power outage

Donny Davis donny at fortnebula.com
Thu Jul 11 21:56:00 UTC 2019


Of course, you can always just pull the disk images from the VM
folders, merge them back with the base file, upload them to Glance, and
then relaunch the instances.

You can give this method a spin with the lowest risk to your instances:

https://medium.com/@kumar_pravin/qemu-merge-snapshot-and-backing-file-into-standalone-disk-c8d3a2b17c0e
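
A rough sketch of those steps, not necessarily what the linked article does
line for line. It assumes the stale lock sits only on the instance disk, so
it copies the overlay to a scratch directory and flattens the copy, and it
assumes the overlay records an absolute path to its backing file (you can
check with qemu-img info -U). The function names, the scratch directory and
the image name are made up for illustration. By default it only prints the
commands; set DRY_RUN=0 to run them.

```shell
#!/bin/sh
# Sketch only: flatten a Nova instance disk plus its backing file into a
# standalone qcow2 image and upload it to Glance.
# DRY_RUN=1 (default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

rescue_instance() {
    disk="$1"      # instance overlay, e.g. /var/lib/nova/instances/<uuid>/disk
    workdir="$2"   # scratch directory with enough free space (hypothetical)
    name="$3"      # name for the new Glance image (hypothetical)

    # Work on a copy so the original overlay is never modified.
    run cp "$disk" "$workdir/$name.overlay"

    # convert reads through the backing chain and writes a standalone image.
    run qemu-img convert -O qcow2 "$workdir/$name.overlay" "$workdir/$name.qcow2"

    # Upload the flattened image so a new instance can be booted from it.
    run openstack image create --disk-format qcow2 --container-format bare \
        --file "$workdir/$name.qcow2" "$name"
}
```

Something like rescue_instance /var/lib/nova/instances/<uuid>/disk /some/scratch/dir rescued-vm
would then be followed by openstack server create with that image and
whatever flavor and network the original instance used.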





On Thu, Jul 11, 2019 at 4:10 PM Donny Davis <donny at fortnebula.com> wrote:

> You surely want to leave locking turned on.
>
> You may want to ask qemu-devel about how the locking of an image file
> works. This isn't really an OpenStack issue; it seems to be a layer below.
>
> Depending on how mission-critical your VMs are, you could probably work
> around it by just passing --force-share to the command OpenStack is
> trying to run.
>
> I cannot recommend this path; the best way is to find out how to remove
> the lock.
>
>
>
>
>
>
> On Thu, Jul 11, 2019 at 3:23 PM Gökhan IŞIK <skylightcoder at gmail.com>
> wrote:
>
>> In [1] it says "Image locking is added and enabled by default. Multiple
>> QEMU processes cannot write to the same image as long as the host supports
>> OFD or posix locking, unless options are specified otherwise." Maybe
>> something needs to be done on the Nova side.
>>
>> I ran this command and got the same error. The output is at
>> http://paste.openstack.org/show/754311/
>>
>> If I run qemu-img info instance-0000219b with -U, it doesn't give any
>> errors.
>>
>> [1] https://wiki.qemu.org/ChangeLog/2.10
>>
>> On Thu, 11 Jul 2019 at 22:11, Donny Davis <donny at fortnebula.com>
>> wrote:
>>
>>> Well, that is interesting. If you look in your libvirt config directory
>>> (/etc/libvirt on CentOS) you can get a little more info on what is being
>>> used for locking.
>>>
>>> Maybe strace can shed some light on it. Try something like:
>>>
>>> strace -ttt -f qemu-img info /var/lib/nova/instances/659b5853-d094-4425-85a9-5bcacf88c84e/disk
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Jul 11, 2019 at 2:39 PM Gökhan IŞIK <skylightcoder at gmail.com>
>>> wrote:
>>>
>>>> I ran the virsh list --all command, and the output is below:
>>>>
>>>> root at compute06:~# virsh list --all
>>>>  Id    Name                           State
>>>> ----------------------------------------------------
>>>>  -     instance-000012f9              shut off
>>>>  -     instance-000013b6              shut off
>>>>  -     instance-000016fb              shut off
>>>>  -     instance-0000190a              shut off
>>>>  -     instance-00001a8a              shut off
>>>>  -     instance-00001e05              shut off
>>>>  -     instance-0000202a              shut off
>>>>  -     instance-00002135              shut off
>>>>  -     instance-00002141              shut off
>>>>  -     instance-000021b6              shut off
>>>>  -     instance-000021ec              shut off
>>>>  -     instance-000023db              shut off
>>>>  -     instance-00002ad7              shut off
>>>>
>>>> Also, when I try to start the instances with virsh, the output is below:
>>>>
>>>> root at compute06:~# virsh start instance-0000219b
>>>> error: Failed to start domain instance-000012f9
>>>> error: internal error: process exited while connecting to monitor:
>>>>  2019-07-11T18:36:34.229534Z qemu-system-x86_64: -chardev
>>>> pty,id=charserial0,logfile=/dev/fdset/2,logappend=on: char device
>>>> redirected to /dev/pts/3 (label charserial0)
>>>> 2019-07-11T18:36:34.243395Z qemu-system-x86_64: -drive
>>>> file=/var/lib/nova/instances/659b5853-d094-4425-85a9-5bcacf88c84e/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,discard=ignore:
>>>> Failed to get "write" lock
>>>> Is another process using the image?
>>>>
>>>> Thanks,
>>>> Gökhan
>>>>
>>>> On Thu, 11 Jul 2019 at 21:06, Donny Davis <donny at fortnebula.com>
>>>> wrote:
>>>>
>>>>> Can you ssh to the hypervisor and run virsh list to make sure your
>>>>> instances are in fact down?
>>>>>
>>>>> On Thu, Jul 11, 2019 at 3:02 AM Gökhan IŞIK <skylightcoder at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Can anyone help me, please? I still cannot rescue my instances :(
>>>>>>
>>>>>> Thanks,
>>>>>> Gökhan
>>>>>>
>>>>>> On Tue, 9 Jul 2019 at 15:46, Gökhan IŞIK <skylightcoder at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi folks,
>>>>>>> Because of a power outage, most of our compute nodes shut down
>>>>>>> unexpectedly, and now I cannot start our instances. The error message is
>>>>>>> 'Failed to get "write" lock. Is another process using the image?'. The
>>>>>>> instances' power state is No State. The full error log is at
>>>>>>> http://paste.openstack.org/show/754107/. My environment is
>>>>>>> OpenStack Pike on Ubuntu 16.04 LTS servers, and the instances are on NFS
>>>>>>> shared storage. The Nova version is 16.1.6.dev2, the QEMU version is
>>>>>>> 2.10.1, and the libvirt version is 3.6.0. I saw a commit [1], but it
>>>>>>> doesn't solve this problem.
>>>>>>> There are important instances in my environment. How can I rescue my
>>>>>>> instances? What would you suggest?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Gökhan
>>>>>>>
>>>>>>> [1] https://review.opendev.org/#/c/509774/
>>>>>>>
>>>>>>