[Nova] Instances can't be started after compute nodes unexpectedly shut down because of a power outage

Donny Davis donny at fortnebula.com
Thu Jul 11 20:10:43 UTC 2019


You surely want to leave locking turned on.

You may want to ask qemu-devel about how the locking of an image file
works. This isn't really an OpenStack issue; it seems to be a layer below.

Depending on how mission-critical your VMs are, you could probably work
around it by passing --force-share to the command OpenStack is
trying to run.

I cannot recommend this path; the best way is to find out how to remove
the lock.
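The 'Failed to get "write" lock' error and the `-U`/`--force-share` workaround can be illustrated at the operating-system level with a small Python sketch. It is a simplification under stated assumptions: it uses plain POSIX `fcntl.lockf` locks over the whole file, whereas QEMU uses OFD locks when the host supports them, on particular byte ranges of the image, which this does not reproduce. A forked child stands in for a running QEMU process holding the write lock. While it does, another process's non-blocking write-lock request is refused, yet reading the file without asking for any lock still works because these locks are advisory; that is loosely analogous to why `qemu-img info -U` succeeds. The helpers `try_write_lock` and `demo` are hypothetical names.

```python
import errno
import fcntl
import os
import tempfile


def try_write_lock(path):
    """Return True if an exclusive (write) POSIX lock on the whole file can be
    taken right now (it is released again at once), or False if another
    process holds a conflicting lock."""
    with open(path, "r+b") as f:
        try:
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return False
            raise
        fcntl.lockf(f, fcntl.LOCK_UN)
        return True


def demo():
    """Hold a write lock on a fake image in a child process (standing in for
    a running QEMU) and probe the same file from the parent."""
    fd, path = tempfile.mkstemp()
    os.write(fd, b"not really a qcow2 image")
    os.close(fd)

    locked_r, locked_w = os.pipe()    # child -> parent: "lock is held"
    release_r, release_w = os.pipe()  # parent -> child: "you may exit"

    pid = os.fork()
    if pid == 0:
        try:
            os.close(locked_r)
            os.close(release_w)
            with open(path, "r+b") as f:
                fcntl.lockf(f, fcntl.LOCK_EX)  # the child's "write" lock
                os.write(locked_w, b"1")
                os.read(release_r, 1)          # hold it until told to stop
        finally:
            os._exit(0)

    os.close(locked_w)
    os.close(release_r)
    os.read(locked_r, 1)                  # wait until the child holds the lock
    while_held = try_write_lock(path)     # refused: lock is held elsewhere
    with open(path, "rb") as f:           # plain read, no lock requested
        unlocked_read = f.read()
    os.write(release_w, b"1")
    os.close(release_w)
    os.waitpid(pid, 0)                    # the child's exit drops its lock
    os.close(locked_r)
    after_exit = try_write_lock(path)     # now granted
    os.unlink(path)
    return while_held, unlocked_read, after_exit


if __name__ == "__main__":
    print(demo())
```

Running it prints `(False, b'not really a qcow2 image', True)`: the write lock is refused while the child holds it, the unlocked read succeeds, and the lock is granted once the holding process has exited.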

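To find out what holds a lock on a Linux host, one possible starting point is the kernel's lock table in `/proc/locks`. The sketch below is an assumption, not a Nova or QEMU tool: `parse_proc_locks` and `locks_on` are hypothetical helper names, and the field layout follows the proc(5) man page. Two caveats apply. For OFD locks, which QEMU uses when available, the kernel reports a PID of -1, so an `OFDLCK` entry only shows that some open file description holds the lock; a tool such as `lsof` is then needed to see which processes have the file open. And on NFS-backed instance directories, locks taken by processes on other hosts do not appear in this host's `/proc/locks` at all.

```python
import os


def parse_proc_locks(text):
    """Parse the contents of /proc/locks (layout as described in proc(5)).

    Lines describing blocked waiters carry an extra '->' field and are
    skipped, so only locks that are actually held are returned.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[1] == "->":
            continue
        parts = fields[5].split(":")
        if len(parts) != 3:          # e.g. "<none>" when there is no inode
            continue
        major, minor, inode = parts
        entries.append({
            "kind": fields[1],                        # POSIX, FLOCK, OFDLCK, ...
            "access": fields[3],                      # READ or WRITE
            "pid": int(fields[4]),                    # -1 for OFD locks
            "dev": (int(major, 16), int(minor, 16)),  # printed in hex
            "inode": int(inode),
            "range": (fields[6], fields[7]),          # start, end ("EOF" = to end)
        })
    return entries


def locks_on(path, proc_locks="/proc/locks"):
    """Return the parsed lock entries that refer to the file at `path`."""
    st = os.stat(path)
    dev = (os.major(st.st_dev), os.minor(st.st_dev))
    with open(proc_locks) as f:
        entries = parse_proc_locks(f.read())
    return [e for e in entries if e["inode"] == st.st_ino and e["dev"] == dev]


if __name__ == "__main__":
    import sys
    for entry in locks_on(sys.argv[1]):
        print(entry)
```

Saved under any file name, it can be run against an instance disk such as /var/lib/nova/instances/659b5853-d094-4425-85a9-5bcacf88c84e/disk; an empty result means this host's kernel knows of no lock on that file.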
On Thu, Jul 11, 2019 at 3:23 PM Gökhan IŞIK <skylightcoder at gmail.com> wrote:

> In [1] it says "Image locking is added and enabled by default. Multiple
> QEMU processes cannot write to the same image as long as the host supports
> OFD or posix locking, unless options are specified otherwise." Maybe
> something needs to be done on the Nova side.
>
> I ran this command and got the same error. The output is in
> http://paste.openstack.org/show/754311/
>
> If I run qemu-img info on instance-0000219b with -U, it doesn't give any
> errors.
>
> [1] https://wiki.qemu.org/ChangeLog/2.10
>
> Donny Davis <donny at fortnebula.com> wrote on Thu, 11 Jul 2019 at 22:11:
>
>> Well, that is interesting. If you look in your libvirt config directory
>> (/etc/libvirt on CentOS) you can get a little more info on what is being
>> used for locking.
>>
>> Maybe strace can shed some light on it. Try something like
>>
>> strace -ttt -f qemu-img info /var/lib/nova/instances/659b5853-d094-4425-85a9-5bcacf88c84e/disk
>>
>> On Thu, Jul 11, 2019 at 2:39 PM Gökhan IŞIK <skylightcoder at gmail.com>
>> wrote:
>>
>>> I ran the virsh list --all command; the output is below:
>>>
>>> root at compute06:~# virsh list --all
>>>  Id    Name                           State
>>> ----------------------------------------------------
>>>  -     instance-000012f9              shut off
>>>  -     instance-000013b6              shut off
>>>  -     instance-000016fb              shut off
>>>  -     instance-0000190a              shut off
>>>  -     instance-00001a8a              shut off
>>>  -     instance-00001e05              shut off
>>>  -     instance-0000202a              shut off
>>>  -     instance-00002135              shut off
>>>  -     instance-00002141              shut off
>>>  -     instance-000021b6              shut off
>>>  -     instance-000021ec              shut off
>>>  -     instance-000023db              shut off
>>>  -     instance-00002ad7              shut off
>>>
>>> Also, when I try to start instances with virsh, the output is below:
>>>
>>> root at compute06:~# virsh start instance-0000219b
>>> error: Failed to start domain instance-000012f9
>>> error: internal error: process exited while connecting to monitor:
>>>  2019-07-11T18:36:34.229534Z qemu-system-x86_64: -chardev
>>> pty,id=charserial0,logfile=/dev/fdset/2,logappend=on: char device
>>> redirected to /dev/pts/3 (label charserial0)
>>> 2019-07-11T18:36:34.243395Z qemu-system-x86_64: -drive
>>> file=/var/lib/nova/instances/659b5853-d094-4425-85a9-5bcacf88c84e/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,discard=ignore:
>>> Failed to get "write" lock
>>> Is another process using the image?
>>>
>>> Thanks,
>>> Gökhan
>>>
>>> Donny Davis <donny at fortnebula.com> wrote on Thu, 11 Jul 2019 at 21:06:
>>>
>>>> Can you ssh to the hypervisor and run virsh list to make sure your
>>>> instances are in fact down?
>>>>
>>>> On Thu, Jul 11, 2019 at 3:02 AM Gökhan IŞIK <skylightcoder at gmail.com>
>>>> wrote:
>>>>
>>>>> Can anyone help me, please? I still can't rescue my instances :(
>>>>>
>>>>> Thanks,
>>>>> Gökhan
>>>>>
>>>>> Gökhan IŞIK <skylightcoder at gmail.com> wrote on Tue, 9 Jul 2019 at 15:46:
>>>>>
>>>>>> Hi folks,
>>>>>> Because of a power outage, most of our compute nodes unexpectedly
>>>>>> shut down, and now I cannot start our instances. The error message is
>>>>>> 'Failed to get "write" lock. Is another process using the image?'. The
>>>>>> instances' power state is "No State". The full error log is
>>>>>> http://paste.openstack.org/show/754107/. My environment is OpenStack
>>>>>> Pike on Ubuntu 16.04 LTS servers, and the instances are on NFS shared
>>>>>> storage. The Nova version is 16.1.6.dev2, the QEMU version is 2.10.1,
>>>>>> and the libvirt version is 3.6.0. I saw a commit [1], but it doesn't
>>>>>> solve this problem. There are important instances in my environment.
>>>>>> How can I rescue them? What would you suggest?
>>>>>>
>>>>>> Thanks,
>>>>>> Gökhan
>>>>>>
>>>>>> [1] https://review.opendev.org/#/c/509774/
>>>>>>
>>>>>


More information about the openstack-discuss mailing list