[Openstack-operators] Can't reboot instance because the source Image was deleted - Grizzly - Ubuntu 12.04
Juan José Pavlik Salles
jjpavlik at gmail.com
Tue May 27 21:19:48 UTC 2014
I'm not proud of this... somehow George was right. Last week we migrated
our instances from a GFS2 volume to an OCFS2 one and copied "all" the files
from one volume to the other, then mounted the new one and started the VMs.
BUT... it seems a few files were lost during the last node failure, and the
files that were supposed to be in the _base dir weren't there (this is an
awkward answer I'll have to polish before telling my boss about it). You
can see it here:
root at cebolla:/var/lib/nova# ll instances/_base/ instances_17_05_2014/_base/
instances_17_05_2014/_base/:
total 6572308
drwxr-xr-x 2 nova nova 4096 may 17 20:50 ./
drwxr-xr-x 27 root root 4096 may 17 20:57 ../
-rw-r--r-- 1 nova kvm 2147483648 may 17 20:50
1cfaaa19259a9538efb89dd674645af7ad334322
-rw-r--r-- 1 nova kvm 2147483648 may 17 20:50
6a861f8328e7fd0b4bd80bf95dbb7fd2b782e0bd
-rw-r--r-- 1 nova kvm 2147483648 may 17 20:50
99edbbef0de23ac4ed20015ded60887690444661
-rw-r--r-- 1 nova kvm 2147483648 may 17 20:50
d04d963a4efa93ecacaadc272ab841c1dd901c9f
-rw-r--r-- 1 nova nova 8589934592 nov 18 2013 swap
-rw-r--r-- 1 libvirt-qemu kvm 536870912 nov 15 2013 swap_512
instances/_base/:
total 2424832
drwxr-xr-x 2 nova nova 3896 may 27 18:02 ./
drwxr-xr-x 28 nova nova 3896 may 27 17:45 ../
-rw-r--r-- 1 nova nova 2147483648 may 27 17:34
1cfaaa19259a9538efb89dd674645af7ad334322
-rw-r--r-- 1 nova nova 8589934592 nov 18 2013 swap
-rw-r--r-- 1 libvirt-qemu kvm 536870912 nov 15 2013 swap_512
root at cebolla:/var/lib/nova#
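By the way, an easy way to spot exactly what's missing (a rough, untested
sketch using the paths from my layout above) is to diff the two directory
listings:

# files present on the old volume's _base but absent from the new one
diff <(ls /var/lib/nova/instances_17_05_2014/_base/ | sort) \
     <(ls /var/lib/nova/instances/_base/ | sort) | grep '^<'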
Before that I had checked that the qcow disks of the instances were backed
by a file that didn't exist at all!!!:
root at cebolla:/var/lib/nova/instances/b17bfae2-27b4-49a4-9d1b-bd739b400347#
qemu-img info disk
image: disk
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 2.6G
cluster_size: 65536
backing file: /var/lib/nova/instances/_base/99edbbef0de23ac4ed20015ded60887690444661 (actual path: /var/lib/nova/instances/_base/99edbbef0de23ac4ed20015ded60887690444661)
root at cebolla:/var/lib/nova/instances/b17bfae2-27b4-49a4-9d1b-bd739b400347#
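If anyone wants to check all their instances at once, something along these
lines should do it (a rough sketch, not the exact command I ran; it assumes
the default /var/lib/nova/instances layout and that qemu-img prints a
"backing file:" line like the one above):

for d in /var/lib/nova/instances/*/disk; do
  # first token after "backing file:" is the backing file path
  b=$(qemu-img info "$d" | awk '/^backing file:/ {print $3}')
  [ -n "$b" ] && [ ! -e "$b" ] && echo "missing backing file for $d -> $b"
done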
Basically, I copied the missing files from the older volume
(6a861f8328e7fd0b4bd80bf95dbb7fd2b782e0bd,
99edbbef0de23ac4ed20015ded60887690444661 and
d04d963a4efa93ecacaadc272ab841c1dd901c9f) and started the VMs. Everything
is up and running again, sorry about the inconvenience and thanks!!!
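For the record, the recovery was roughly the following (paths from my setup;
the chown is only there to match the ownership of the other _base files, and
the final reboot can be done however you prefer, e.g. via nova reboot):

# copy each missing base image back from the old volume (repeat per hash)
cp -a /var/lib/nova/instances_17_05_2014/_base/99edbbef0de23ac4ed20015ded60887690444661 \
      /var/lib/nova/instances/_base/
chown nova:kvm /var/lib/nova/instances/_base/99edbbef0de23ac4ed20015ded60887690444661
# confirm the backing file resolves again, then reboot the instance
qemu-img info /var/lib/nova/instances/b17bfae2-27b4-49a4-9d1b-bd739b400347/disk
nova reboot b17bfae2-27b4-49a4-9d1b-bd739b400347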
2014-05-27 17:35 GMT-03:00 Juan José Pavlik Salles <jjpavlik at gmail.com>:
> What if I changed the Image ID in the glance DB to an existing image's ID? As far
> as I can see, if you delete an image you can't reboot the instances that were
> created from that image, which doesn't sound right. I must be missing something
> here...
>
>
> 2014-05-27 16:56 GMT-03:00 Juan José Pavlik Salles <jjpavlik at gmail.com>:
>
>> Great, now I understand that, new thing learned hahah! But this problem
>> doesn't seem to be related to the _base files; the log says it couldn't
>> find the image file, that's why I'm confused and don't see the point. I'll
>> try digging into the code a bit, maybe it's just a simple check and there's no real
>> need for the image file.
>>
>>
>> 2014-05-27 16:29 GMT-03:00 George Shuklin <george.shuklin at gmail.com>:
>>
>>> _base contains the 'base' copy of the disk, if the disk is in qcow format.
>>>
>>> A qcow disk consists of a base (unmodified) image plus a file with the changes. If
>>> the instance never writes to some area, it is read from the base copy. As soon
>>> as it writes something there, the new data is read from the instance disk, not from _base.
>>>
>>>
>>>
>>> On 05/27/2014 10:18 PM, Juan José Pavlik Salles wrote:
>>>
>>> Hi George, I don't really understand the relationship between _base and
>>> b17bfae2-27b4-49a4-9d1b-bd739b400347 (the instance directory, where the
>>> disks are). This is what _base contains:
>>>
>>> root at cebolla:/var/lib/nova/instances# ll _base/
>>> total 2424832
>>> drwxr-xr-x 2 nova nova 3896 may 27 15:23 ./
>>> drwxr-xr-x 28 nova nova 3896 may 27 14:36 ../
>>> -rw-r--r-- 1 nova kvm 2147483648 may 27 15:52
>>> 1cfaaa19259a9538efb89dd674645af7ad334322
>>> -rw-r--r-- 1 nova nova 8589934592 nov 18 2013 swap
>>> -rw-r--r-- 1 libvirt-qemu kvm 536870912 nov 15 2013 swap_512
>>> root at cebolla:/var/lib/nova/instances#
>>>
>>> And I've checked the glance DB:
>>> the 39baad54-6ce1-4f42-b431-1bac4fd6df28 record is indeed marked as
>>> deleted, and the file is gone:
>>>
>>> root at acelga:/var/lib/glance# ls images
>>> 37a88684-f1d8-472a-8681-65eb047c2100
>>> c94ee2f6-fae5-451c-9633-18c33ec512de d21dd4db-389c-4f4c-a749-91acc1262652
>>> root at acelga:/var/lib/glance#
>>>
>>> Is there any sane way to start the instances without this lost
>>> image? Do I really need the image to start the instances?
>>>
>>> Thanks
>>>
>>>
>>> 2014-05-27 15:58 GMT-03:00 George Shuklin <george.shuklin at gmail.com>:
>>>
>>>> I think nova checks whether the image is in place and available, so it can
>>>> restore the _base file if it is missing. But if _base is fine, I think it's
>>>> strange that it complains about glance images...
>>>>
>>>>
>>>> On 05/27/2014 09:32 PM, Juan José Pavlik Salles wrote:
>>>>
>>>> Hi guys, today we found out that one of our compute nodes had
>>>> rebooted during the night, so when I got to the office I started rebooting
>>>> the instances but... they never started. After quite a few reboots I saw
>>>> the light at the end of the tunnel...
>>>>
>>>> 2014-05-27 15:23:45.002 ERROR nova.compute.manager
>>>> [req-a76d922e-4aaa-4357-83cb-5e5a1869b5cc 31020076174943bdb7486c330a298d93
>>>> d1e3aae242f14c488d2225dcbf1e96d6] [instance:
>>>> b17bfae2-27b4-49a4-9d1b-bd739b400347] Cannot reboot instance: Image
>>>> 39baad54-6ce1-4f42-b431-1bac4fd6df28 could not be found.
>>>>
>>>> I've got 3 instances with this same error, all of them created
>>>> from the same glance image, which is no longer among us (it was replaced by a
>>>> new one). My question is, why do the instances need the image to start? The
>>>> instance disks are there:
>>>>
>>>> root at cebolla:/var/lib/nova# ll
>>>> instances/b17bfae2-27b4-49a4-9d1b-bd739b400347/
>>>> total 3233792
>>>> drwxr-xr-x 2 nova nova 3896 feb 20 12:49 ./
>>>> drwxr-xr-x 28 nova nova 3896 may 27 14:36 ../
>>>> -rw-rw---- 1 root root 0 may 27 15:23 console.log
>>>> -rw-r--r-- 1 root root 2773155840 may 24 20:23 disk
>>>> -rw-r--r-- 1 root root 537198592 may 16 16:14 disk.swap
>>>> -rw-r--r-- 1 nova nova 1782 may 27 15:23 libvirt.xml
>>>> root at cebolla:/var/lib/nova#
>>>>
>>>> Any ideas will be more than appreciated.
>>>>
>>>> Thanks guys!
>>>>
>>>> --
>>>> Pavlik Salles Juan José
>>>> Blog - http://viviendolared.blogspot.com
>>>>
>>>>
>>>> _______________________________________________
>>>> OpenStack-operators mailing list
>>>> OpenStack-operators at lists.openstack.org
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> OpenStack-operators mailing list
>>>> OpenStack-operators at lists.openstack.org
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>>>
>>>>
>>>
>>>
>>> --
>>> Pavlik Salles Juan José
>>> Blog - http://viviendolared.blogspot.com
>>>
>>>
>>>
>>
>>
>> --
>> Pavlik Salles Juan José
>> Blog - http://viviendolared.blogspot.com
>>
>
>
>
> --
> Pavlik Salles Juan José
> Blog - http://viviendolared.blogspot.com
>
--
Pavlik Salles Juan José
Blog - http://viviendolared.blogspot.com