[Openstack-operators] Can't reboot instance because the source Image was deleted - Grizzly - Ubuntu 12.04

Juan José Pavlik Salles jjpavlik at gmail.com
Tue May 27 21:19:48 UTC 2014


I'm not proud of this... George was right after all. Last week we migrated
our instances from a gfs2 volume to an ocfs2 one: we copied "all" the files
from one volume to the other, mounted the new one, and started the VMs.
BUT... it seems that a few files were lost during the last node failure, so
the files that were supposed to be in the _base dir weren't there (this is an
awkward answer that I'll have to improve before telling my boss about it). You
can see it here:

root@cebolla:/var/lib/nova# ll instances/_base/ instances_17_05_2014/_base/
instances_17_05_2014/_base/:
total 6572308
drwxr-xr-x  2 nova         nova       4096 may 17 20:50 ./
drwxr-xr-x 27 root         root       4096 may 17 20:57 ../
-rw-r--r--  1 nova         kvm  2147483648 may 17 20:50 1cfaaa19259a9538efb89dd674645af7ad334322
-rw-r--r--  1 nova         kvm  2147483648 may 17 20:50 6a861f8328e7fd0b4bd80bf95dbb7fd2b782e0bd
-rw-r--r--  1 nova         kvm  2147483648 may 17 20:50 99edbbef0de23ac4ed20015ded60887690444661
-rw-r--r--  1 nova         kvm  2147483648 may 17 20:50 d04d963a4efa93ecacaadc272ab841c1dd901c9f
-rw-r--r--  1 nova         nova 8589934592 nov 18  2013 swap
-rw-r--r--  1 libvirt-qemu kvm   536870912 nov 15  2013 swap_512

instances/_base/:
total 2424832
drwxr-xr-x  2 nova         nova       3896 may 27 18:02 ./
drwxr-xr-x 28 nova         nova       3896 may 27 17:45 ../
-rw-r--r--  1 nova         nova 2147483648 may 27 17:34 1cfaaa19259a9538efb89dd674645af7ad334322
-rw-r--r--  1 nova         nova 8589934592 nov 18  2013 swap
-rw-r--r--  1 libvirt-qemu kvm   536870912 nov 15  2013 swap_512
root@cebolla:/var/lib/nova#

Before that, I had checked the instances' qcow2 disks and found that they were
backed by a file that didn't exist at all!!!:

root@cebolla:/var/lib/nova/instances/b17bfae2-27b4-49a4-9d1b-bd739b400347# qemu-img info disk
image: disk
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 2.6G
cluster_size: 65536
backing file: /var/lib/nova/instances/_base/99edbbef0de23ac4ed20015ded60887690444661 (actual path: /var/lib/nova/instances/_base/99edbbef0de23ac4ed20015ded60887690444661)
root@cebolla:/var/lib/nova/instances/b17bfae2-27b4-49a4-9d1b-bd739b400347#
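
For anyone who wants to run the same check across many instance disks, here is
a minimal sketch in Python. It is not something that was run in this thread. It
reads the backing file name straight from the qcow2 header (magic bytes, then
big-endian version, backing file offset and backing file size) and reports
whether that file exists. The function names are made up for illustration, and
it only looks at the first level of backing file, not a whole chain.

```python
import os
import struct

QCOW2_MAGIC = b"QFI\xfb"


def qcow2_backing_file(path):
    """Return the backing file path stored in a qcow2 header, or None."""
    with open(path, "rb") as f:
        header = f.read(20)
        if len(header) < 20 or header[:4] != QCOW2_MAGIC:
            raise ValueError("%s does not look like a qcow2 image" % path)
        # Bytes 4..19, big-endian: version (u32),
        # backing_file_offset (u64), backing_file_size (u32).
        _version, offset, size = struct.unpack(">IQI", header[4:20])
        if offset == 0:
            # An offset of zero means the image has no backing file.
            return None
        f.seek(offset)
        return f.read(size).decode("utf-8")


def backing_file_missing(path):
    """True if the image names a backing file that is not on disk."""
    backing = qcow2_backing_file(path)
    if backing is None:
        return False
    if not os.path.isabs(backing):
        # Relative names are resolved against the image's own directory.
        image_dir = os.path.dirname(os.path.abspath(path))
        backing = os.path.join(image_dir, backing)
    return not os.path.exists(backing)
```

Calling backing_file_missing() on each instances/<uuid>/disk file would flag
disks like the one above before trying to boot them.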

Basically, I copied the missing files from the older volume
(6a861f8328e7fd0b4bd80bf95dbb7fd2b782e0bd,
99edbbef0de23ac4ed20015ded60887690444661 and
d04d963a4efa93ecacaadc272ab841c1dd901c9f) and started the VMs. Everything
is up and running again. Sorry for the inconvenience, and thanks!!!
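
As a rough illustration of how the missing _base files could be spotted after
a copy like this one, here is a small Python sketch. It is an assumption about
how one might check, not the procedure actually used here: it compares file
names only (not sizes or contents), and the function name is hypothetical.

```python
import os


def missing_base_files(old_base, new_base):
    """Return the sorted names found in old_base but absent from new_base."""
    old_names = set(os.listdir(old_base))
    new_names = set(os.listdir(new_base))
    return sorted(old_names - new_names)
```

With the two directories listed earlier in this message as old_base
(instances_17_05_2014/_base) and new_base (instances/_base), it should report
the three hashes named above.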



2014-05-27 17:35 GMT-03:00 Juan José Pavlik Salles <jjpavlik at gmail.com>:

> What if I replace the image ID in the glance DB with an existing image's ID? As far
> as I can see, if you delete an image you can't reboot the instances that were
> created from that image, which doesn't sound right. I must be missing something
> here...
>
>
> 2014-05-27 16:56 GMT-03:00 Juan José Pavlik Salles <jjpavlik at gmail.com>:
>
> Great, now I understand that, new thing learned hahah! But this problem
>> doesn't seem to be related to the _base files; the log says it couldn't
>> find the image file, which is why I'm confused and don't see the point. I'll
>> try digging into the code a bit, maybe it's a simple check and there's no real
>> need for the image file.
>>
>>
>> 2014-05-27 16:29 GMT-03:00 George Shuklin <george.shuklin at gmail.com>:
>>
>>  _base contains the 'base' copy of the disk, if the disk is in qcow format.
>>>
>>> A qcow disk consists of a base (unmodified) image and a file with the changes. If
>>> the instance never writes to some area, that area is read from the base copy. As soon
>>> as it writes something there, the new data is read from disk, not from _base.
>>>
>>>
>>>
>>> On 05/27/2014 10:18 PM, Juan José Pavlik Salles wrote:
>>>
>>> Hi George, I don't really understand the relationship between _base and
>>> b17bfae2-27b4-49a4-9d1b-bd739b400347 (the instance directory, where the
>>> disks are). This is what _base contains:
>>>
>>> root@cebolla:/var/lib/nova/instances# ll _base/
>>> total 2424832
>>> drwxr-xr-x  2 nova         nova       3896 may 27 15:23 ./
>>> drwxr-xr-x 28 nova         nova       3896 may 27 14:36 ../
>>> -rw-r--r--  1 nova         kvm  2147483648 may 27 15:52 1cfaaa19259a9538efb89dd674645af7ad334322
>>> -rw-r--r--  1 nova         nova 8589934592 nov 18  2013 swap
>>> -rw-r--r--  1 libvirt-qemu kvm   536870912 nov 15  2013 swap_512
>>> root@cebolla:/var/lib/nova/instances#
>>>
>>>  And I've checked the glance DB, and
>>> the 39baad54-6ce1-4f42-b431-1bac4fd6df28 record is indeed marked as
>>> deleted and the file is gone:
>>>
>>> root@acelga:/var/lib/glance# ls images
>>> 37a88684-f1d8-472a-8681-65eb047c2100  c94ee2f6-fae5-451c-9633-18c33ec512de  d21dd4db-389c-4f4c-a749-91acc1262652
>>> root@acelga:/var/lib/glance#
>>>
>>>  Is there any safe way to start the instances without this lost
>>> image? Do I really need the image to start the instances?
>>>
>>>  Thanks
>>>
>>>
>>> 2014-05-27 15:58 GMT-03:00 George Shuklin <george.shuklin at gmail.com>:
>>>
>>>>  I think nova checks whether the image is in place and available so it can restore
>>>> the _base image (if it is missing). But if _base is fine, I think it's strange for it to
>>>> complain about glance images...
>>>>
>>>>
>>>> On 05/27/2014 09:32 PM, Juan José Pavlik Salles wrote:
>>>>
>>>>  Hi guys, today we found out that one of our compute nodes had
>>>> rebooted during the night, so when I got to the office I started rebooting
>>>> the instances but... they never started. After quite a few reboots I saw
>>>> the light at the end of the tunnel...
>>>>
>>>>  2014-05-27 15:23:45.002 ERROR nova.compute.manager
>>>> [req-a76d922e-4aaa-4357-83cb-5e5a1869b5cc 31020076174943bdb7486c330a298d93
>>>> d1e3aae242f14c488d2225dcbf1e96d6] [instance:
>>>> b17bfae2-27b4-49a4-9d1b-bd739b400347] Cannot reboot instance: Image
>>>> 39baad54-6ce1-4f42-b431-1bac4fd6df28 could not be found.
>>>>
>>>>  I've got 3 instances with this same error; all of them were created
>>>> from the same glance image, which is no longer among us (replaced by a new
>>>> one). My question is, why does the instance need the image to start? The
>>>> instance disks are there:
>>>>
>>>> root@cebolla:/var/lib/nova# ll instances/b17bfae2-27b4-49a4-9d1b-bd739b400347/
>>>> total 3233792
>>>> drwxr-xr-x  2 nova nova       3896 feb 20 12:49 ./
>>>> drwxr-xr-x 28 nova nova       3896 may 27 14:36 ../
>>>> -rw-rw----  1 root root          0 may 27 15:23 console.log
>>>> -rw-r--r--  1 root root 2773155840 may 24 20:23 disk
>>>> -rw-r--r--  1 root root  537198592 may 16 16:14 disk.swap
>>>> -rw-r--r--  1 nova nova       1782 may 27 15:23 libvirt.xml
>>>> root@cebolla:/var/lib/nova#
>>>>
>>>>  Any ideas will be more than appreciated.
>>>>
>>>>  Thanks guys!
>>>>
>>>>  --
>>>> Pavlik Salles Juan José
>>>> Blog - http://viviendolared.blogspot.com
>>>>
>>>>
>>>> _______________________________________________
>>>> OpenStack-operators mailing list
>>>> OpenStack-operators at lists.openstack.org
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators



-- 
Pavlik Salles Juan José
Blog - http://viviendolared.blogspot.com