[openstack-dev] [nova] BUG? nova-compute should delete unused instance files on boot

Joshua Harlow harlowja at yahoo-inc.com
Mon Oct 7 23:30:46 UTC 2013


A scenario that I've seen:

Take 'nova-compute' down for a software upgrade; the API stays accessible
since you want to provide API uptime (i.e. not taking the whole cluster
offline).

User Y deletes a VM on the hypervisor where nova-compute is currently down.
The delete is recorded in the DB, but the hypervisor never tears it down, so
at this point VM 'A' is still active but nova thinks it's not.

User X requests a VM and gets allocated 'A's hostname, IP (or other
resources). Uh oh, you now have two VMs with the same IP/hostname in the
same network (the second 'A' is in a damaged state since it likely can't
even 'ifup' its network).

Upgrade the software on that hypervisor (yum install xyz...), then 'service
nova-compute restart' (back to normal). Now the first 'A' could get deleted,
but you still have the second, broken 'A'.

Now what?

On 10/7/13 4:21 PM, "Vishvananda Ishaya" <vishvananda at gmail.com> wrote:

>
>On Oct 7, 2013, at 3:49 PM, Joshua Harlow <harlowja at yahoo-inc.com> wrote:
>
>> This brings up another question, do people actually like/use the
>> 'local_delete' feature in nova?
>> 
>> In general it seems to free resources that cannot actually be freed,
>> and that has been problematic for y! usage.
>> 
>> Deleting from the DB allows another request to actually take those
>> resources over, yet the previous VM (+network, volumes...) that wasn't
>> deleted still has those resources (likely attached to it in the case of
>> a volume, or in the case of a hypervisor the VM resource is still
>> active, but maybe nova-compute is down), so you end up in a conflict.
>> How are others using this code? Has it been working out?
>
>We haven't had any trouble with the two settings set as below. Users seem
>to get far more frustrated when they have things that they cannot delete,
>especially when it is using up their precious quota.
>
>Vish
>
>> 
>> -Josh
>> 
>> On 10/7/13 3:34 PM, "Vishvananda Ishaya" <vishvananda at gmail.com> wrote:
>> 
>>> There is a configuration option that controls what to do with instances
>>> that are still running on the hypervisor but have been deleted from the
>>> database. I think you want:
>>> 
>>> running_deleted_instance_action=reap
>>> 
>>> You probably also want
>>> 
>>> resume_guests_state_on_host_boot=true
>>> 
>>> to bring back the instances that were running before the node was
>>> powered off. We should definitely consider changing the defaults of
>>> these two options, since I think the current defaults are probably not
>>> what most people would want.
>>> 
>>> Vish
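
For reference, pulling the two options together, a minimal nova.conf sketch
(the [DEFAULT] section placement is an assumption here; the option names and
values are the ones quoted above):

    [DEFAULT]
    # reap instances still running on the hypervisor but already deleted
    # from the database
    running_deleted_instance_action = reap
    # restart guests that were running before the host was powered off
    resume_guests_state_on_host_boot = true
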
>>> On Oct 7, 2013, at 1:24 PM, Chris Friesen <chris.friesen at windriver.com>
>>> wrote:
>>> 
>>>> On 10/07/2013 12:44 PM, Russell Bryant wrote:
>>>>> On 10/07/2013 02:28 PM, Chris Friesen wrote:
>>>>>> 
>>>>>> I've been doing a lot of instance creation/deletion/evacuate and
>>>>>> I've noticed that if I
>>>>>> 
>>>>>> 1) create an instance
>>>>>> 2) power off the compute node it was running on
>>>>>> 3) delete the instance
>>>>>> 4) boot up the compute node
>>>>>> 
>>>>>> then the instance rootfs stays around in /var/lib/nova/instances/.
>>>>>> Eventually this could add up to significant amounts of space.
>>>>>> 
>>>>>> 
>>>>>> Is this expected behaviour?  (This is on grizzly, so maybe havana is
>>>>>> different.)  If not, should I file a bug for it?
>>>>>> 
>>>>>> I think it would make sense for the compute node to come up, query
>>>>>> all the instances in /var/lib/nova/instances/, and delete the ones
>>>>>> for instances that aren't in the database.
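
A minimal sketch of that boot-time cleanup, assuming the caller already has
the set of instance UUIDs known to the database; purge_orphan_instance_dirs
and the way the UUID set is obtained are hypothetical, not existing nova
code:

    import os
    import re
    import shutil

    INSTANCES_PATH = '/var/lib/nova/instances'
    UUID_RE = re.compile(r'^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$')

    def purge_orphan_instance_dirs(db_uuids, instances_path=INSTANCES_PATH):
        """Remove per-instance directories whose UUID is not in db_uuids.

        Entries such as _base, locks and compute_nodes are skipped
        because their names are not UUIDs.
        """
        for entry in os.listdir(instances_path):
            if not UUID_RE.match(entry):
                continue  # _base, locks, compute_nodes, ...
            if entry in db_uuids:
                continue  # instance still exists in the database
            shutil.rmtree(os.path.join(instances_path, entry))

Something like this would have to run once at nova-compute startup, after
the database is reachable, rather than relying only on the periodic cleanup
task.
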
>>>>> 
>>>>> How long are you waiting after starting up the compute node?  I would
>>>>> expect it to get cleaned up by a periodic task, so you might have to
>>>>> wait roughly 10 minutes (by default).
>>>> 
>>>> This is nearly 50 minutes after booting up the compute node:
>>>> 
>>>> cfriesen at compute2:/var/lib/nova/instances$ ls -1
>>>> 39e459b1-3878-41db-aaaf-7c7d0dfa2b19
>>>> 41a60975-d6b8-468e-90bc-d7de58c2124d
>>>> 46aec2ae-b6de-4503-a238-af736f81f1a4
>>>> 50ec3d89-1c9d-4c28-adaf-26c924dfa3ed
>>>> _base
>>>> c6ec71a3-658c-4c7c-aa42-cc26296ce7fb
>>>> c72845e9-0d34-459f-b602-bb2ee409728b
>>>> compute_nodes
>>>> locks
>>>> 
>>>> Of these, only two show up in "nova list".
>>>> 
>>>> Chris
>>>> 
>>>> 
>>> 
>>> 
>> 
>



