[openstack-dev] [nova] Networks are not cleaned up in build failure

Andrew Laski andrew.laski at rackspace.com
Thu Jan 15 17:55:57 UTC 2015


On 01/15/2015 09:33 AM, Brian Haley wrote:
> On 01/14/2015 02:15 PM, Andrew Laski wrote:
>> On 01/14/2015 12:57 PM, Murray, Paul (HP Cloud) wrote:
>>> Hi All,
>>>
>>> I recently experienced failures getting images from Glance while spawning
>>> instances. This step comes after building the networks in the build sequence.
>>> When the Glance failure occurred the instance was cleaned up and rescheduled
>>> as expected, but the networks were not cleaned up. On investigation I found
>>> that the cleanup code for the networks is in the compute manager’s
>>> _do_build_and_run_instance() method as follows:
>>>
>>>              # NOTE(comstud): Deallocate networks if the driver wants
>>>              # us to do so.
>>>              if self.driver.deallocate_networks_on_reschedule(instance):
>>>                  self._cleanup_allocated_networks(context, instance,
>>>                          requested_networks)
>>>
>>> The default behavior of the deallocate_networks_on_reschedule() method
>>> defined in ComputeDriver is:
>>>
>>>      def deallocate_networks_on_reschedule(self, instance):
>>>          """Does the driver want networks deallocated on reschedule?"""
>>>          return False
>>>
>>> Only the Ironic driver overrides this method to return True, so I think this
>>> means the networks will not be cleaned up for any other virt driver.
>>>
>>> Is this really the desired behavior?
>>>
>> Yes.  Other than when using Ironic there is nothing specific to a particular
>> host in the networking setup.  This means it is not necessary to deallocate and
>> reallocate networks when an instance is rescheduled, so we can avoid the
>> unnecessary work of doing it.
> That's either not true any more, or not true when DVR is enabled in Neutron,
> since in this case the port['binding:host_id'] value has been initialized to a
> compute node, and won't get updated when nova-conductor re-schedules the VM
> elsewhere.
>
> This causes the neutron port to stay on the original compute node, and any
> neutron operations (like floatingip-associate) happen on the "old" port, leaving
> the VM unreachable.

Gotcha.  Then we should either rebind that port on a reschedule or go
back to de/reallocating.  I'm assuming there's some way to handle the
port being moved, or resizes would be broken for the same reason.
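
A rough sketch of the rebinding option, just to make it concrete.  It
assumes a Neutron-style client exposing update_port(port_id, body);
FakeNeutronClient and rebind_ports_for_host are hypothetical names made
up for illustration and are not nova code.

```python
class FakeNeutronClient:
    """Hypothetical in-memory stand-in for a Neutron client."""

    def __init__(self, ports):
        # Maps port_id -> dict of port attributes.
        self.ports = ports

    def update_port(self, port_id, body):
        # Merge the attributes under the "port" key into the stored port.
        self.ports[port_id].update(body["port"])
        return {"port": self.ports[port_id]}


def rebind_ports_for_host(client, port_ids, new_host):
    """Point each port's binding:host_id at the host the instance landed on.

    Hypothetical helper: the real fix might live elsewhere in nova.
    """
    for port_id in port_ids:
        client.update_port(port_id, {"port": {"binding:host_id": new_host}})
    return list(port_ids)
```

Something like this would have to run after the scheduler picks the new
host, before any further Neutron operations touch the port.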

If we do need to move back to de/reallocation of networks I think it 
would be better to remove the conditional nature of it and just do it.  
If the deallocate_networks_on_reschedule method defaults to True I don't 
see a case where it would be overridden by a driver given the 
information above.
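
To illustrate the difference, here is a simplified sketch of the
current conditional cleanup next to an unconditional one.  The classes
below are stripped-down stand-ins for the real drivers, and
handle_reschedule and its always_deallocate flag are hypothetical; only
the deallocate_networks_on_reschedule hook and its defaults come from
the code quoted above.

```python
class ComputeDriver:
    """Stand-in for the base virt driver."""

    def deallocate_networks_on_reschedule(self, instance):
        """Does the driver want networks deallocated on reschedule?"""
        return False


class IronicDriver(ComputeDriver):
    """Stand-in for the Ironic driver, which opts in to deallocation."""

    def deallocate_networks_on_reschedule(self, instance):
        return True


def handle_reschedule(driver, instance, cleanup, always_deallocate=False):
    """Run network cleanup on reschedule; return True if it ran.

    With always_deallocate=False this mirrors the current conditional
    behavior.  With always_deallocate=True the driver hook is ignored,
    which is the "just do it" option.
    """
    if always_deallocate or driver.deallocate_networks_on_reschedule(instance):
        cleanup(instance)
        return True
    return False
```

With the flag off, only the Ironic stand-in triggers cleanup; with it
on, every driver does, at which point the hook is redundant.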

>> If the instance goes to ERROR then the network will get cleaned up when the
>> instance is deleted.
> I think we need to clean up even in this case now too.
>
> -Brian
>