[openstack-dev] [nova] Networks are not cleaned up in build failure

Brian Haley brian.haley at hp.com
Thu Jan 15 20:01:52 UTC 2015


On 01/15/2015 12:55 PM, Andrew Laski wrote:
> On 01/15/2015 09:33 AM, Brian Haley wrote:
>> On 01/14/2015 02:15 PM, Andrew Laski wrote:
>>> On 01/14/2015 12:57 PM, Murray, Paul (HP Cloud) wrote:
>>>> Hi All,
>>>>
>>>> I recently experienced failures getting images from Glance while spawning
>>>> instances. This step comes after building the networks in the build sequence.
>>>> When the Glance failure occurred the instance was cleaned up and rescheduled
>>>> as expected, but the networks were not cleaned up. On investigation I found
>>>> that the cleanup code for the networks is in the compute manager’s
>>>> _do_build_and_run_instance() method as follows:
>>>>
>>>>              # NOTE(comstud): Deallocate networks if the driver wants
>>>>              # us to do so.
>>>>              if self.driver.deallocate_networks_on_reschedule(instance):
>>>>                  self._cleanup_allocated_networks(context, instance,
>>>>                          requested_networks)
>>>>
>>>> The default behavior for the deallocate_networks_on_reschedule() method
>>>> defined in ComputeDriver is:
>>>>
>>>>      def deallocate_networks_on_reschedule(self, instance):
>>>>          """Does the driver want networks deallocated on reschedule?"""
>>>>          return False
>>>>
>>>> Only the Ironic driver overrides this method to return True, so I think this
>>>> means the networks will not be cleaned up for any other virt driver.
>>>>
>>>> Is this really the desired behavior?
>>>>
>>> Yes.  Other than when using Ironic there is nothing specific to a particular
>>> host in the networking setup.  This means it is not necessary to deallocate and
>>> reallocate networks when an instance is rescheduled, so we can avoid the
>>> unnecessary work of doing it.
>> That's either not true any more, or not true when DVR is enabled in Neutron,
>> since in this case the port['binding:host_id'] value has been initialized to a
>> compute node, and won't get updated when nova-conductor re-schedules the VM
>> elsewhere.
>>
>> This causes the neutron port to stay on the original compute node, and any
>> neutron operations (like floatingip-associate) happen on the "old" port, leaving
>> the VM unreachable.
> 
> Gotcha.  Then we should be rebinding that port on a reschedule or go back to
> de/reallocating.  I'm assuming there's some way to handle the port being moved,
> or resizes would be broken for the same reason.
> 
> If we do need to move back to de/reallocation of networks I think it would be
> better to remove the conditional nature of it and just do it.  If the
> deallocate_networks_on_reschedule method defaults to True I don't see a case
> where it would be overridden by a driver given the information above.

Andrew,

I was able to run a test here on a multi-node setup with DVR enabled:

- Booted VM
- Associated floating IP
- Updated binding:host_id (as admin) using the neutron API:

  $ neutron port-update $port -- --binding:host_id=novacompute5

The port was correctly moved to the other compute node and the floating IP
configured.  So that showed me the agents all did the right thing as far as I
can tell.  I know Paul was looking at the nova code to try to update just this
field; I'll check in with him about that so we can get a patch up soon.
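
For illustration only, here is a minimal sketch of what "update just this
field" could look like from Python.  It is not the nova patch.  It assumes a
client object exposing update_port(port_id, body) in the style of
python-neutronclient; rebind_port, FakeNeutronClient, the port ID 'port-1' and
the starting host 'novacompute4' are hypothetical names made up so the example
runs without a cloud.

```python
def rebind_port(client, port_id, new_host):
    """Point an existing Neutron port at a different compute host.

    Only binding:host_id is sent, so the rest of the port is untouched.
    """
    body = {'port': {'binding:host_id': new_host}}
    return client.update_port(port_id, body)


class FakeNeutronClient(object):
    """Hypothetical stand-in for a real client, used only for this sketch."""

    def __init__(self):
        # One port, initially bound to the host that failed the build.
        self.ports = {'port-1': {'binding:host_id': 'novacompute4'}}

    def update_port(self, port_id, body):
        # Merge the requested attributes into the stored port.
        self.ports[port_id].update(body['port'])
        return {'port': dict(self.ports[port_id], id=port_id)}


client = FakeNeutronClient()
result = rebind_port(client, 'port-1', 'novacompute5')
```

With a real admin-scoped client in place of the fake one, this would be the
same request as the port-update command above; whether nova should issue it
from the compute manager or elsewhere is still open.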

-Brian
