[openstack-dev] [nova] Networks are not cleaned up in build failure
Brian Haley
brian.haley at hp.com
Thu Jan 15 20:01:52 UTC 2015
On 01/15/2015 12:55 PM, Andrew Laski wrote:
> On 01/15/2015 09:33 AM, Brian Haley wrote:
>> On 01/14/2015 02:15 PM, Andrew Laski wrote:
>>> On 01/14/2015 12:57 PM, Murray, Paul (HP Cloud) wrote:
>>>> Hi All,
>>>>
>>>> I recently experienced failures getting images from Glance while spawning
>>>> instances. This step comes after building the networks in the build sequence.
>>>> When the Glance failure occurred the instance was cleaned up and rescheduled
>>>> as expected, but the networks were not cleaned up. On investigation I found
>>>> that the cleanup code for the networks is in the compute manager’s
>>>> _do_build_and_run_instance() method as follows:
>>>>
>>>>     # NOTE(comstud): Deallocate networks if the driver wants
>>>>     # us to do so.
>>>>     if self.driver.deallocate_networks_on_reschedule(instance):
>>>>         self._cleanup_allocated_networks(context, instance,
>>>>                                          requested_networks)
>>>>
>>>> The default behavior for the deallocate_networks_on_reschedule() method
>>>> defined in ComputeDriver is:
>>>>
>>>>     def deallocate_networks_on_reschedule(self, instance):
>>>>         """Does the driver want networks deallocated on reschedule?"""
>>>>         return False
>>>>
>>>> Only the Ironic driver overrides this method to return True, so I think this
>>>> means the networks will not be cleaned up for any other virt driver.
>>>>
>>>>
>>>> Is this really the desired behavior?
>>>>
>>> Yes. Other than when using Ironic there is nothing specific to a particular
>>> host in the networking setup. This means it is not necessary to deallocate and
>>> reallocate networks when an instance is rescheduled, so we can avoid the
>>> unnecessary work of doing it.
>> That's either not true any more, or not true when DVR is enabled in Neutron,
>> since in this case the port['binding:host_id'] value has been initialized to a
>> compute node, and won't get updated when nova-conductor re-schedules the VM
>> elsewhere.
>>
>> This causes the neutron port to stay on the original compute node, and any
>> neutron operations (like floatingip-associate) happen on the "old" port, leaving
>> the VM unreachable.
>
> Gotcha. Then we should be rebinding that port on a reschedule, or go back to
> de/reallocating. I'm assuming there's already some way to handle the port
> being moved, otherwise resizes would be broken for the same reason.
>
> If we do need to move back to de/reallocation of networks I think it would be
> better to remove the conditional nature of it and just do it. If the
> deallocate_networks_on_reschedule method defaults to True I don't see a case
> where it would be overridden by a driver given the information above.
Andrew,
I was able to run a test here on a multi-node setup with DVR enabled:
- Booted VM
- Associated floating IP
- Updated binding:host_id (as admin) using the neutron API:
$ neutron port-update $port -- --binding:host_id=novacompute5
The port was correctly moved to the other compute node and the floating IP
configured. So that showed me the agents all did the right thing as far as I
can tell. I know Paul was looking at the nova code to try and update just this
field; I'll check in with him regarding that so we can get a patch up soon.
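
For anyone following along, the fix being discussed boils down to a single
Neutron port update on reschedule. Just to illustrate, here is a sketch of the
request that port-update issues against the plain REST API
(PUT /v2.0/ports/{port_id}) -- the helper names below are mine for
illustration, not code from nova or python-neutronclient:

```python
# Sketch: the Neutron REST request behind
#   neutron port-update $port -- --binding:host_id=novacompute5
# Helper names are hypothetical; shown only to make the payload concrete.
import json


def port_binding_update_body(host):
    """Build the PUT body that rebinds a port to the given host."""
    return {'port': {'binding:host_id': host}}


def port_update_request(port_id, host):
    """Return (method, path, json_body) for the Neutron port update."""
    path = '/v2.0/ports/%s' % port_id
    return 'PUT', path, json.dumps(port_binding_update_body(host))


if __name__ == '__main__':
    method, path, body = port_update_request('PORT_ID', 'novacompute5')
    print(method, path, body)
```

Updating binding:host_id this way (as admin) is what let the agents move the
port and reconfigure the floating IP in the test above.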
-Brian