[openstack-dev] [nova][RFC] delete instance device

少合冯 lvmxhster at gmail.com
Mon Nov 30 10:06:04 UTC 2015


Hi all,

I'd like to discuss how nova deletes (detaches) a device from an instance.

Here is the libvirt docstring for the underlying function,
detachDeviceFlags:
http://paste.openstack.org/show/480330/

It says:

    detaching a device from a running domain may be asynchronous.

and it suggests:

    To check whether the device was successfully removed, either recheck
    domain configuration using virDomainGetXMLDesc() or add handler for
    VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED
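
To make the "recheck domain configuration" suggestion concrete, here is a
minimal sketch of how the XML returned by virDomainGetXMLDesc() could be
searched for an interface. It assumes the interface is identified by its MAC
address; the function name interface_present and the sample XML are mine, for
illustration only, and are not taken from nova or libvirt.

```python
import xml.etree.ElementTree as ET


def interface_present(domain_xml, mac):
    """Return True if an <interface> with this MAC is in the domain XML."""
    root = ET.fromstring(domain_xml)
    wanted = mac.lower()
    # Interfaces live under <devices>; each one carries <mac address='..'/>.
    for mac_elem in root.findall("./devices/interface/mac"):
        if mac_elem.get("address", "").lower() == wanted:
            return True
    return False


# Hand-written sample, not real output from any guest.
SAMPLE_XML = """
<domain type='kvm'>
  <devices>
    <interface type='bridge'>
      <mac address='52:54:00:aa:bb:cc'/>
    </interface>
  </devices>
</domain>
"""

print(interface_present(SAMPLE_XML, "52:54:00:AA:BB:CC"))
print(interface_present(SAMPLE_XML, "52:54:00:00:00:01"))
```

With a real guest the XML string would come from the domain object instead of
a constant; the parsing step stays the same.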


Daniel also elaborated on it and described some more scenarios:

it is not guaranteed to succeed. What happens is that the hypervisor
injects an ACPI request to unplug the device. The guest OS must co-operate
by releasing the device, before the hypervisor will complete the action of
physically removing it. So you require a guest OS that supports ACPI unplug
of course, and if the guest is crashed or being malicious there is no
guarantee the unplug will succeed. Libvirt will wait a short while for
success, but you must monitor for libvirt events to see if/when it finally
completes. This delayed release has implications for when Nova can mark the
PCI device as unused and available for other guests to assign.


I have now checked the code. Both detach volume and detach interface call
guest.detach_device, as do the PCI and SR-IOV detach paths:

   nova/virt/libvirt/driver.py:1220 <<detach_volume>>
             guest.detach_device(conf, persistent=True, live=live)
   nova/virt/libvirt/driver.py:1280 <<detach_interface>>
             guest.detach_device(cfg, persistent=True, live=live)
   nova/virt/libvirt/driver.py:3016 <<_detach_pci_devices>>
             guest.detach_device(self._get_guest_pci_device(dev), live=True)
   nova/virt/libvirt/driver.py:3105 <<_detach_sriov_ports>>
             guest.detach_device(cfg, live=True)

And for detach_interface in nova/compute/manager.py:

    @wrap_exception()
    @wrap_instance_fault
    def detach_interface(self, context, instance, port_id):
        """Detach an network adapter from an instance."""
        network_info = instance.info_cache.network_info
        condemned = None
        for vif in network_info:
            if vif['id'] == port_id:
                condemned = vif
                break
        if condemned is None:
            raise exception.PortNotFound(_("Port %s is not "
                                           "attached") % port_id)
        try:
            self.driver.detach_interface(instance, condemned)
        except exception.NovaException as ex:
            LOG.warning(_LW("Detach interface failed, port_id=%(port_id)s,"
                            " reason: %(msg)s"),
                        {'port_id': port_id, 'msg': ex}, instance=instance)
            raise exception.InterfaceDetachFailed(instance_uuid=instance.uuid)
        else:
            try:
                self.network_api.deallocate_port_for_instance(
                    context, instance, port_id)
            except Exception as ex:
                with excutils.save_and_reraise_exception():
                    # Since this is a cast operation, log the failure for
                    # triage.
                    LOG.warning(_LW('Failed to deallocate port %(port_id)s '
                                    'for instance. Error: %(error)s'),
                                {'port_id': port_id, 'error': ex},
                                instance=instance)



It only calls detach_interface; it never double-checks that the device has
actually been detached.

I am now working on the SR-IOV detach code:
https://review.openstack.org/#/c/139910/
I'm not sure whether I need to double-check that the device is finally
detached.

If so, what should I implement?

There are 3 options:
1. Just ignore it and keep the nova code as it is.

2. Synchronous check.
As libvirt suggests, recheck the domain configuration with
virDomainGetXMLDesc():

    def detach_interface(self, context, instance, port_id):
        self.driver.detach_interface(instance, condemned)

        # Just pseudo-code: poll once per second, up to 50 times.
        # device_in_xml() stands for "the device is still listed".
        for i in range(50):
            if not device_in_xml(virDomainGetXMLDesc()):
                break
            sleep(1)
        else:
            # The loop never hit break: the device is still attached.
            raise exception

        self.network_api.deallocate_port_for_instance(
            context, instance, port_id)
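
The polling part of option 2 can be sketched as a small standalone helper.
This is only an illustration under my own assumptions: the names
wait_until_detached and DetachTimeout are hypothetical, and is_attached is any
callable that rechecks the domain configuration and returns True while the
device is still listed.

```python
import time


class DetachTimeout(Exception):
    """Hypothetical error: the device was still attached after all checks."""


def wait_until_detached(is_attached, attempts=50, interval=1.0,
                        sleep=time.sleep):
    """Poll is_attached() until it returns False.

    Returns the number of checks that still saw the device attached.
    Raises DetachTimeout if the device is attached on every check.
    """
    for attempt in range(attempts):
        if not is_attached():
            return attempt
        sleep(interval)
    raise DetachTimeout("device still attached after %d checks" % attempts)


# Usage with a fake check: attached on the first two polls, then gone.
answers = iter([True, True, False])
print(wait_until_detached(lambda: next(answers), interval=0.01))
```

Passing sleep as a parameter keeps the helper easy to test. Note that this
approach blocks the caller for up to attempts * interval seconds.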



3. Asynchronous notification.
As libvirt suggests, add an event handler for
VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED.

Call network_api.deallocate_port_for_instance in a backend task.
The backend receives the event result from the event handler over AMQP and
checks that the removed device is the expected interface device, not a
volume device.

The backend then calls network_api.deallocate_port_for_instance to
deallocate the port.
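
The filtering step in option 3 could look roughly like the sketch below. It
is not nova code: the class DeviceRemovedWaiter and the aliases "net1" and
"virtio-disk1" are made up for illustration, and the AMQP hop is left out. I
am assuming the callback receives (conn, dom, dev_alias, opaque), which is
how I understand the libvirt Python binding delivers this event when a
handler is registered with domainEventRegisterAny().

```python
import threading


class DeviceRemovedWaiter:
    """Tracks which device aliases we expect to see removed."""

    def __init__(self):
        self._lock = threading.Lock()
        self._events = {}

    def expect(self, dev_alias):
        """Register interest in a device before asking for the detach."""
        with self._lock:
            return self._events.setdefault(dev_alias, threading.Event())

    def on_device_removed(self, conn, dom, dev_alias, opaque):
        """Event callback: ignore devices nobody is waiting for."""
        with self._lock:
            event = self._events.get(dev_alias)
        if event is not None:
            event.set()

    def wait(self, dev_alias, timeout):
        """Return True if the removal event arrived before the timeout."""
        event = self.expect(dev_alias)
        removed = event.wait(timeout)
        with self._lock:
            self._events.pop(dev_alias, None)
        return removed


waiter = DeviceRemovedWaiter()
waiter.expect("net1")
# A volume removal arrives first and must not be mistaken for the NIC.
waiter.on_device_removed(None, None, "virtio-disk1", None)
waiter.on_device_removed(None, None, "net1", None)
print(waiter.wait("net1", timeout=0.1))
```

Only after wait() returns True would the backend go on to deallocate the
port; on False it would have to decide whether to retry or report a failure.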


I have not checked volume detach, so I'm not sure whether it has the same
issue.

Besides this, there is another issue. According to the libvirt docstring,

    hypervisors may prevent this operation if there is a current block copy
    operation on the device being detached;