[openstack-dev] [Neutron] [xen] [ovs]: how to handle the ports when there are multiple ports existing for a single VM vif
jianghua.wang at citrix.com
Mon Oct 12 16:12:23 UTC 2015
I'm working on bug #1268955, which occurs because the neutron OVS agent/plugin can't process ports correctly when multiple ports exist for a single VM vif. I have identified two potential solutions, but one of them requires a non-trivial change and the other may introduce a race condition, so I'm posting here to seek help. Please let me know if you have any comments or advice. Thanks in advance.
When the guest VM runs in HVM mode, neutron doesn't set the VLAN tag on the proper port, so the guest VM loses network connectivity.
When a VM is in HVM mode, OVS creates two ports and two interfaces for a single vif inside the VM. If the domID is x, one port/interface is named tapx.0, which is the qemu-emulated NIC, used when no PV drivers are installed; the other is named vifx.0, which is the Xen network frontend NIC, used when the VM has PV drivers installed. Depending on whether the PV drivers are present, either port/interface may be used. But the current OVS agent/plugin uses the VM's vif id (iface-id) to identify the port, so, depending on the order in which the ports are retrieved from OVS, only one port is processed by neutron. The network problem then occurs if the port finally used is not the one neutron processed (e.g. set the VLAN tag on).
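To illustrate the collision, here is a minimal sketch, not the actual neutron code. It assumes ports are plain dicts and that they are indexed by iface-id; the port names, the `iface_id` field, and the `index_by_iface_id` helper are hypothetical.

```python
# Hypothetical port records for a VM with domID 5; the data is made up.
PORTS = [
    {"name": "tap5.0", "iface_id": "vif-uuid-1"},  # qemu-emulated NIC
    {"name": "vif5.0", "iface_id": "vif-uuid-1"},  # Xen PV frontend NIC
]


def index_by_iface_id(ports):
    """Index ports by iface-id; a later port silently replaces an earlier one."""
    by_id = {}
    for port in ports:
        by_id[port["iface_id"]] = port
    return by_id


# Only one of the two ports survives, and which one depends on the order.
forward = index_by_iface_id(PORTS)
backward = index_by_iface_id(list(reversed(PORTS)))
print(forward["vif-uuid-1"]["name"], backward["vif-uuid-1"]["name"])
```

Whichever port comes last in the list is the one that gets processed, which matches the order dependence described above.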
My two potential solutions:
1. Configure both ports regardless of which one is finally used, so both have the same configuration. This should resolve the problem. But the existing code uses the iface-id as the key for each port, and tapx.0 and vifx.0 have the same iface-id. With this solution I would have to change the data structure to hold both ports and change the related functions; the required changes are spread across many places, so it will take much more effort than the second option. I'm also concerned that configuring the inactive port could cause issues, although I can't point to a specific one at the moment.
2. When there are multiple ports, OVS sets the "iface-status" field to active for the one in effect and to inactive for the others. So the other solution is to return only the active port. If a switchover happens later, treat the port as updated and configure the newly chosen port accordingly. This ensures the active port is configured properly, and the needed change is very limited. Please see the draft patch set for this solution: https://review.openstack.org/#/c/233498/
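The two options can be contrasted with a rough sketch. It is not taken from the draft patch; the dict layout, the `iface_status` field name, and both helper functions are assumptions made for illustration. Skipping an iface-id when several ports exist but none is active is also my own assumption.

```python
from collections import defaultdict

# Hypothetical port records; "iface_status" stands in for the OVS
# "iface-status" field mentioned above.
PORTS = [
    {"name": "tap5.0", "iface_id": "vif-uuid-1", "iface_status": "inactive"},
    {"name": "vif5.0", "iface_id": "vif-uuid-1", "iface_status": "active"},
    {"name": "vif6.0", "iface_id": "vif-uuid-2", "iface_status": "active"},
]


def group_by_iface_id(ports):
    """Option 1: keep every port per iface-id so all of them can be configured."""
    groups = defaultdict(list)
    for port in ports:
        groups[port["iface_id"]].append(port)
    return dict(groups)


def pick_active(ports):
    """Option 2: return one port per iface-id, preferring the active one."""
    chosen = {}
    for iface_id, group in group_by_iface_id(ports).items():
        if len(group) == 1:
            chosen[iface_id] = group[0]
            continue
        active = [p for p in group if p["iface_status"] == "active"]
        if active:
            chosen[iface_id] = active[0]
        # Assumption: with several ports and none active, skip this iface-id.
    return chosen


groups = group_by_iface_id(PORTS)
chosen = pick_active(PORTS)
print(sorted(p["name"] for p in groups["vif-uuid-1"]))
print(chosen["vif-uuid-1"]["name"])
```

Option 1 changes the value type from a single port to a list, which is why every consumer of the mapping would need to change; option 2 keeps the one-port-per-iface-id shape the existing code expects.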
But the problem is that this introduces a race condition. For example, suppose the tag is first set on tapx.0 and the guest VM gets its connection via tapx.0. When the PV driver is then loaded, the active port switches to vifx.0. Depending on the neutron agent's polling interval, vifx.0 may stay untagged for a while, and the connection is lost during that period.
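As a rough way to reason about how long that window can be, the following sketch assumes an idealized agent that polls at fixed times 0, interval, 2*interval, ... and tags the new port instantly when it polls. The function name and the timing model are hypothetical; a real agent also spends time processing, so the real gap would be at least this long.

```python
import math


def untagged_window(switch_time, polling_interval):
    """Time from the switchover until the next poll of an idealized agent.

    Assumes polls happen at 0, polling_interval, 2 * polling_interval, ...
    and that tagging takes no time once the poll fires.
    """
    next_poll = math.ceil(switch_time / polling_interval) * polling_interval
    return next_poll - switch_time


# A switchover 3 seconds in, with a 2 second polling interval, leaves
# the new port untagged until the poll at 4 seconds.
print(untagged_window(3, 2))
```

Under this model the window is always shorter than one polling interval, so a shorter interval narrows the outage but does not remove it.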
Could you share your insights? Thanks a lot.