[openstack-dev] [Neutron] [xen] [ovs]: how to handle the ports when there are multiple ports existing for a single VM vif
jianghua.wang at citrix.com
Wed Oct 14 07:58:03 UTC 2015
The problem with configuring both tapx.0 and vifx.0 is that the iface-id is assumed to be unique for each port, and all ports are indexed by iface-id. In our case, however, tapx.0 and vifx.0 share the same iface-id. I'm thinking of using the port's name as the identifier instead, since the name seems to be unique on the ovs bridge. Could any experts confirm whether there is any potential issue with that?
Another idea I'm considering: the iface-id is unique among "active" ports, so one potential resolution is to continue using the iface-id as the key for active ports but treat the inactive ports as subsidiaries of the active port, and add a function that syncs the configuration to the inactive ports whenever the active port is updated.
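A minimal sketch of the "subsidiary port" idea, assuming a hypothetical PortGroup record that is not part of the neutron code base. The group stays keyed by iface-id, names one active port, and produces the same settings for every port in the group on each update:

```python
from dataclasses import dataclass, field


@dataclass
class PortGroup:
    """Hypothetical record: one iface-id, one active port, any number of inactive ones."""
    iface_id: str
    active: str
    subsidiaries: list = field(default_factory=list)
    config: dict = field(default_factory=dict)

    def update(self, **settings):
        """Record new settings and return the per-port config that would be pushed."""
        self.config.update(settings)
        names = [self.active, *self.subsidiaries]
        # Each port gets its own copy so later changes do not alias.
        return {name: dict(self.config) for name in names}


group = PortGroup(iface_id="vif-uuid-1", active="vif5.0", subsidiaries=["tap5.0"])
pushed = group.update(tag=7)
print(pushed)
```

The lookup key is unchanged (still the iface-id), so only the update path has to know about the extra ports.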
Any comments are welcome and appreciated.
Date: Mon, 12 Oct 2015 16:12:23 +0000
From: Jianghua Wang <jianghua.wang at citrix.com>
To: "openstack-dev at lists.openstack.org"
<openstack-dev at lists.openstack.org>
Subject: [openstack-dev] [Neutron] [xen] [ovs]: how to handle the
ports when there are multiple ports existing for a single VM vif
I'm working on bug #1268955, which occurs because the neutron ovs agent/plugin can't process ports correctly when multiple ports exist for a single VM vif. I have identified two potential solutions, but one of them requires a non-trivial change and the other may result in a race condition, so I'm posting here to seek help. Please let me know if you have any comments or advice. Thanks in advance.
When the guest VM is running in HVM mode, neutron doesn't set the vlan tag on the proper port, so the guest VM loses network connectivity.
When a VM is in HVM mode, ovs will create two ports and two interfaces for a single vif inside the VM. If the domID is x, one port/interface is named tapx.0, which is the qemu-emulated NIC, used when no PV drivers are installed; the other is named vifx.0, which is the xen network frontend NIC, used when the VM has PV drivers installed. Depending on whether the PV drivers are present, either port/interface may be used. But the current ovs agent/plugin uses the VM's vif id (iface-id) to identify the port, so depending on the order in which the ports are retrieved from ovs, only one port will be processed by neutron. The network problem then occurs if the port that ends up being used is not the one processed by neutron (e.g. the one that had its vlan tag set).
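The collision can be illustrated with a small sketch. The rows and field names below are hypothetical stand-ins for what an agent might read from ovs, not the actual neutron data structures:

```python
# Hypothetical rows for one HVM guest with domID 5; both share one iface-id.
ports = [
    {"name": "tap5.0", "iface_id": "vif-uuid-1"},
    {"name": "vif5.0", "iface_id": "vif-uuid-1"},
]


def index_by_iface_id(rows):
    """Key each row by iface-id; a later row silently replaces an earlier one."""
    return {row["iface_id"]: row for row in rows}


by_id = index_by_iface_id(ports)
by_id_reversed = index_by_iface_id(list(reversed(ports)))

# Only one entry survives, and which one depends purely on retrieval order.
print(len(by_id), by_id["vif-uuid-1"]["name"])
print(len(by_id_reversed), by_id_reversed["vif-uuid-1"]["name"])
```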
Two of my potential solutions:
1. Configure both ports regardless of which one will finally be used, so that both have the same configuration. This should resolve the problem. However, the existing code uses the iface-id as the key for each port, and both tapx.0 and vifx.0 have the same iface-id. With this solution I would have to change the data structure to hold both ports and change the related functions, and the required changes are spread across many places, so it will take much more effort than the second option. I'm also concerned that configuring the inactive port could cause issues, although I can't point to a specific one at the moment.
2. When there are multiple candidate ports, ovs sets the "iface-status" field to active for the one in effect, and the others are inactive. So the other solution is to return only the active one. If a switchover happens later, treat the port as updated so that the newly chosen port is configured accordingly. This ensures that the active port is configured properly, and the needed change is very limited. Please see the draft patch set for this solution: https://review.openstack.org/#/c/233498/
But the problem is that it introduces a race condition. For example, the tag is first set on tapx.0 and the guest VM gets its connection via tapx.0; then the PV driver is loaded, so the active port switches to vifx.0. Depending on the neutron agent's polling interval, vifx.0 may remain untagged for a while, and the connection is lost during that period.
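The two options can be contrasted in a short sketch. Everything here is an assumption for illustration: the row layout, the function names, and the vlan mapping are hypothetical and are not taken from the neutron agent or from the draft patch.

```python
from collections import defaultdict

# Hypothetical ovs rows for one HVM guest with domID 5.
ROWS = [
    {"name": "tap5.0", "iface_id": "vif-uuid-1", "iface_status": "inactive"},
    {"name": "vif5.0", "iface_id": "vif-uuid-1", "iface_status": "active"},
]


def ports_to_tag_all(rows, vlan_by_iface_id):
    """Option 1: every port sharing an iface-id gets that vif's vlan tag."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["iface_id"]].append(row["name"])
    return {
        name: vlan_by_iface_id[iface_id]
        for iface_id, names in groups.items()
        for name in names
    }


def active_port_by_iface_id(rows):
    """Option 2: keep only the port marked active for each iface-id."""
    return {r["iface_id"]: r["name"] for r in rows if r["iface_status"] == "active"}


def switched_over(previous, current):
    """iface-ids whose active port changed between two polls (treat as updated)."""
    return {i for i, name in current.items() if previous.get(i) not in (None, name)}


vlans = {"vif-uuid-1": 7}
print(ports_to_tag_all(ROWS, vlans))
print(active_port_by_iface_id(ROWS))
print(switched_over({"vif-uuid-1": "tap5.0"}, active_port_by_iface_id(ROWS)))
```

Option 1 changes the shape of the index (one iface-id maps to several ports), while option 2 keeps the one-to-one shape and adds a comparison between polls.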
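As a rough illustration of the window involved, the sketch below estimates how long the newly active port could stay untagged. It assumes a hypothetical agent that polls at a fixed interval starting at time 0, and it ignores the time the agent needs to process the change, so real gaps would be somewhat longer:

```python
import math


def untagged_window(switch_time, polling_interval, first_poll=0.0):
    """Seconds from the switchover until the next poll that can notice it."""
    elapsed = switch_time - first_poll
    # Index of the first poll at or after the switchover.
    next_index = math.ceil(elapsed / polling_interval)
    next_poll = first_poll + next_index * polling_interval
    return next_poll - switch_time


# A switchover 2.5 s after the first poll, with a 2 s polling interval.
print(untagged_window(2.5, 2.0))
```

Under these assumptions the gap is bounded by one polling interval, which is why shortening the interval narrows the window but does not remove it.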
Could you share your insights? Thanks a lot.