[openstack-dev] Revert Pass instance host-id to Quantum using port bindings extension.
Aaron Rosen
arosen at nicira.com
Fri Jul 19 18:58:12 UTC 2013
On Fri, Jul 19, 2013 at 8:47 AM, Kyle Mestery (kmestery) <kmestery at cisco.com> wrote:
> On Jul 18, 2013, at 5:16 PM, Aaron Rosen <arosen at nicira.com> wrote:
> >
> > Hi,
> >
> > I wanted to raise another design problem with creating the port on
> > nova-compute. Previously, we encountered this bug
> > (https://bugs.launchpad.net/neutron/+bug/1160442). The issue was that when
> > nova-compute calls into quantum to create the port, quantum creates the
> > port but fails to return it to nova and instead times out. When this
> > happens the instance is rescheduled to run on another compute node, where
> > another port is created with the same device_id, and when the instance
> > boots it looks like it has two ports. This is still a problem that can
> > occur today in our current implementation (!).
> >
> > I think in order to move forward with this we'll need to compromise.
> > Here is my thought on how we should proceed.
> >
> > 1) Modify the quantum API so that mac addresses can be updated via the
> > API. There is no reason to keep this limitation (especially once the patch
> > that uses dhcp_release is merged, as it will allow us to update the lease
> > for the new mac immediately). We need to do this for bare metal support,
> > since we need to match the mac address of the port to the compute node.
> >
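To make (1) concrete, once the restriction is lifted this is just a normal
update_port call. A minimal sketch using the 2.0 python client, with
placeholder credentials and ids:

    from quantumclient.v2_0 import client

    quantum = client.Client(username='admin', password='secret',
                            tenant_name='admin',
                            auth_url='http://127.0.0.1:5000/v2.0/')

    port_id = 'PORT_UUID'                    # placeholder
    baremetal_mac = '00:16:3e:00:00:01'      # placeholder: the node's real mac

    # Point the logical port at the bare metal node's mac; with the
    # dhcp_release change the dhcp agent can pick up the new lease right away.
    quantum.update_port(port_id, {'port': {'mac_address': baremetal_mac}})
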
> I don't understand how this relates to creating a port through
> nova-compute. I'm not saying this is a bad idea, I just don't see how it
> relates to the original discussion point on this thread around Yong's patch.
>
> > 2) Move the port creation from nova-compute to nova-api. This will solve
> > a number of issues like the one I pointed out above.
> >
> This seems like a bad idea. So now a Nova API call will implicitly create
> a Neutron port? What happens on failure here? The caller isn't aware the
> port was created in Neutron if it's implicit, so who cleans things up? Or
> if the caller is aware, then all we've done is move an API call the caller
> would have made (nova-compute in this case) into nova-api, and the caller
> is still aware of what's happening.
>
On failure the VM will go to the ERROR state if the port fails to be
created in quantum. Then, when deleting the instance, the delete code
should also search quantum for the device_id in order to remove the port
there as well.
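
A minimal sketch of that delete-path cleanup (quantum here is a
quantumclient.v2_0 Client; the instance uuid is what nova uses as the
device_id):

    def cleanup_instance_ports(quantum, instance_uuid):
        # Remove any ports quantum still holds for this instance, even ones
        # nova-compute never learned about because the create response was
        # lost on the wire.
        for port in quantum.list_ports(device_id=instance_uuid)['ports']:
            quantum.delete_port(port['id'])
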
The issue here is that if an instance fails to boot on a compute node
(because nova-compute did not get the port-create response from quantum
even though the port was actually created), the instance gets rescheduled
to boot on another nova-compute node, where the duplicate create happens.
Moving the creation to the API node takes the port creation out of that
retry path, which solves this.
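
Roughly, the api-node side of (2) would look like this (sketch only, error
handling omitted; quantum is a quantumclient.v2_0 Client):

    def create_port_on_api_node(quantum, tenant_id, network_uuid, instance_uuid):
        # Create the port once, before scheduling, keyed by the instance uuid.
        # If the boot is later retried on another compute node, no second
        # create happens -- the compute node only plugs the existing port.
        return quantum.create_port(
            {'port': {'network_id': network_uuid,
                      'device_id': instance_uuid,
                      'tenant_id': tenant_id}})['port']
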
>
> > 3) For now, I'm okay with leaving logic on the compute node that calls
> > update-port if the port binding extension is loaded. This will allow the
> > vif type to be correctly set as well.
> >
> And this will also still pass in the hostname the VM was booted on?
>
In this case there would have to be an update-port call done on the
compute node which would set the hostname (the same as in the live
migration case).
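
So the compute-side logic for (3) stays as small as something like this
(sketch; 'binding' is the alias the port binding extension advertises):

    def set_binding_host(quantum, port_id, host):
        # Only update the port if the plugin actually loads the port binding
        # extension; otherwise there is nothing to set.
        aliases = [e['alias'] for e in quantum.list_extensions()['extensions']]
        if 'binding' in aliases:
            quantum.update_port(port_id, {'port': {'binding:host_id': host}})
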
> To me, this thread seems to have diverged a bit from the original
> discussion point around Yong's patch. Yong's patch makes sense, because
> it's passing the hostname the VM is booted on during port create. It also
> updates the binding during a live migration, so that case is covered. Any
> change to this behavior should cover both those cases and not involve any
> sort of agent polling, IMHO.
>
> Thanks,
> Kyle
>
> > Thoughts/Comments?
> >
> > Thanks,
> >
> > Aaron
> >
> >
> > On Mon, Jul 15, 2013 at 2:45 PM, Aaron Rosen <arosen at nicira.com> wrote:
> >
> >
> >
> > On Mon, Jul 15, 2013 at 1:26 PM, Robert Kukura <rkukura at redhat.com> wrote:
> > On 07/15/2013 03:54 PM, Aaron Rosen wrote:
> > >
> > >
> > >
> > > On Sun, Jul 14, 2013 at 6:48 PM, Robert Kukura <rkukura at redhat.com> wrote:
> > >
> > > On 07/12/2013 04:17 PM, Aaron Rosen wrote:
> > > > Hi,
> > > >
> > > >
> > > > On Fri, Jul 12, 2013 at 6:47 AM, Robert Kukura <rkukura at redhat.com> wrote:
> > > >
> > > > On 07/11/2013 04:30 PM, Aaron Rosen wrote:
> > > > > Hi,
> > > > >
> > > > > I think we should revert this patch that was added here
> > > > > (https://review.openstack.org/#/c/29767/). What this patch does is,
> > > > > when nova-compute calls into quantum to create the port, it passes
> > > > > in the hostname on which the instance was booted. The idea of the
> > > > > patch was that providing this information would "allow hardware
> > > > > device vendors' management stations to segment the network in a
> > > > > more precise manner (for example automatically trunk the vlan on
> > > > > the physical switch port connected to the compute node on which
> > > > > the vm instance was started)."
> > > > >
> > > > > In my opinion I don't think this is the right approach. There are
> > > > > several other ways to get this information about where a specific
> > > > > port lives. For example, in the OVS plugin case the agent running
> > > > > on the nova-compute node can update the port in quantum to provide
> > > > > this information. Alternatively, quantum could query nova using the
> > > > > port.device_id to determine which server the instance is on.
> > > > >
> > > > > My motivation for removing this code is that I now have the free
> > > > > cycles to work on
> > > > > https://blueprints.launchpad.net/nova/+spec/nova-api-quantum-create-port,
> > > > > discussed here
> > > > > (http://lists.openstack.org/pipermail/openstack-dev/2013-May/009088.html).
> > > > > This was about moving the quantum port creation from the
> > > > > nova-compute host to nova-api if a network-uuid is passed in. This
> > > > > will allow us to remove all the quantum logic from the nova-compute
> > > > > nodes and simplify orchestration.
> > > > >
> > > > > Thoughts?
> > > >
> > > > Aaron,
> > > >
> > > > The ml2-portbinding BP I am currently working on depends on nova
> > > > setting the binding:host_id attribute on a port before accessing
> > > > binding:vif_type. The ml2 plugin's MechanismDrivers will use the
> > > > binding:host_id with the agents_db info to see what (if any) L2 agent
> > > > is running on that host, or what other networking mechanisms might
> > > > provide connectivity for that host. Based on this, the port's
> > > > binding:vif_type will be set to the appropriate type for that
> > > > agent/mechanism.
> > > >
> > > > When an L2 agent is involved, the associated ml2 MechanismDriver will
> > > > use the agent's interface or bridge mapping info to determine whether
> > > > the agent on that host can connect to any of the port's network's
> > > > segments, and select the specific segment (network_type,
> > > > physical_network, segmentation_id) to be used. If there is no
> > > > connectivity possible on the host (due to either no L2 agent or other
> > > > applicable mechanism, or no mapping for any of the network's segments'
> > > > physical_networks), the ml2 plugin will set the binding:vif_type
> > > > attribute to BINDING_FAILED. Nova will then be able to gracefully put
> > > > the instance into an error state rather than have the instance boot
> > > > without the required connectivity.
> > > >
> > > > I don't see any problem with nova creating the port before scheduling
> > > > it to a specific host, but the binding:host_id needs to be set before
> > > > the binding:vif_type attribute is accessed. Note that the host needs
> > > > to be determined before the vif_type can be determined, so it is not
> > > > possible to rely on the agent discovering the VIF, which can't be
> > > > created until the vif_type is determined.
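
For illustration, the binding step described above would go roughly like
this in pseudocode (the helper names and data layout are made up, not the
actual ml2 interfaces, which are still being implemented):

    def bind_port_sketch(agents_db, port, host):
        # Pseudocode sketch only -- illustrative names, not the real ml2 API.
        agent = agents_db.get(host)                # L2 agent reported for host
        if agent is None:
            return 'binding_failed', None          # no agent/mechanism on host
        for segment in port['network']['segments']:
            if segment['physical_network'] in agent['mappings']:
                return agent['vif_type'], segment  # connectivity possible here
        return 'binding_failed', None              # no reachable segment
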
> > > >
> > > >
> > > > So what you're saying is the current workflow is this: nova-compute
> > > > creates a port in quantum, passing in the host-id (which is the
> > > > hostname of the compute host). Now quantum looks in the agent table in
> > > > its database to determine the VIF type that should be used, based on
> > > > the agent that is running on the nova-compute node?
> > >
> > > Most plugins just return a hard-wired value for binding:vif_type. The
> > > ml2 plugin supports heterogeneous deployments, and therefore needs more
> > > flexibility, so this is what's being implemented in the agent-based ml2
> > > mechanism drivers. Other mechanism drivers (i.e. controller-based) would
> > > work differently. In addition to VIF type selection, port binding in ml2
> > > also involves determining if connectivity is possible and selecting the
> > > network segment to use, and these are also based on binding:host_id.
> > >
> > >
> > > Can you go into more detail about what you mean by heterogeneous
> > > deployments (i.e. what the topology looks like)? Why would connectivity
> > > not be possible? I'm confused why things would be configured in such a
> > > way that the scheduler wants to launch an instance on a node for which
> > > quantum is not able to provide connectivity.
> >
> > By heterogeneous deployment, I meant that all compute nodes are not
> > necessarily identically configured. Some might be running the
> > openvswitch agent, some the linuxbridge agent, and some the hyperv
> > agent, but all able to access VLANs on (some of) the same trunks.
> >
> > One example of connectivity not being possible would be if multiple VLAN
> > trunks are in use in the datacenter, but not all compute nodes have
> > connections to every trunk.
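
For example (hypothetical values), two compute nodes running the
openvswitch agent could map to different trunks, so each can only reach
some of the network's segments:

    # compute1's ovs agent/plugin config (hypothetical): attached to trunk A
    [OVS]
    bridge_mappings = physnet-trunkA:br-eth1

    # compute2's config (hypothetical): attached to trunk B, no path to trunk A
    [OVS]
    bridge_mappings = physnet-trunkB:br-eth1
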
> >
> > I agree the scheduler should ensure connectivity will be possible. But
> > mechanisms such as cells, zones, and flavors can also be used in nova to
> > manage heterogeneity. The ml2 port binding code should ideally never
> > find out the scheduled node does not have connectivity, but we've at
> > least defined what should happen if it does. The main need here though
> > is for the port binding code to select the segment to use.
> >
> > Why does the port binding code select which segment to use? I'm unclear
> > why anyone would ever have a deployment with a mix of vlans where things
> > are trunked in some places and not in others and neutron would have to
> > keep up with that. The part I'm unclear on is how neutron would be
> > expected to behave in this type of setup. Say one boots several instances:
> > instance1 lands on compute1 and neutron puts it on vlan X. Later,
> > instance2 is booted and it lands on compute2, where vlan X isn't
> > reachable?
> >
> >
> >
> > >
> > >
> > >
> > > > My question would be why the nova-compute node doesn't already know
> > > > which VIF_TYPE it should be using?
> > >
> > > I guess the thinking was that this knowledge belonged in quantum rather
> > > than nova, and thus the GenericVifDriver was introduced in grizzly. See
> > > https://blueprints.launchpad.net/nova/+spec/libvirt-vif-driver and
> > > https://blueprints.launchpad.net/neutron/+spec/vif-plugging-improvements.
> > >
> > >
> > > Thanks for the links. It seems like the motivation for this was to
> > > remove the libvirt vif configuration settings from nova and offload
> > > that to quantum via the vif_type param on a port. It seems like, when
> > > using a specific plugin, that plugin will always return the same
> > > vif_type to a given node. In my opinion, this configuration option is
> > > best handled as part of your deployment automation and not baked into
> > > quantum ports.
> >
> > For monolithic plugins, returning a fixed vif_type works, but this is
> > not sufficient for ml2.
> >
> > I was happy with the old approach of configuring drivers in nova (via
> > deployment automation ideally), but the decision was made in grizzly to
> > switch to the GenericVifDriver.
> >
> > >
> > > My goal is to reduce the orchestration and complexity between nova and
> > > quantum. Currently, nova-api and nova-compute both call out to quantum
> > > when all of this could be done on the api node (ignoring bare metal for
> > > now as in this case we'd need to do something special to handle
> > > updating the mac addresses on those logical ports in quantum).
> >
> > Sounds like the scheduler is going to need to call neutron as well, at
> > least in some cases.
> >
> > Why is this? The only use case I see so far for something other than
> > nova-api to call into neutron would be bare metal. I think having neutron
> > tell nova which vif type it should be using really tightly couples the
> > nova+quantum integration. I think we should probably reexamine
> > https://blueprints.launchpad.net/nova/+spec/libvirt-vif-driver, as setting
> > the libvirt_type from the neutron side seems to be something that the
> > sysadmin should configure once and not have to rely on neutron to specify.
> >
> > Thanks,
> >
> > Aaron
> >
> > -Bob
> >
> > >
> > >
> > > -Bob
> > >
> > > >
> > > >
> > > > Back when the port binding extension was originally being hashed out,
> > > > I had suggested using an explicit bind() operation on port that took
> > > > the host_id as a parameter and returned the vif_type as a result. But
> > > > the current attribute-based approach was chosen instead. We could
> > > > consider adding a bind() operation for the next neutron API revision,
> > > > but I don't see any reason the current attribute-based binding
> > > > approach cannot work for now.
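
For reference, such an explicit operation might have looked something like
this from a client's point of view; bind_port() below is purely
hypothetical and does not exist in the current API:

    def bind(quantum, port_id, host):
        # Hypothetical sketch of the bind() operation mentioned above; today
        # the equivalent is update_port with binding:host_id, and vif_type is
        # read back from the port's binding:vif_type attribute.
        result = quantum.bind_port(port_id, {'binding': {'host_id': host}})  # not a real method
        return result['binding']['vif_type']
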
> > > >
> > > > -Bob
> > > >
> > > > >
> > > > > Best,
> > > > >
> > > > > Aaron