[openstack-dev] Revert Pass instance host-id to Quantum using port bindings extension.

Aaron Rosen arosen at nicira.com
Fri Jul 19 22:15:14 UTC 2013


On Fri, Jul 19, 2013 at 1:11 PM, Kyle Mestery (kmestery) <kmestery at cisco.com
> wrote:

> On Jul 19, 2013, at 1:58 PM, Aaron Rosen <arosen at nicira.com> wrote:
> >
> >
> >
> >
> > On Fri, Jul 19, 2013 at 8:47 AM, Kyle Mestery (kmestery) <
> kmestery at cisco.com> wrote:
> > On Jul 18, 2013, at 5:16 PM, Aaron Rosen <arosen at nicira.com> wrote:
> > >
> > > Hi,
> > >
> > > I wanted to raise another design problem with creating the port on
> nova-compute. Previously, we encountered this bug (
> https://bugs.launchpad.net/neutron/+bug/1160442). The issue was that when
> nova-compute calls into quantum to create the port, quantum creates the
> port but fails to return it to nova and instead times out. When this
> happens, the instance is scheduled to run on another compute node, where
> another port is created with the same device_id, and when the instance
> boots it will look like it has two ports. This is still a problem that
> can occur today in our current implementation (!).
> > >
> > > I think in order to move forward with this we'll need to compromise.
> Here are my thoughts on how we should proceed.
> > >
> > > 1) Modify the quantum API so that mac addresses can be updated via
> the api. There is no reason to keep this limitation (especially once the
> patch that uses dhcp_release is merged, as it will allow us to update the
> lease for the new mac immediately). We need this for bare metal support,
> as we need to match the mac address of the port to the compute node.
> > >
> > I don't understand how this relates to creating a port through
> nova-compute. I'm not saying this is a bad idea, I just don't see how it
> relates to the original discussion point on this thread around Yong's patch.
> >
> > > 2) Move the port creation from nova-compute to nova-api. This will
> solve a number of issues like the one I pointed out above.
> > >
> > This seems like a bad idea. So now a Nova API call will implicitly
> create a Neutron port? What happens on failure here? The caller isn't aware
> the port was created in Neutron if it's implicit, so who cleans things up?
> Or, if the caller is aware, then all we've done is move an API call the
> caller (nova-compute in this case) would have made into nova-api, and the
> caller is still aware of what's happening.
> >
> > On failure here the VM will go to ERROR state if the port fails to be
> created in quantum. Then, when deleting the instance, the delete code should
> also search quantum for the device_id in order to remove the port there as
> well.
> >
> So, nova-compute will implicitly know the port was created by nova-api,
> and if a failure happens, it will clean up the port? That doesn't sound
> like a balanced solution to me, and seems to tie nova-compute and nova-api
> closely together when it comes to launching VMs with Neutron ports.
>
> >  The issue here is that if an instance fails to boot on a compute node
> (because nova-compute did not get the port-create response from quantum even
> though the port was actually created), the instance gets scheduled to be
> booted on another nova-compute node, where the duplicate create happens.
> Moving the creation to the API node keeps the port creation out of the
> retry logic, which solves this.
> >
> I think Ian's comments on your blueprint [1] address this exact problem,
> can you take a look at them there?
>
> [1]
> https://blueprints.launchpad.net/nova/+spec/nova-api-quantum-create-port


Sure. From Ian, with my replies inline marked [arosen]:

 The issue is not the location of the call.

The issue is one of transactionality - you want to create a neutron port
implicitly while nova is booting a machine, and you want the Neutron and
Nova calls to either all succeed or all fail. If you can't have
transactionality the old-fashioned way with synchronous calls (and we
can't) then you need eventual consistency: a task to clean up dead ports
and the understanding that such ports may still be kicking around from
previous attempts.

[arosen] - I agree.
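
For concreteness, here is a rough sketch of what such a cleanup task might
look like (the client handles and filtering below are assumptions for
illustration, not existing nova or quantum code):

    def cleanup_orphaned_ports(neutron, nova):
        """Delete compute-owned ports whose device_id no longer maps to a
        live instance (e.g. ports left behind by failed boot attempts)."""
        # 'neutron' and 'nova' are assumed to be authenticated
        # python-neutronclient / python-novaclient handles.
        live_instance_ids = set(s.id for s in nova.servers.list())
        for port in neutron.list_ports()['ports']:
            if not port['device_owner'].startswith('compute:'):
                continue
            if port['device_id'] and port['device_id'] not in live_instance_ids:
                # This port belongs to a boot attempt that never completed.
                neutron.delete_port(port['id'])

Such a task would presumably need to run periodically, with a grace period
so that ports for instances that are still mid-boot are not removed.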

We should create the port, *then* attempt the attach using update - the
create can succeed independently and any subsequent nova-compute attach
will succeed on the previously created port rather than making a new one
(possibly verifying that its 'attached' status, if the second call
completed but didn't return, is a lie).

So:
create fails to return but port is created
-> run on 2nd compute node won't attempt the create, port already exists;
port consumed, everything good

create returns, attach fails to return but port is attached
-> run on 2nd compute node won't attempt the create and will identify that
the attachment state is bogus and overwrite it; port consumed, everything
good
-> if last attempt, a port with a bogus attach is left hanging around in
the DB; a cleanup job has to go looking for it and remove it; optionally
anything else can spot its inconsistency and ignore or remove it. Risk of
removal during the actual scheduling, in which case the schedule pass will
fail; can set expiry time on port.

[arosen] - Sure, though in this case we'll have to add even more queries
between nova-compute and quantum, as nova-compute will need to query quantum
for ports matching the device_id to see if the port was already created
and, if not, create it.
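
A rough sketch of that extra lookup on the compute node (assuming an
authenticated python-neutronclient/quantumclient handle; the function and
its arguments are illustrative, not existing nova-compute code):

    def get_or_create_port(neutron, instance_uuid, network_id):
        """Reuse a port left over from an earlier boot attempt, if any."""
        existing = neutron.list_ports(device_id=instance_uuid,
                                      network_id=network_id)['ports']
        if existing:
            # A previous attempt already created the port; don't duplicate it.
            return existing[0]
        body = {'port': {'network_id': network_id,
                         'device_id': instance_uuid}}
        return neutron.create_port(body)['port']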

create succeeds, attach fails and we get to see that it's failed
-> clean up port

Moving the create may be a good idea for other reasons (because compute
would *always* deal with ports and *never* with networks - a simpler API) -
but it has nothing to do with solving this problem.

[arosen] - It does solve this issue, because it moves the quantum
port-create calls outside of the retry/scheduling logic on that compute
node. Therefore, if the port fails to create, the instance goes to the
error state. Moving networks out of nova-api would also solve this issue
for us, as the client then wouldn't rely on nova to create the port. I'm
wondering if creating an additional network_api_class like
nova.network.quantumv2.api.NoComputeAPI is the way to prove this out; most
of the code in there would inherit from nova.network.quantumv2.api.API
(a rough sketch of that idea follows below).

 -- ijw
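
A very rough sketch of the NoComputeAPI idea mentioned above (the base
class and method signature are assumptions about nova's quantumv2 network
API, and the body is intentionally left as a stub rather than a working
driver):

    from nova.network.quantumv2 import api as quantumv2_api


    class NoComputeAPI(quantumv2_api.API):
        """Network API variant where nova-compute never creates ports.

        Ports are expected to already exist (created by nova-api or by the
        user), so the compute node only needs to look up and attach the
        pre-created port instead of creating one inside the retry logic.
        """

        def allocate_for_instance(self, context, instance, **kwargs):
            # Assumption: instead of creating ports here, look up ports
            # whose device_id matches the instance, update binding:host_id
            # as needed, and reuse them. Not implemented in this sketch.
            raise NotImplementedError()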

>
>
> > > 3) For now, I'm okay with leaving logic on the compute node that
> calls update-port if the port binding extension is loaded. This will allow
> the vif type to be correctly set as well.
> > >
> > And this will also still pass in the hostname the VM was booted on?
> >
> > In this case there would have to be an update-port call from the
> compute node which would set the hostname (the same as in the live
> migration case).
> >
> Just to be sure I understand, nova-compute will do this or this will be
> the responsibility of some neutron agent?
>
> Nova-compute, though I'd still argue that's in the wrong place.


> Thanks,
> Kyle
>
> > To me, this thread seems to have diverged a bit from the original
> discussion point around Yong's patch. Yong's patch makes sense, because
> it's passing the hostname the VM is booted on during port create. It also
> updates the binding during a live migration, so that case is covered. Any
> change to this behavior should cover both those cases and not involve any
> sort of agent polling, IMHO.
> >
> > Thanks,
> > Kyle
> >
> > > Thoughts/Comments?
> > >
> > > Thanks,
> > >
> > > Aaron
> > >
> > >
> > > On Mon, Jul 15, 2013 at 2:45 PM, Aaron Rosen <arosen at nicira.com>
> wrote:
> > >
> > >
> > >
> > > On Mon, Jul 15, 2013 at 1:26 PM, Robert Kukura <rkukura at redhat.com>
> wrote:
> > > On 07/15/2013 03:54 PM, Aaron Rosen wrote:
> > > >
> > > >
> > > >
> > > > On Sun, Jul 14, 2013 at 6:48 PM, Robert Kukura
> > > > <rkukura at redhat.com> wrote:
> > > >
> > > >     On 07/12/2013 04:17 PM, Aaron Rosen wrote:
> > > >     > Hi,
> > > >     >
> > > >     >
> > > >     > On Fri, Jul 12, 2013 at 6:47 AM, Robert Kukura
> > > >     > <rkukura at redhat.com> wrote:
> > > >     >
> > > >     >     On 07/11/2013 04:30 PM, Aaron Rosen wrote:
> > > >     >     > Hi,
> > > >     >     >
> > > >     >     > I think we should revert this patch that was added here
> > > >     >     > (https://review.openstack.org/#/c/29767/). What this
> patch
> > > >     does is
> > > >     >     when
> > > >     >     > nova-compute calls into quantum to create the port it
> passes
> > > >     in the
> > > >     >     > hostname on which the instance was booted on. The idea
> of the
> > > >     >     patch was
> > > >     >     > that providing this information would "allow hardware
> device
> > > >     vendors
> > > >     >     > management stations to allow them to segment the network
> in
> > > >     a more
> > > >     >     > precise manner (for example automatically trunk the
> vlan on the
> > > >     >     > physical switch port connected to the compute node on
> which
> > > >     the vm
> > > >     >     > instance was started)."
> > > >     >     >
> > > >     >     > In my opinion I don't think this is the right approach.
> > > >     There are
> > > >     >     > several other ways to get this information of where a
> > > >     specific port
> > > >     >     > lives. For example, in the OVS plugin case the agent
> running
> > > >     on the
> > > >     >     > nova-compute node can update the port in quantum to
> provide this
> > > >     >     > information. Alternatively, quantum could query nova
> using the
> > > >     >     > port.device_id to determine which server the instance is
> on.
> > > >     >     >
> > > >     >     > My motivation for removing this code is I now have the
> free
> > > >     cycles to
> > > >     >     > work on
> > > >     >     >
> > > >     >
> > > >
> https://blueprints.launchpad.net/nova/+spec/nova-api-quantum-create-port
> > > >     >     >  discussed here
> > > >     >     >
> > > >     >
> > > >     (
> http://lists.openstack.org/pipermail/openstack-dev/2013-May/009088.html)
> > > >     >      .
> > > >     >     > This was about moving the quantum port creation from the
> > > >     nova-compute
> > > >     >     > host to nova-api if a network-uuid is passed in. This
> will
> > > >     allow us to
> > > >     >     > remove all the quantum logic from the nova-compute nodes
> and
> > > >     >     > simplify orchestration.
> > > >     >     >
> > > >     >     > Thoughts?
> > > >     >
> > > >     >     Aaron,
> > > >     >
> > > >     >     The ml2-portbinding BP I am currently working on depends on
> > > >     nova setting
> > > >     >     the binding:host_id attribute on a port before accessing
> > > >     >     binding:vif_type. The ml2 plugin's MechanismDrivers will
> use the
> > > >     >     binding:host_id with the agents_db info to see what (if
> any)
> > > >     L2 agent is
> > > >     >     running on that host, or what other networking mechanisms
> > > >     might provide
> > > >     >     connectivity for that host. Based on this, the port's
> > > >     binding:vif_type
> > > >     >     will be set to the appropriate type for that
> agent/mechanism.
> > > >     >
> > > >     >     When an L2 agent is involved, the associated ml2
> > > >     MechanismDriver will
> > > >     >     use the agent's interface or bridge mapping info to
> determine
> > > >     whether
> > > >     >     the agent on that host can connect to any of the port's
> network's
> > > >     >     segments, and select the specific segment (network_type,
> > > >     >     physical_network, segmentation_id) to be used. If there is
> no
> > > >     >     connectivity possible on the host (due to either no L2
> agent
> > > >     or other
> > > >     >     applicable mechanism, or no mapping for any of the
> network's
> > > >     segment's
> > > >     >     physical_networks), the ml2 plugin will set the
> binding:vif_type
> > > >     >     attribute to BINDING_FAILED. Nova will then be able to
> > > >     gracefully put
> > > >     >     the instance into an error state rather than have the
> instance
> > > >     boot
> > > >     >     without the required connectivity.
> > > >     >
> > > >     >     I don't see any problem with nova creating the port before
> > > >     scheduling it
> > > >     >     to a specific host, but the binding:host_id needs to be set
> > > >     before the
> > > >     >     binding:vif_type attribute is accessed. Note that the host
> > > >     needs to be
> > > >     >     determined before the vif_type can be determined, so it is
> not
> > > >     possible
> > > >     >     to rely on the agent discovering the VIF, which can't be
> > > >     created until
> > > >     >     the vif_type is determined.
> > > >     >
> > > >     >
> > > >     > So what you're saying is the current workflow is this:
> > > >     > nova-compute creates a port in quantum passing in the host-id
> > > >     > (which is the hostname of the compute host). Now quantum looks
> > > >     > in the agent table in its database to determine the VIF type
> > > >     > that should be used based on the agent that is running on the
> > > >     > nova-compute node?
> > > >
> > > >     Most plugins just return a hard-wired value for
> binding:vif_type. The
> > > >     ml2 plugin supports heterogeneous deployments, and therefore
> needs more
> > > >     flexibility, so this is what's being implemented in the
> agent-based ml2
> > > >     mechanism drivers. Other mechanism drivers (i.e.
> controller-based) would
> > > >     work differently. In addition to VIF type selection, port
> binding in ml2
> > > >     also involves determining if connectivity is possible, and
> selecting the
> > > >     network segment to use, and these are also based on
> binding:host_id.
> > > >
> > > >
> > > > Can you go into more detail about what you mean by heterogeneous
> > > > deployments (i.e. what the topology looks like)? Why would
> > > > connectivity not be possible? I'm confused why things would be
> > > > configured in such a way that the scheduler wants to launch an
> > > > instance on a node that quantum is not able to provide connectivity
> > > > for.
> > >
> > > By heterogeneous deployment, I meant that all compute nodes are not
> > > necessarily identically configured. Some might be running the
> > > openvswitch agent, some the linuxbridge agent, and some the hyperv
> > > agent, but all able to access VLANs on (some of) the same trunks.
> > >
> > > One example of connectivity not being possible would be if multiple
> VLAN
> > > trunks are in use in the datacenter, but not all compute nodes have
> > > connections to every trunk.
> > >
> > > I agree the scheduler should ensure connectivity will be possible. But
> > > mechanisms such as cells, zones, and flavors can also be used in nova
> to
> > > manage heterogeneity. The ml2 port binding code should ideally never
> > > find out the scheduled node does not have connectivity, but we've at
> > > least defined what should happen if it does. The main need here though
> > > is for the port binding code to select the segment to use.
> > >
> > > Why does the port binding code select which segment to use? I'm
> unclear why anyone would ever have a deployment with a mix of vlans where
> things are trunked in some places and not in others and neutron would have
> to keep up with that. The part I'm unclear on is how neutron would be
> expected to behave in this type of setup. Say one boots several instances:
> instance1 lands on compute1 and neutron puts it on vlan X. Later, instance
> 2 is booted and lands on compute2, where vlan X isn't reachable?
> > >
> > >
> > >
> > > >
> > > >
> > > >
> > > >     >                                                  My question
> would
> > > >     be why
> > > >     > the nova-compute node doesn't already know which VIF_TYPE it
> should be
> > > >     > using?
> > > >
> > > >     I guess the thinking was that this knowledge belonged in quantum
> rather
> > > >     than nova, and thus the GenericVifDriver was introduced in
> grizzly. See
> > > >     https://blueprints.launchpad.net/nova/+spec/libvirt-vif-driver and
> > > >     https://blueprints.launchpad.net/neutron/+spec/vif-plugging-improvements.
> > > >
> > > >
> > > > Thanks for the links. It seems like the motivation for this was to
> > > > remove the libvirt vif configuration settings from nova and offload
> > > > that to quantum via the vif_type param on a port. It seems like, when
> > > > using a specific plugin, that plugin will always return the same
> > > > vif_type to a given node. In my opinion, this configuration option is
> > > > best handled as part of your deployment automation instead of being
> > > > baked into quantum ports.
> > >
> > > For monolithic plugins, returning a fixed vif_type works, but this is
> > > not sufficient for ml2.
> > >
> > > I was happy with the old approach of configuring drivers in nova (via
> > > deployment automation ideally), but the decision was made in grizzly to
> > > switch to the GenericVifDriver.
> > >
> > > >
> > > > My goal is to reduce the orchestration and complexity between nova
> and
> > > > quantum. Currently, nova-api and nova-compute both call out to
> quantum
> > > > when all of this could be done on the api node (ignoring bare metal
> for
> > > > now as in this case we'd need to do something special to handle
> updating
> > > > the mac addresses on those logical ports in quantum).
> > >
> > > Sounds like the scheduler is going to need to call neutron as well, at
> > > least in some cases.
> > >
> > > Why is this? The only use case I see so far for something other than
> nova-api to call into neutron would be bare metal. I think having neutron
> tell nova which vif type it should be using couples the nova+quantum
> integration very tightly. I think we should probably reexamine
> https://blueprints.launchpad.net/nova/+spec/libvirt-vif-driver, as setting
> the libvirt_type from the neutron side seems to be something the sysadmin
> should configure once rather than rely on neutron to specify.
> > >
> > > Thanks,
> > >
> > > Aaron
> > >
> > > -Bob
> > >
> > > >
> > > >
> > > >     -Bob
> > > >
> > > >     >
> > > >     >
> > > >     >     Back when the port binding extension was originally being
> > > >     hashed out, I
> > > >     >     had suggested using an explicit bind() operation on port
> that
> > > >     took the
> > > >     >     host_id as a parameter and returned the vif_type as a
> result.
> > > >     But the
> > > >     >     current attribute-based approach was chosen instead. We
> could
> > > >     consider
> > > >     >     adding a bind() operation for the next neutron API
> revision,
> > > >     but I don't
> > > >     >     see any reason the current attribute-based binding approach
> > > >     cannot work
> > > >     >     for now.
> > > >     >
> > > >     >     -Bob
> > > >     >
> > > >     >     >
> > > >     >     > Best,
> > > >     >     >
> > > >     >     > Aaron
> > > >     >     >
> > > >     >     >

