[openstack-dev] [neutron][L3] IPAM alternate refactoring
Kevin Benton
blak111 at gmail.com
Mon Apr 13 23:31:51 UTC 2015
>The thing is, is that you *should* be able to call core_plugin.create_port
in a transaction.
Well, it depends on what you mean by that. If you mean create_port should be
part of the same transaction, I disagree because it leads to either
inconsistency or a loss of veto power for drivers with external backends.
With the current code, if you enclose create_port in a transaction and then
have a failure in the parent transaction after the port is created, the DB
creation will be rolled back but nothing will inform the backend to release
the resources it allocated for the port.
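To make that failure mode concrete, here is a minimal sketch. It is not Neutron code: FakeBackend, the ports table and this create_port are stand-ins made up for illustration, with sqlite3 playing the role of the Neutron DB.

```python
import sqlite3


class FakeBackend:
    """Hypothetical external backend that allocates resources per port."""

    def __init__(self):
        self.allocated = set()

    def create_port(self, port_id):
        self.allocated.add(port_id)


def create_port(conn, backend, port_id):
    # The DB write joins whatever transaction the caller already has open.
    conn.execute("INSERT INTO ports (id) VALUES (?)", (port_id,))
    # The backend call is a side effect the database cannot undo.
    backend.create_port(port_id)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ports (id TEXT PRIMARY KEY)")
    conn.commit()
    backend = FakeBackend()
    try:
        with conn:  # parent transaction: commit on success, rollback on error
            create_port(conn, backend, "port-1")
            raise RuntimeError("later step in the parent transaction failed")
    except RuntimeError:
        pass
    db_rows = conn.execute("SELECT COUNT(*) FROM ports").fetchone()[0]
    return db_rows, sorted(backend.allocated)
```

After demo() the ports table is empty again, but the fake backend still holds "port-1": the rollback never reached it.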
If we switch to a notification system like you described where
notifications are deferred until after create_port is complete, we just end
up removing the ability for backends to block a create_port call if
necessary. That's a pretty significant change because callers will think
they have successfully created a port when not all of the relevant systems
have confirmed it.
This is going to become even more pronounced if procedures to allocate IP
addresses and whatever else for the port result in calls to external
servers.
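A rough sketch of the difference, again with made-up names (picky_backend, create_port_inline, create_port_deferred and flush are illustrative only, and a plain list stands in for the DB):

```python
class BackendRejected(Exception):
    """Raised by the hypothetical backend to veto a port."""


def picky_backend(port):
    # Hypothetical backend policy: refuse ports on a network it does not know.
    if port["network"] != "known-net":
        raise BackendRejected(port["id"])


def create_port_inline(db, port, backend):
    # The backend is consulted before the caller gets a result, so a
    # rejection propagates and the port is never reported as created.
    backend(port)
    db.append(port["id"])
    return port["id"]


def create_port_deferred(db, port, queue):
    # The notification is only queued; the caller sees success immediately.
    db.append(port["id"])
    queue.append(port)
    return port["id"]


def flush(queue, backend):
    # Runs after the API call has returned; failures can only be recorded.
    failed = []
    for port in queue:
        try:
            backend(port)
        except BackendRejected:
            failed.append(port["id"])
    return failed
```

In the inline version the rejection reaches the caller. In the deferred version the caller has already been told the port exists by the time flush() finds out the backend refused it.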
In the hack you showed, wouldn't it be easier just to have a way to
register extra DB operations to be performed on port create? Something like
a run-time defined mechanism driver with only a create port pre-commit
method.
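Roughly what I have in mind, as a sketch only. PortPlugin, register_create_port_precommit and the ha_bindings table are hypothetical; this is not an existing Neutron or ML2 API, and sqlite3 stands in for the real DB layer.

```python
import sqlite3


class PortPlugin:
    """Hypothetical plugin that lets callers add DB work to create_port."""

    def __init__(self, conn):
        self.conn = conn
        self._precommit_hooks = []

    def register_create_port_precommit(self, hook):
        # hook(conn, port_id) runs inside create_port's own transaction.
        self._precommit_hooks.append(hook)

    def create_port(self, port_id):
        with self.conn:  # one transaction, owned by create_port itself
            self.conn.execute("INSERT INTO ports (id) VALUES (?)", (port_id,))
            for hook in self._precommit_hooks:
                # An exception here rolls back the port row as well.
                hook(self.conn, port_id)
        return port_id


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ports (id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE ha_bindings (port_id TEXT PRIMARY KEY)")
    conn.commit()
    return conn


def add_ha_binding(conn, port_id):
    # Example of an extra DB operation a caller might register.
    conn.execute("INSERT INTO ha_bindings (port_id) VALUES (?)", (port_id,))
```

The caller gets the port and its binding committed or rolled back together without ever opening a transaction around create_port.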
>We're still left with questions such as: What happens if I commit a
mega-transaction and then all (Or even more complicated, one) of the
notifications fails, but this isn't a new problem.
This is why I think we shouldn't just rely on the DB to make
mega-transactions. It doesn't really work with us calling out to other
systems. We need a more generic system to manage flows of tasks that each
have rollback mechanisms so the semantics of rolling back large operations are
handled in a database-independent manner. If only such a system existed. ;-)
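The idea, stripped down to plain Python. This is an illustrative sketch and deliberately not the API of any existing library; Task and run_flow are names invented here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    """One step of a larger operation, paired with its own undo action."""

    name: str
    execute: Callable[[], None]
    revert: Callable[[], None]


def run_flow(tasks):
    """Run tasks in order; on failure, revert completed ones in reverse."""
    completed = []
    try:
        for task in tasks:
            task.execute()
            completed.append(task)
    except Exception:
        # The failing task did not complete, so only earlier tasks are undone.
        for task in reversed(completed):
            task.revert()
        raise
    return [task.name for task in completed]
```

Each step, whether it is a DB write, an IPAM call or a backend request, carries its own undo, so rolling back does not depend on everything living in one database transaction.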
On Mon, Apr 13, 2015 at 3:50 PM, Assaf Muller <amuller at redhat.com> wrote:
>
>
> ----- Original Message -----
> > I think removing all occurrences of create_port inside of another transaction
> > is something we should be doing for a couple of reasons.
>
> The issues you're pointing out are very much real. It's a *huge* pain to
> work around this issue and you can look for an example here:
>
> https://github.com/openstack/neutron/blob/master/neutron/db/l3_hamode_db.py#L303
>
> The thing is, is that you *should* be able to call core_plugin.create_port
> in a transaction. I think that the correct thing to do is to eliminate the
> issue with create_port, and not work around the issue with awful patterns
> such as the one in the link above. There's a few different acute issues
> with that pattern:
> 1) We have no automated way to tell if create_port is being called in a
>    transaction or not, currently it's left up to reviewers to spot such
>    occurrences and prevent them from being merged.
> 2) The mental load it adds to read that code is not trivial.
> 3) Transactions are awesome... I'd very much like to group up
>    core_plugin.create_port and create_ha_port_binding in a single
>    transaction and avoid having to deal with edge cases manually.
> 4) Sometimes you can't use the try/except/manual cleanup approach (If you
>    delete a resource in transaction A, then transaction B fails, good luck
>    re-creating the resource you already deleted).
>
> The better long term approach would be to introduce a framework at the API
> layer that queues up notifications (Both HTTP to vendor servers and RPC to
> agents) at the start of an API or RPC call. You're then free to use a single
> huge transaction (Fun!), and finally all queued up notifications will be
> sent for you automagically. That's the simplest approach, I haven't thought
> this through and I'm sure there will be issues but it should be possible.
> We're still left with questions such as: What happens if I commit a
> mega-transaction and then all (Or even more complicated, one) of the
> notifications fails, but this isn't a new problem.
>
> >
> > First, it's a recipe for the cherished "lock wait timeout" deadlocks because
> > create_port makes yielding calls. These are awful to troubleshoot and are
> > pretty annoying for users (request takes ~60 seconds and then blows up).
> >
> > Second, create_port in ML2 expects the transaction to be committed to the DB
> > by the time it's done with pre-commit phase, which we break by opening a
> > parent transaction before calling it so the failure handling semantics may
> > be messed up.
> >
> >
> >
> > On Mon, Apr 13, 2015 at 9:48 AM, Carl Baldwin < carl at ecbaldwin.net > wrote:
> >
> >
> > Have we found the last of them? I wonder. I suppose any higher level
> > service like a router that needs to create ports under the hood (under
> > the API) will have this problem. The DVR fip namespace creation comes
> > to mind. It will create a port to use as the external gateway port
> > for that namespace. This could spring up in the context of another
> > create_port, I think (VM gets new port bound to a compute host where a
> > fip namespace needs to spring in to existence).
> >
> > Carl
> >
> > On Mon, Apr 13, 2015 at 10:24 AM, John Belamaric
> > < jbelamaric at infoblox.com > wrote:
> > > Thanks Pavel. I see an additional case in L3_NAT_dbonly_mixin, where it
> > > starts the transaction in create_router, then eventually gets to
> > > create_port:
> > >
> > > create_router (starts tx)
> > > ->self._update_router_gw_info
> > > ->_create_gw_port
> > > ->_create_router_gw_port
> > > ->create_port(plugin)
> > >
> > > So that also would need to be unwound.
> > >
> > > On 4/13/15, 10:44 AM, "Pavel Bondar" < pbondar at infoblox.com > wrote:
> > >
> > >>Hi,
> > >>
> > >>I did some investigation on the topic [1] and see several issues with
> > >>this approach.
> > >>
> > >>1. Plugin's create_port() is wrapped up in top level transaction for
> > >>create floating ip case[2], so it becomes more complicated to do IPAM
> > >>calls outside main db transaction.
> > >>
> > >>- for create floating ip case transaction is initialized on
> > >>create_floatingip level:
> > >>create_floatingip(l3_db)->create_port(plugin)->create_port(db_base)
> > >>So IPAM call should be added into create_floatingip to be outside db
> > >>transaction
> > >>
> > >>- for usual port create transaction is initialized on plugin's
> > >>create_port level, and John's change[1] cover this case:
> > >>create_port(plugin)->create_port(db_base)
> > >>
> > >>Create floating ip work-flow involves calling plugin's create_port,
> > >>so IPAM code inside of it should be executed only when it is not wrapped
> > >>into top level transaction.
> > >>
> > >>2. Error handling is an open question.
> > >>Should we use taskflow to manage IPAM calls to external systems?
> > >>Or is a simple exception-based model enough to handle rollback actions on
> > >>third-party systems when the main db transaction fails?
> > >>
> > >>[1] https://review.openstack.org/#/c/172443/
> > >>[2] neutron/db/l3_db.py: line 905
> > >>
> > >>Thanks,
> > >>Pavel
> > >>
> > >>On 10.04.2015 21:04, openstack-dev-request at lists.openstack.org wrote:
> > >>> L3 Team,
> > >>>
> > >>> I have put up a WIP [1] that provides an approach that shows the ML2
> > >>>create_port method refactored to use the IPAM driver prior to initiating
> > >>>the database transaction. Details are in the commit message - this is
> > >>>really just intended to provide a strawman for discussion of the
> > >>>options. The actual refactor here is only about 40 lines of code.
> > >>>
> > >>> [1] https://review.openstack.org/#/c/172443/
> > >>>
> > >>>
> > >>> Thanks,
> > >>> John
> > >>
> > >>
> >
> >>__________________________________________________________________________
> > >>OpenStack Development Mailing List (not for usage questions)
> > >>Unsubscribe:
> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> > >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> > >
> > >
> > >
> >
> >
> >
> >
> >
> > --
> > Kevin Benton
> >
> >
> >
>
>
--
Kevin Benton