[openstack-dev] [neutron][networking-vpp]Introducing networking-vpp

Neil Jerram neil at tigera.io
Wed Oct 19 08:50:21 UTC 2016


On Wed, Oct 19, 2016 at 4:30 AM Ian Wells <ijw.ubuntu at cack.org.uk> wrote:

> Sorry to waken an old thread, but I chose a perfect moment to go on
> holiday...
>
> So yes: I don't entirely trust the way we use RabbitMQ, and that's largely
> because what we're doing with it - distributing state, or copies of state,
> or information derived from state - leads to some fragility and odd
> situations when using a tool perhaps better suited to listing off tasks.
> We've tried to find a different model of working that is closer to the
> behaviour we're after.  It is, I believe, similar to the Calico team's
> thinking, but not derived from their code.  I have to admit at this point
> that it's not been tested at scale in our use of it, and that's something
> we will be doing, but I can say that this is working in a way that is in
> line with how etcd is intended to be used, we have tested representative
> etcd performance, and we don't expect problems.
>
> As mentioned before, Neutron's SQL database is the source of truth - you
> need to have one, and that one represents what the client asked for in its
> purest form.  In the nature of keeping two datastores in sync, there is a
> worker thread outside of the REST call to do the synchronisation (because
> we don't want the cloud user to be waiting on our internal workings, and
> because consistently committing to two databases is a recipe for disaster)
> - etcd lags the Neutron DB commits very slightly, and the Neutron DB is
> always right.  This allows the API to be quick while the backend will run
> as efficiently as possible.
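
To make that concrete, here is a minimal sketch of the "worker thread
outside the REST call" pattern in Python.  This is not the actual
networking-vpp code - the client library (python-etcd3 here), the key
layout and the in-memory queue are all assumptions, and a real driver
would journal the queued items in the Neutron DB so they survive a
restart - but it shows the shape: the API thread only records what was
just committed, and a background thread pushes it to etcd.

    import json
    import queue
    import threading
    import time

    import etcd3  # assumed client; the real driver's etcd bindings may differ

    etcd = etcd3.client(host='127.0.0.1', port=2379)
    work = queue.Queue()

    def port_postcommit(port):
        # Runs after the Neutron DB transaction has committed; the REST
        # caller never waits on etcd.
        key = '/networking-vpp/nodes/%s/ports/%s' % (port['binding:host_id'],
                                                     port['id'])
        work.put((key, json.dumps(port)))

    def sync_loop():
        # Background worker: etcd lags the Neutron DB slightly, and the
        # Neutron DB remains the source of truth.
        while True:
            key, value = work.get()
            try:
                etcd.put(key, value)
            except Exception:
                # A backend communication failure never fails the (already
                # returned) API call; just retry a little later.
                time.sleep(1)
                work.put((key, value))

    threading.Thread(target=sync_loop, daemon=True).start()
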
>
> It does also mean that failures to communicate in the backend don't result
> in failed API calls - the call succeeds but state updates don't happen.
> This is in line with a 'desired state' model.  A user tells Neutron what
> they want to do and Neutron should generally accept the request if it's
> well formatted and consistent.  Exceptional error codes like 500s are
> annoying to deal with, as you never know if that means 'I failed to save
> that' or 'I failed to implement that' or 'I saved and implemented that, but
> didn't quite get the answer to you' - having simple frontend code ensures
> the answer is highly likely to be 'I will do that in a moment', in
> keeping with the eventually consistent model OpenStack has.  The
> driver will then work its magic and update object states when the work is
> finally complete.
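
The "update object states when the work is finally complete" step might
look roughly like the following sketch - again with assumed key names
and an assumed python-etcd3 client, and with mark_port_active() standing
in for the plugin's real status-update call.  The idea: the server side
watches the per-host 'state' keys an agent writes back once VPP has
actually wired a port, and only then reports the port as up.

    import etcd3
    import etcd3.events

    etcd = etcd3.client()

    def mark_port_active(port_id):
        # Placeholder for the real Neutron port status update.
        print('port %s is now ACTIVE' % port_id)

    # Assumed layout: agents write /networking-vpp/state/<host>/ports/<id>
    # once the port is really in place.
    events, _cancel = etcd.watch_prefix('/networking-vpp/state/')
    for event in events:
        if isinstance(event, etcd3.events.PutEvent):
            mark_port_active(event.key.decode().rsplit('/', 1)[-1])
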
>
> Watching changes - and the pub-sub model you end up with - is a means of
> being efficient, but should we miss notifications there's a fallback
> mechanism to get back into state sync with the most recent version of the
> state.  In the worst case, we focus on the currently desired state, and not
> the backlog of recent changes to state.
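
On the agent side, that watch-plus-resync fallback might look something
like this - again only a sketch, assuming python-etcd3 and the key
layout above, with apply_desired_state() standing in for the real VPP
programming.  Watching gives cheap notifications; if the watch breaks or
events were missed (say etcd compacted the intervening history away),
the agent re-reads the whole current desired state and resumes watching
from that read's revision - the backlog of changes doesn't matter as
long as it converges on the current state.

    import etcd3

    etcd = etcd3.client()
    PREFIX = '/networking-vpp/nodes/myhost/ports/'   # assumed key layout

    def apply_desired_state(key, value):
        # Stand-in for the real work of (re)programming VPP for this key.
        # (Delete handling omitted for brevity.)
        print('ensuring %s matches %s' % (key, value))

    while True:
        # Fallback path: read the complete current desired state...
        revision = 0
        for value, meta in etcd.get_prefix(PREFIX):
            apply_desired_state(meta.key.decode(), value)
            revision = max(revision, meta.mod_revision)
        try:
            # ...then go back to cheap notifications from that point on.
            events, _cancel = etcd.watch_prefix(PREFIX,
                                                start_revision=revision + 1)
            for event in events:
                apply_desired_state(event.key.decode(), event.value)
        except Exception:
            # Broken or stale watch: loop around and resync from scratch.
            continue
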
>
> And Jay, you're right.  What we should be comparing here is how well it
> works.  Is it easy to use, is it easy to maintain, is it annoyingly
> fragile, and does it eat network or CPU?  I believe so (or I wouldn't have
> chosen to do it this way), and I hope we've produced something simple to
> understand while being easier to operate.  However, the proof of the
> pudding is in the eating, so let's see how this works as we continue to
> develop and test it.
>
>
Full ack.  This is indeed "similar to the Calico team's thinking", but
you've done a beautiful job of expressing it.

     Neil

