[openstack-dev] [oslo.messaging][zeromq] Next step

Alec Hothan (ahothan) ahothan at cisco.com
Fri Jun 12 20:41:17 UTC 2015



On 6/1/15, 5:03 PM, "Davanum Srinivas" <davanum at gmail.com> wrote:

>fyi, the spec for zeromq driver in oslo.messaging is here:
>https://review.openstack.org/#/c/187338/1/specs/liberty/zmq-patterns-usage.rst,unified
>
>-- dims

I was about to provide some email comments on the above review outside of
gerrit, but figured it would be good to first give a quick status of this
general effort to push out a better zmq driver for oslo.messaging.
So I started to look around the oslo/zeromq wiki and saw a few email
threads that drew my interest.

In this email (Nov 2014) Ilya proposes getting rid of a central broker
for zmq:
http://lists.openstack.org/pipermail/openstack-dev/2014-November/050701.html
It is not clear whether Ilya already had in mind to instead have a local
proxy on every node (as proposed in the above spec).


In this email (Mar 2014), Yatin describes the prospect of using zmq in a
completely broker-less way (so not even a proxy per node), with the use of
matchmaker rings to configure well-known ports:
http://lists.openstack.org/pipermail/openstack-dev/2014-March/030411.html
This is pretty close to what I think would be a better design (with the
variant that I'd rather see a robust and highly available name server
instead of fixed port assignments). I'd be interested to know what
happened to that proposal and why we ended up with a proxy per node
solution at this stage (I'll reply to the proxy per node design in a
separate email to complement my gerrit comments).
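
To make the distinction between fixed port assignments and a name server
a bit more concrete, here is a rough sketch. It is purely illustrative:
the names STATIC_RING, resolve_static and NameService are hypothetical,
the hosts and ports are made up, and none of this is oslo.messaging or
matchmaker code. The in-memory registry only stands in for what would have
to be a robust, highly available service.

```python
# Illustrative only: none of these names exist in oslo.messaging.

# Option 1: a static ring with fixed, well-known ports.
# Every node has to carry the same mapping, decided ahead of time.
STATIC_RING = {
    "scheduler": ["tcp://host1:9501"],
    "compute": ["tcp://host2:9502", "tcp://host3:9502"],
}


def resolve_static(topic):
    """Return the preconfigured endpoints for a topic (empty if unknown)."""
    return list(STATIC_RING.get(topic, []))


# Option 2: a name service that servers register with when they start.
class NameService:
    """Toy in-memory registry standing in for an HA name server."""

    def __init__(self):
        self._endpoints = {}

    def register(self, topic, endpoint):
        # A server announces the endpoint it actually bound to.
        self._endpoints.setdefault(topic, set()).add(endpoint)

    def unregister(self, topic, endpoint):
        # A server (or a failure detector) withdraws a dead endpoint.
        self._endpoints.get(topic, set()).discard(endpoint)

    def resolve(self, topic):
        # Clients look up live endpoints instead of relying on fixed ports.
        return sorted(self._endpoints.get(topic, set()))
```

With the second option a server can bind to any free port and publish it,
and a failed server can be withdrawn without touching the configuration
of every other node.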


I could not find a single document that summarizes the issues related to
rabbitMQ deployments. All that is apparent is that many people are unhappy
with it: some are willing to switch to zmq, many are hesitant and some are
decidedly skeptical. On my side I know of a number of issues related to
oslo messaging over rabbitMQ.

I think it is important for the community to understand that of the many
issues generally attributed to oslo messaging over rabbitMQ, not all of
them are caused by the choice of rabbitMQ as a transport (and hence those
will likely not be fixed if we just switch from rabbitMQ to ZMQ) and
many are actually caused by the misuse of oslo messaging by the apps
(Neutron, Nova...) and can only be fixed by modifying the app code.

I think personally that there is a strong case for a properly designed ZMQ
driver but we first need to make the expectations very clear.

One long standing issue I can see is the fact that the oslo messaging API
documentation is sorely lacking details on critical areas such as API
behavior during fault conditions, load conditions and scale conditions.
As a result, app developers are using the APIs sometimes indiscriminately
and that will have an impact on the overall quality of openstack in
deployment conditions.
I understand that a lot of the existing code was written in a hurry and
is good enough to work properly on small setups, but some code will break
really badly under load or when things start to go south in the cloud.
That is, unless the community realizes that perhaps there is something
that needs to be done.

We're only starting to see things break under load today because we
have more lab tests at scale, more deployments at scale, and we are only
beginning to see real system level testing at scale with HA testing (the
kind of test where you inject load and cause failures of all sorts). Today
we know that openstack behaves terribly in these conditions, even in
so-called HA deployments!

As a first step, would it be useful to have one single official document
that characterizes all the issues we're trying to fix and perhaps use
that document as a basis for showing which of these issues will be
fixed by the use of the zmq driver? I think that could help us focus
better on the type of requirements we need from this new ZMQ driver.


Thanks,

  Alec





