[openstack-dev] [oslo.messaging][zeromq] Next step

Clint Byrum clint at fewbar.com
Fri Jun 12 22:55:52 UTC 2015

Excerpts from Alec Hothan (ahothan)'s message of 2015-06-12 13:41:17 -0700:
> On 6/1/15, 5:03 PM, "Davanum Srinivas" <davanum at gmail.com> wrote:
> >fyi, the spec for zeromq driver in oslo.messaging is here:
> >https://review.openstack.org/#/c/187338/1/specs/liberty/zmq-patterns-usage.rst,unified
> >
> >-- dims
> I was about to provide some email comments on the above review off gerrit,
> but figured maybe it would be good to give a quick summary of the state of
> this general effort for pushing out a better zmq driver for oslo messaging.
> So I started to look around the oslo/zeromq wiki and saw a few email threads
> that drew my interest.
> In this email (Nov 2014) Ilya proposes getting rid of a central
> broker for zmq:
> http://lists.openstack.org/pipermail/openstack-dev/2014-November/050701.html
> Not clear if Ilya already had in mind to instead have a local proxy on
> every node (as proposed in the above spec).
> In this email (Mar 2014), Yatin described the prospect of using zmq in a
> completely broker-less way (so not even a proxy per node), with the use of
> matchmaker rings to configure well-known ports.
> http://lists.openstack.org/pipermail/openstack-dev/2014-March/030411.html
> Which is pretty close to what I think would be a better design (with the
> variant that I'd rather see a robust and highly available name server
> instead of fixed port assignments). I'd be interested to know what
> happened to that proposal and why we ended up with a proxy per node
> solution at this stage (I'll reply to the proxy per node design in a
> separate email to complement my gerrit comments).
> I could not find one document that summarizes the list of issues related
> to rabbitMQ deployments; all that is apparent is that many people are
> unhappy with it, some are willing to switch to zmq, many are hesitant and
> some are decidedly skeptical. On my side I know of a number of issues
> related to oslo messaging over rabbitMQ.
> I think it is important for the community to understand that of the many
> issues generally attributed to oslo messaging over rabbitMQ, not all of
> them are caused by the choice of rabbitMQ as a transport (and hence those
> will likely not be fixed if we just switched from rabbitMQ to ZMQ) and
> many are actually caused by the misuse of oslo messaging by the apps
> (Neutron, Nova...) and can only be fixed by modification of the app code.
> I think personally that there is a strong case for a properly designed ZMQ
> driver but we first need to make the expectations very clear.
> One long standing issue I can see is the fact that the oslo messaging API
> documentation is sorely lacking details on critical areas such as API
> behavior during fault conditions, load conditions and scale conditions.
> As a result, app developers are using the APIs sometimes indiscriminately
> and that will have an impact on the overall quality of openstack in
> deployment conditions.
> I understand that a lot of the existing code was written in a hurry and is
> good enough to work properly on small setups, but some code will break
> really badly under load or when things start to go south in the cloud.
> That is unless the community realizes that perhaps there is something that
> needs to be done.
> We are only starting to see things break under load today because we
> have more lab tests at scale and more deployments at scale, and we are
> only starting to see real system-level testing at scale with HA testing
> (the kind of test where you inject load and cause failures of all sorts).
> Today we know that openstack behaves terribly in these conditions, even in
> so-called HA deployments!
> As a first step, would it be useful to have one single official document
> that characterizes all the issues we're trying to fix and perhaps use
> that document as a basis for showing which of all these issues will be
> fixed by the use of the zmq driver? I think that could help us focus
> better on the type of requirements we need from this new ZMQ driver.

I think you missed "it is not tested in the gate" as a root cause for
some of the ambiguity. Anecdotes and bug reports are super important for
knowing where to invest next, but a test suite would at least establish a
baseline and prevent the sort of thrashing and confusion that comes from
such a diverse community of users feeding bug reports into the system.

Also, not having a test in the gate is a serious infraction now, and will
lead to zmq's removal from oslo.messaging now that we have a ratified
policy requiring this. I suggest that a first step be to get a
devstack-gate job that runs using zmq instead of rabbitmq. You can
trigger it in oslo.messaging's check pipeline, and make it non-voting,
but eventually it needs to get into nova, neutron, cinder, heat, etc.
Without that, you'll find that the community of potential
beneficiaries of any effort you put into zmq will shrink dramatically when
we are forced to remove the driver from oslo.messaging (it can of course
live on out of tree).
