[openstack-dev] [nova] readout from Philly Operators Meetup

Sean Dague sean at dague.net
Thu Mar 12 16:59:35 UTC 2015

On 03/12/2015 12:47 PM, Clint Byrum wrote:
> Excerpts from Sean Dague's message of 2015-03-11 05:59:10 -0700:
>> =============================
>>  Additional Interesting Bits
>> =============================
>> Rabbit
>> ------
>> There was a whole session on Rabbit -
>> https://etherpad.openstack.org/p/PHL-ops-rabbit-queue
>> Rabbit is a top operational concern for most large sites. Almost all
>> sites have a "restart everything that talks to rabbit" script because
>> during Rabbit HA operations queues tend to blackhole.
>> All other queue systems OpenStack supports are worse than Rabbit (from
>> experience in that room).
>> oslo.messaging < 1.6.0 was a significant regression in dependability
>> from the incubator code. It now seems to be getting better, but there are
>> still a lot of issues. (L112)
>> Operators *really* want the concept in
>> https://review.openstack.org/#/c/146047/ landed. (I asked them to
>> provide such feedback in gerrit).
> This reminded me that there are other options that need investigation.
> A few of us have been looking at what it might take to use something
> in between RabbitMQ and ZeroMQ for RPC and notifications. Some initial
> forays into inspecting Gearman (which infra has successfully used for
> quite some time as the backend of Zuul) look promising. A few notes:
> * The Gearman protocol is crazy simple. There are currently 4 known gearman
>   server implementations: Perl, Java, C, and Python (written and
>   maintained by our own infra team). http://gearman.org/download/ for
>   the others, and https://pypi.python.org/pypi/gear for the python one.
> * Gearman has no pub/sub capability built in for 1:N comms. However, it
>   is fairly straightforward to write workers that will rebroadcast
>   messages to subscribers.
> * Gearman's security model is not very rich. Mostly, if you have been
>   authenticated to the gearman server (only the C server actually even
>   supports any type of authentication, via SSL client certs), you can
>   do whatever you want including consuming all the messages in a queue
>   or filling up a queue with nonsense. This has been raised as a concern
>   in the past and might warrant extra work to add support to the python
>   server and/or add ACL support.
> Part of our motivation for this is that some of us are going to be
> deploying a cloud soon and none of us are excited about deploying and
> supporting RabbitMQ. So we may be proposing specs to add Gearman as a
> deployment option soon.
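
To make the rebroadcast-worker idea from the pub/sub bullet concrete, here
is a rough in-memory sketch. It does not speak the Gearman protocol or use
the gear library; ToyJobServer, RebroadcastWorker, the topic name and the
subscriber names are all hypothetical and only model "one job goes to one
worker" plus a worker that copies each job to per-subscriber queues.

```python
from collections import defaultdict, deque


class ToyJobServer:
    """Hypothetical stand-in for a job server: each job is handed out once."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def submit(self, function_name, payload):
        self.queues[function_name].append(payload)

    def grab(self, function_name):
        queue = self.queues[function_name]
        return queue.popleft() if queue else None


class RebroadcastWorker:
    """Consumes jobs from one topic and resubmits a copy per subscriber."""

    def __init__(self, server, topic):
        self.server = server
        self.topic = topic
        self.subscribers = []

    def subscribe(self, name):
        self.subscribers.append(name)

    def run_once(self):
        payload = self.server.grab(self.topic)
        if payload is None:
            return 0
        # 1:N fan-out: one queue per subscriber, named after the topic.
        for name in self.subscribers:
            self.server.submit(f"{self.topic}.{name}", payload)
        return len(self.subscribers)


server = ToyJobServer()
fanout = RebroadcastWorker(server, "notifications")
fanout.subscribe("subscriber_a")
fanout.subscribe("subscriber_b")

server.submit("notifications", {"event": "example.event"})
delivered = fanout.run_once()
```

Each subscriber then reads from its own queue ("notifications.subscriber_a"
and so on), so the fan-out worker itself becomes a component whose failure
handling has to be thought through.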

I think experimentation with other models is good. There was some
conversation that maybe Kafka would be a better model as well. However,
keep in mind that services are quite chatty at this point and push pretty
large payloads through that bus. The HA story is also quite important,
because the underlying message architecture assumes reliable delivery
for some of the messages, and if they fall on the floor, you'll get
either leaked resources or broken resources. It's actually the HA
recovery piece of Rabbit (and the cases where it doesn't recover
correctly) that seems to be the sharp edge most people are hitting.
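
As a toy illustration of why reliable delivery matters here, the sketch
below models a queue that keeps a message until the consumer acknowledges
it and requeues it if the consumer goes away. This is an assumption-laden
model, not how Rabbit or oslo.messaging is implemented; AckQueue, the
"delete" message and the "vol-1" resource are made up for the example.

```python
from collections import deque


class AckQueue:
    """Toy queue: a message stays tracked until the consumer acks it."""

    def __init__(self):
        self.ready = deque()
        self.unacked = {}
        self._next_tag = 0

    def publish(self, msg):
        self.ready.append(msg)

    def get(self):
        if not self.ready:
            return None, None
        self._next_tag += 1
        msg = self.ready.popleft()
        self.unacked[self._next_tag] = msg
        return self._next_tag, msg

    def ack(self, tag):
        del self.unacked[tag]

    def requeue_unacked(self):
        """Simulate recovery after a consumer or connection is lost."""
        for tag in sorted(self.unacked):
            self.ready.append(self.unacked[tag])
        self.unacked.clear()


allocated = {"vol-1"}  # hypothetical resource waiting to be cleaned up
q = AckQueue()
q.publish({"op": "delete", "id": "vol-1"})

# First consumer takes the message and dies before acting on it (no ack).
tag, msg = q.get()
q.requeue_unacked()

# Second consumer receives the redelivered message and finishes the work.
tag, msg = q.get()
allocated.discard(msg["id"])
q.ack(tag)
```

If the message had simply been dropped when the first consumer died,
"vol-1" would stay in allocated forever, which is the leaked-resource case.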

So... experimentation is good, but it's also important to realize how much
is provided by the infrastructure that's already there.


Sean Dague
