[openstack-dev] [oslo.messaging][zeromq] Next step

Alec Hothan (ahothan) ahothan at cisco.com
Mon Jul 20 21:04:40 UTC 2015

On 7/20/15, 5:24 AM, "Bogdan Dobrelya" <bdobrelia at mirantis.com> wrote:

>On 14.07.2015 18:59, Alec Hothan (ahothan) wrote:
>> inline...
>> On 7/8/15, 8:23 AM, "Bogdan Dobrelya" <bdobrelia at mirantis.com> wrote:
>> I believe Oleksii is already working on it.
>> On all of the above, I believe it is best to keep oslo messaging simple
>> and predictable, and then have apps deal with any retry logic, as it is
>> really app-specific.
>Let me disagree with the simple messaging concept. First of all, fault
>tolerant messaging is crucial, as it is the glue between all OpenStack
>components (aka micro-services). Those are distributed and sometimes
>stateful, therefore FT is a must at the 'app layer', indeed.
>But I strongly believe that OpenStack components should *not* repeat the
>same design and code patterns to achieve FT messaging on top of the
>"simple Oslo libraries". That would seem contrary to the very idea
>of common messaging libraries, which is what the Oslo project is, am I
>right? Or should we instead introduce another type of common library
>to make messaging FT at the app level?

Ideally oslo messaging would provide such a service; however, we need to
realize that not all users need the same level of fault tolerance. For
example, there are many cases where it is better to see operations fail
fast than to discover a failure very late (if ever: in some cases,
using RabbitMQ will completely hide certain failures because of the
nature of the pub/sub service).
Even if we convince ourselves that we need a unified FT messaging service,
the reality is that the current oslo messaging API description is in dire
need of fixing, and fixing it is a very tricky exercise for the
reasons discussed earlier.
Oslo messaging was great to get OpenStack rolling, but we are now hitting
the limits of the current codebase, and there does not seem to be much
interest in the community in getting serious about fixing it (for one,
because there are lots of other projects in OpenStack that are sexier).
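To make the "fail fast" option above concrete, here is a minimal Python
sketch of a CALL that waits a bounded time for a reply and never retries
on its own, leaving remediation to the app. It is not the oslo.messaging
API: `call`, `CallTimeout` and the `send(request, reply_queue)` transport
hook are hypothetical names used only for illustration.

```python
import queue
import threading


class CallTimeout(Exception):
    """Hypothetical error raised when no reply arrives in time."""


def call(send, request, timeout):
    """Send one request and wait at most `timeout` seconds for its reply.

    `send(request, reply_queue)` is a hypothetical transport hook that must
    eventually put the reply on `reply_queue`. There is deliberately no
    automatic retry: on timeout the caller is told right away and decides
    how to remediate (retry, resync with the peer, or give up).
    """
    replies = queue.Queue(maxsize=1)
    send(request, replies)
    try:
        return replies.get(timeout=timeout)
    except queue.Empty:
        raise CallTimeout(f"no reply to {request!r} within {timeout}s") from None


# Example transports: one peer answers from another thread, one never answers.
def live_peer(request, reply_queue):
    threading.Thread(target=reply_queue.put, args=(request.upper(),)).start()


def dead_peer(request, reply_queue):
    pass  # message silently lost: the caller only learns about it via the timeout


if __name__ == "__main__":
    print(call(live_peer, "ping", timeout=1.0))
    try:
        call(dead_peer, "ping", timeout=0.1)
    except CallTimeout as exc:
        print("app-level remediation needed:", exc)
```

The point of the design is that a lost message surfaces as an explicit,
early error to the caller instead of being silently retried or hidden.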

This has been a long thread, and maybe we should summarize what the
long-term goal is and what needs to be done in the short term to close
the gaps:
1- fix the API documentation (a tricky exercise because of the "legacy"
code and current active developments using oslo messaging)
2- realign drivers to comply with the new API
3- realign app code to comply with the new API

I think the plan was to create a launchpad bug, gather all the comments
we have had so far in a spec, and have it reviewed by the community on
gerrit. Is there any better suggestion?




>> Auto retries in oslo messaging can cause confusion with possible
>> duplicates, which could be really bad if the messages are not
>> idempotent. I think trying to make oslo messaging a complex
>> communication API is not realistic with the few resources available.
>> It is much better to have something simple that works well (even that
>> is not easy, as we can see) than something complex that has lots of
>> issues.
>> Yes, I'd like to help with that part.
>> I'm glad to see more people converging on this shortcoming and the need
>> to do something.
>> As I said above, I would keep the oslo messaging API straightforward,
>> simple and predictable.
>> The issue with that is that it may make the AMQP driver non-compliant,
>> as it may already be doing too much, but we can try to work it out.
>> We should avoid app code having to behave differently based on the
>> driver (or on driver-specific plugins), but maybe that will not be
>> entirely avoidable.
>> I'll give a short answer to all those great questions below, in case
>> we decide to go the simple API behavior route:
>> Yes, the sender is notified about the error condition as quickly as
>> possible (asynchronously in the case of async APIs), and the app is in
>> charge of remediating possible loss of messages (this basically
>> reflects how TCP or ZMQ unicast behaves).
>> RabbitMQ would not comply, because it would try to deliver the message
>> regardless, without telling the sender (until at some point it may give
>> up entirely and drop the message silently, or it may try to resend
>> forever).
>> There exist some use cases where the message is lost (and the sender is
>> not notified).
>> The ZMQ driver is simpler because it would just reflect what the ZMQ
>> unicast library does.
>> That would be an app bug. You don't want to send an ack before the work
>> is done and committed.
>> That could be an oslo messaging bug if we make sure we never allow
>> duplicates (that is, never retry unless you are sure the recipient has
>> not already received the message, or make sure filtering is done
>> properly on the receiving end to weed out duplicates).
>> For CALL: use a timeout and let the app remediate it.
>> For CAST: let the app remediate it.
>> (Assuming CALL) In this case, let the app handle it (in general, apps
>> have to do some sort of resync with the recipients on restart if they
>> care).
>> I'll leave that to AMQP experts. At the oslo messaging layer I'd try to
>> make it behave the same as a TCP connection (if possible).
>> For me the tricky part is the fanout case, because it is not trivial to
>> implement properly in a way that scales to thousands of nodes and in a
>> way that users can actually code over properly without newly joining
>> subscribers unexpectedly missing messages. From what I have seen, this
>> part is completely overlooked today by existing fanout users (we might
>> be lucky that fanout messages sort of work today, but that might become
>> problematic as we scale out to larger deployments).
>> Best would be to have some working document that everybody can
>> contribute to. I think dims was proposing to create a new launchpad bug
>> to track this and use an rst spec file with gerrit?
>> Thanks
>>   Alec
>Best regards,
>Bogdan Dobrelya,
>Irc #bogdando
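Two of the answers quoted above (ack only after the work is done and
committed; filter duplicates on the receiving end if the sender may
retry) can be combined in a small receiver. This is a minimal Python
sketch, not oslo.messaging code: it assumes each message carries a
sender-chosen `msg_id` that is reused on every retry, and `DedupReceiver`
and `on_message` are hypothetical names. Processed ids are kept in memory
here; a real service would have to persist and expire them.

```python
class DedupReceiver:
    """Hypothetical receiver that processes each msg_id at most once."""

    def __init__(self, handler):
        self._handler = handler  # does the actual work; may raise
        self._done = set()       # ids whose work has completed (in memory only)

    def on_message(self, msg_id, payload):
        """Return True if the message was processed, False if it was a duplicate."""
        if msg_id in self._done:
            return False  # retry of a message already handled: drop it
        self._handler(payload)  # do and commit the work first...
        self._done.add(msg_id)  # ...then record it (the equivalent of the ack)
        return True


if __name__ == "__main__":
    processed = []
    rx = DedupReceiver(processed.append)
    rx.on_message("m1", "create vm")
    rx.on_message("m1", "create vm")  # duplicate caused by a sender retry
    print(processed)  # only one "create vm" entry
```

Because the id is recorded only after the handler returns, a failure in
the middle of the work leaves the message eligible for reprocessing when
the sender retries; the flip side is that the handler itself must
tolerate being re-run after a partial failure.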
