[openstack-dev] [oslo.messaging][zeromq] Next step

Bogdan Dobrelya bdobrelia at mirantis.com
Mon Jul 20 12:24:18 UTC 2015


inline

On 14.07.2015 18:59, Alec Hothan (ahothan) wrote:
> inline...
> 
> 
> On 7/8/15, 8:23 AM, "Bogdan Dobrelya" <bdobrelia at mirantis.com> wrote:
> 
> 
> I believe Oleksii is already working on it.
> 
> 
> 
> On all of the above, I believe it is best to keep oslo messaging simple
> and predictable, then have apps deal with any retry logic, as it is really
> app specific.

Let me disagree with the simple messaging concept. First of all,
fault-tolerant (FT) messaging is crucial because it is the glue between
all OpenStack components (aka micro-services). Those are distributed and
sometimes stateful, therefore FT is indeed a must at the 'app-layer'.

But I strongly believe that OpenStack components should *not* repeat the
same design and code patterns to achieve FT messaging on top of the
"simple Oslo libraries". That would seem contrary to the very idea of
common messaging libraries, which is what the Oslo project is, am I
right? Or should we instead introduce another type of common library to
make messaging FT at the app level?
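
To illustrate the kind of pattern that would otherwise be repeated in every
component, here is a minimal sketch of a shared retry helper. It is not
part of oslo.messaging; `call_with_retry` and `SendTimeout` are
hypothetical names used only for this example, and the linear backoff is
an arbitrary choice.

```python
import time


class SendTimeout(Exception):
    """Hypothetical stand-in for a driver-level timeout error."""


def call_with_retry(send, attempts=3, delay=0.0):
    """Invoke send() up to `attempts` times and return its result.

    Only SendTimeout triggers a retry; the last timeout is re-raised
    once all attempts are used up.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except SendTimeout as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay * attempt)  # linear backoff between tries
    raise last_error
```

A helper like this is only safe when the remote operation is idempotent,
which is exactly the concern raised in the quoted text below.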


> Auto retries in oslo messaging can cause confusion with possible
> duplicates which could be really bad if the messages are not idempotent.
> I think trying to make oslo messaging a complex communication API is not
> realistic with the few resources available.
> It is much better to have something simple that works well (even that is
> not easy as we can see) than something complex that has lots of issues.
> 
> 
> 
> Yes I'd like to help on that part.
> 
> 
> I'm glad to see more people converging on this shortcoming and the need to
> do something.
> 
> As I said above, I would keep the oslo messaging API straight and simple
> and predictable.
> The issue with that is it may make the AMQP driver non-compliant, as it
> may be doing too much already, but we can try to work it out.
> We should avoid app code having to behave differently (with if/else
> based on the driver or driver-specific plugins), but maybe that will not
> be entirely avoidable.
> 
> I'll give a short answer to all those great questions below, in the event
> we decide to go the simple API behavior route:
> 
> 
> Yes, the sender is notified about the error condition (asynchronously in
> the case of async APIs) as quickly as possible, and the app is in charge
> of remedying any possible loss of messages (this basically reflects how
> tcp or zmq unicast behaves).
> 
> RabbitMQ would not comply because it would try to deliver the message
> regardless, without telling the sender (until at some point it may give up
> entirely and drop the message silently, or it may try to resend forever),
> and there are some use cases where the message is lost (and the sender is
> not notified).
> The ZMQ driver is simpler because it would just reflect what the
> ZMQ/unicast library does.
> 
> 
> 
> That would be an app bug. You don't want to send an ack before the work
> is done and committed.
> 
> 
> 
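
The ordering described above (commit first, acknowledge second) can be
sketched as follows. `Message` and `handle` are hypothetical names for
illustration only and do not correspond to any oslo.messaging class.

```python
class Message:
    """Hypothetical incoming message with an explicit acknowledge step."""

    def __init__(self, payload):
        self.payload = payload
        self.acked = False

    def acknowledge(self):
        self.acked = True


def handle(message, commit):
    """Run commit(payload) first; acknowledge only if it succeeded."""
    commit(message.payload)  # may raise; the message then stays un-acked
    message.acknowledge()    # reached only after the work is committed
```

If `commit` raises, the message is never acknowledged, so whatever sits
underneath is free to redeliver it.
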
> That could be an oslo messaging bug if we make sure we never allow
> duplicates (that is, never retry unless you are sure the recipient has
> not already received the message, or make sure filtering is done properly
> on the receiving end to weed out duplicates).
> 
> 
> 
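
As a rough illustration of filtering on the receiving end, the sketch
below drops messages whose id was seen recently. It assumes every message
carries a unique id set by the sender; `DuplicateFilter` and its bounded
window are hypothetical and not taken from any existing driver.

```python
from collections import OrderedDict


class DuplicateFilter:
    """Remember the most recent message ids and reject repeats."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self._seen = OrderedDict()

    def is_new(self, message_id):
        """Return True the first time an id is seen, False on a repeat."""
        if message_id in self._seen:
            return False
        self._seen[message_id] = True
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # evict the oldest id
        return True
```

Because the window is bounded, a duplicate that arrives after its id has
been evicted is accepted again, so the capacity has to be sized against
the longest retry interval the sender may use.
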
> For CALL: use a timeout and let the app remedy it
> For CAST: leave it to the app to remedy
> 
> 
> 
> (assuming CALL) in this case let the app handle this (in general apps will
> have to do some sort of resync with the recipients on restart if they
> care).
> 
> 
> I'll leave that to AMQP experts. At the oslo messaging layer I'd try to
> make it behave the same as using a tcp connection (if possible).
> 
> 
> 
> For me the tricky part is the fanout case, because it is not trivial to
> implement properly in a way that scales to thousands of nodes and in a way
> that users can actually code over it properly without unexpected missing
> messages for joining subscribers. From what I have seen, this part is
> completely overlooked today by existing fanout users (we might be lucky
> that fanout messages sort of work today, but that might become
> problematic as we scale out to larger deployments).
> 
> 
> 
> Best would be to have some working document that everybody can contribute
> to. I think dims was proposing to create a new launchpad bug to track and
> use an rst spec file with gerrit?
> 
> Thanks
> 
>   Alec
> 
> 
> 
> 
> 


-- 
Best regards,
Bogdan Dobrelya,
Irc #bogdando


