[openstack-dev] [oslo][messaging] Further improvements and refactoring

Alexei Kornienko alexei.kornienko at gmail.com
Tue Jul 1 15:14:47 UTC 2014


Hi,

Please see some minor comments inline.
Do you think we can schedule some time to discuss this topic at one of 
the upcoming meetings?
We could come up with some kind of summary and an action plan to start 
working on.

Regards,

On 07/01/2014 05:52 PM, Ihar Hrachyshka wrote:
>
> On 01/07/14 15:55, Alexei Kornienko wrote:
>> Hi,
>>
>> Thanks for detailed answer. Please see my comments inline.
>>
>> Regards,
>>
>> On 07/01/2014 04:28 PM, Ihar Hrachyshka wrote:
>> On 30/06/14 21:34, Alexei Kornienko wrote:
>>>>> Hello,
>>>>>
>>>>>
>>>>> My understanding is that your analysis is mostly based on
>>>>> running a profiler against the code. Network operations can
>>>>> be bottlenecked in other places.
>>>>>
>>>>> You compare a 'simple script using kombu' with a 'script
>>>>> using oslo.messaging'. You don't compare a script using
>>>>> oslo.messaging before the refactoring with one after it. The
>>>>> latter would show whether the refactoring was worth the
>>>>> effort. Your test shows that oslo.messaging performance
>>>>> sucks, but it's not certain that the hotspots you've
>>>>> revealed, once fixed, will yield a huge boost.
>>>>>
>>>>> My concern is that it may turn out that once all the effort
>>>>> to refactor the code is done, we won't see a major
>>>>> difference. So we need baseline numbers, and performance
>>>>> tests would be a great helper here.
>>>>>
>>>>>
>>>>> It's really sad for me to see so little faith in what I'm
>>>>> saying. The test I did using the plain kombu driver was
>>>>> needed exactly to check that the network is not the
>>>>> bottleneck for messaging performance. If you don't believe
>>>>> my performance analysis, we could ask someone else to do
>>>>> their own research and provide results.
>> Technology is not about faith. :)
>>
>> First, let me make it clear I'm *not* against refactoring or
>> anything that will improve performance. I'm just a bit skeptical,
>> but hopefully you'll be able to show everyone I'm wrong, and then
>> the change will occur. :)
>>
>> To add more velocity to your effort, strong arguments need to be
>> present. To facilitate that, I would start by adding performance
>> tests that would give us some basis for discussing the changes
>> proposed later.
>>> Please see below for a detailed answer about the performance
>>> tests implementation. It explains a bit why it's hard to present
>>> arguments that would be strong enough for you. I can run
>>> performance tests locally, but that's not enough for the
>>> community.
> Yes, that's why shipping some tests ready to run with oslo.messaging
> can help. Science is about reproducibility, right? ;)
>
>>> In addition, I've provided some links to the existing
>>> implementation, with the places that IMHO cause bottlenecks.
>>> From my point of view that code is doing obviously stupid
>>> things (like closing/opening sockets for each message sent).
> That indeed sounds bad.
>
>>> That is enough for me to rewrite it, even without additional
>>> proof that it's wrong.
> [Full disclosure: I'm not as involved in oslo.messaging internals as
> you probably are, so I may say dumb things.]
>
> I wonder whether there are easier ways to fix that particular issue
> without rewriting everything from scratch. Like providing a pool of
> connections and making the send() functions use it instead of
> creating new connections (?)
I've tried to find a way to fix that without big changes, but 
unfortunately I've failed to do so.
The problem I see is that the connection pool is defined and used on 
one layer of the library while the problem is on another.
To fix this issue we would need to change several layers of code, and 
that code is shared between two drivers - rabbit and qpid.
Because of this it seems really hard to produce logically complete, 
working patches that would let us move in the proper direction without 
a big refactoring of the drivers' structure.
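
To make the idea concrete, here is a minimal sketch of the pooling 
approach you describe (my illustration only - the ConnectionPool 
class, the URL and the send() helper are made up, not actual 
oslo.messaging code):

    # Sketch: reuse broker connections instead of opening a new
    # socket for every message sent.
    import queue

    import kombu

    class ConnectionPool(object):
        def __init__(self, url, size=4):
            self._free = queue.Queue()
            for _ in range(size):
                self._free.put(kombu.Connection(url))

        def acquire(self):
            return self._free.get()

        def release(self, conn):
            self._free.put(conn)

    pool = ConnectionPool('amqp://guest:guest@localhost//')

    def send(topic, payload):
        # Borrow a pooled connection instead of creating a fresh one.
        conn = pool.acquire()
        try:
            with conn.SimpleQueue(topic) as q:
                q.put(payload)
        finally:
            pool.release(conn)

The sketch itself is trivial; the hard part is that in the current 
code there is no single place to thread such a pool through to the 
send path without touching both drivers.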
>
>> Then, describing the proposed details in a spec will give more
>> exposure to your ideas. At the moment, I see a general will to
>> enhance the library, but not enough detail on how to achieve this.
>> A specification can make us think not about the burden of change,
>> which obviously makes people skeptical about a rewrite-all
>> approach, but about specific technical issues.
>>> I agree that we should start with a spec. However, instead of a
>>> spec for the needed changes, I would prefer a spec describing the
>>> needed functionality of the library (which may differ from the
>>> existing functionality).
> Meaning, breaking the API, again?
It's not about breaking the API; it's about making it more logical and 
independent. Right now it's not clear to me which API classes are used 
and how they are used.
A lot of driver details leak outside the API, and that makes it hard 
to improve a driver without changing the API.
What I would like to see is a clear definition of what the library 
should provide and the API interface it should implement.
It may be a little bit Java-like: the API would be defined and frozen, 
and anyone could propose their driver implementation using 
kombu/qpid/zeromq or pigeons and trained dolphins to deliver messages.

This would allow us to change drivers without touching the API and test 
their performance separately.
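
Just to illustrate what I mean by a frozen API, the driver contract 
could look roughly like this (the names here are hypothetical, not 
the current oslo.messaging classes):

    # Hypothetical frozen driver contract; any backend (kombu, qpid,
    # zeromq, trained dolphins) would implement these methods.
    import abc

    class BaseDriver(abc.ABC):

        @abc.abstractmethod
        def send(self, target, message, timeout=None):
            """Deliver a message to the given target."""

        @abc.abstractmethod
        def listen(self, target):
            """Return an iterator of incoming messages for target."""

With the contract frozen, each backend could be benchmarked against 
the same interface without touching anything above it.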
>
>>> Using such a spec we could decide what is needed and what should
>>> be removed to achieve what we need.
>>>>> The problem with the refactoring I'm planning is that it's
>>>>> not a minor refactoring that can be applied in one patch;
>>>>> it's the whole library rewritten from scratch.
>> You can still maintain a long sequence of patches, like we did when
>> we migrated neutron to oslo.messaging (it was like ~25 separate
>> pieces).
>>> Taking into account possible gate issues, I would like to avoid
>>> a long series of patches, since they won't be able to land at
>>> the same time and rebasing will become a huge pain.
> But you're the one proposing the change, so you need to take on the
> burden. Having a new branch for an everything-rewritten version of
> the library means that each bug fix or improvement to the library
> will need to be tracked by each developer in two branches with
> significantly different code. I think it's more honest to put the
> rebase pain on the people who rework the code than on everyone else.
>
>>> If we decide to start working on a 2.0 API/implementation, I
>>> think a topic branch (2.0a) would be much better.
> I respectfully disagree. See above.
>
>>>>> The existing messaging code was written a long, long time
>>>>> ago (in a galaxy far, far away, maybe?) and was copy-pasted
>>>>> directly from nova. It was not built as a library and was
>>>>> never intended to be used outside of nova. Some parts of it
>>>>> cannot even work normally because it was not designed to
>>>>> work with drivers like zeromq (the matchmaker stuff).
>> oslo.messaging is NOT the code you can find in the oslo-incubator
>> rpc module. It was hugely rewritten to expose a new, cleaner API.
>> This is, btw, one of the reasons migration to this new library is
>> so painful. It was painful to move to oslo.messaging, so we need a
>> clear case for change before switching to yet another library.
>>> The API has indeed changed, but the general implementation
>>> details and processing flow go way back to 2011 and the nova
>>> code (for example, the general Publisher/Consumer implementation
>>> in impl_rabbit). That's the code I'm talking about.
> Roger.
>
>>> The refactoring as I see it will do the opposite: it will keep
>>> as much of the API intact as possible but change the internals
>>> to make it more efficient (that's why I call it refactoring). So
>>> the 2.0 version might be (partially?) backwards compatible, and
>>> migration won't be such a pain.
> That sounds promising. Though see my concern above about your
> suggestion to revisit the scope of the library.
>
>>>>> The reason I've raised this question on the mailing list was
>>>>> to reach some agreement about the future plans for
>>>>> oslo.messaging development and to start working on it in
>>>>> coordination with the community. So far I don't see any
>>>>> action plan emerging from it. I would like to see us bring
>>>>> more constructive ideas about what should be done.
>>>>>
>>>>> If you think the first action should be profiling, let's
>>>>> discuss how it should be implemented (because it already
>>>>> works just fine on my local PC). I guess we'll need to define
>>>>> some basic scenarios that would show us the overall
>>>>> performance of the library.
>> Let's start with basic send/receive throughput, for tiny and large
>> messages, multiple consumers, etc.
>>> This would be a great start, but it's quite hard to test basic
>>> send/receive since the existing code is written around rpc. I
>>> don't see a way to send a message without complex rpc code being
>>> involved. That's why I propose to start with a refactoring that
>>> separates the rpc code from the basic messaging code.
> Again, removing RPC code from your tests won't mean the library as a
> whole will get higher performance. That said, a refactoring that
> results in a clear separation of layers can be beneficial even
> without a major performance boost. But that means we probably should
> not put performance concerns forward as the main reason for the
> rework. I would set 'clean code' as the primary goal.
Yes, we can consider clean code the primary goal. At the same time I 
hope that we'll reach both goals at once, since clean code will also 
work faster.
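
As a strawman for the send/receive throughput test you mention above, 
something like this could measure the raw message rate through plain 
kombu, with no rpc code involved (the broker URL, queue name and 
message count are assumptions on my side):

    # Strawman throughput probe; assumes a local RabbitMQ.
    import time

    import kombu

    def run_benchmark(url='amqp://guest:guest@localhost//', n=10000):
        with kombu.Connection(url) as conn:
            q = conn.SimpleQueue('bench')
            start = time.time()
            for i in range(n):
                q.put({'seq': i})
            for i in range(n):
                q.get(timeout=5).ack()
            elapsed = time.time() - start
            q.close()
            print('%d msgs in %.2fs -> %.0f msg/s'
                  % (n, elapsed, n / elapsed))

    if __name__ == '__main__':
        run_benchmark()

The same harness could later be pointed at the oslo.messaging drivers 
to get before/after numbers for the refactoring.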
>
>>>>> There are a lot of questions that should be answered to
>>>>> implement this: Where would such tests run (jenkins, a local
>>>>> PC, a devstack VM)?
>> I would expect it to be exposed to jenkins thru 'tox'. We can then
>> set up a separate job to run them and compare with a baseline
>> [TBD: what *is* the baseline?] to make sure we don't introduce
>> performance regressions.
>>> Such tests cannot be exposed thru 'tox' since they require some
>>> environment setup (rabbitmq-server, zeromq matchmaker, etc.).
>>> Such setup is way out of scope for tox. Because of this we should
>>> find some other way to run such tests.
> You may just assume the server is already set up and available thru
> a common socket.
Assuming that something is already set up is not an option. If we add 
a new env to tox, I expect that anyone can run it locally and get the 
same results they would get from jenkins. See the sketch below.
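
One possible compromise (my assumption, not an agreed plan): gate the 
functional tests on an environment variable, so that the same tox 
target works both locally and on jenkins slaves that have a broker. 
The variable name below is made up:

    # Sketch: skip broker-backed tests unless the environment
    # points at one; OSLO_MESSAGING_TEST_URL is hypothetical.
    import os
    import unittest

    import kombu

    BROKER_URL = os.environ.get('OSLO_MESSAGING_TEST_URL')

    class SendReceiveTest(unittest.TestCase):

        @unittest.skipUnless(BROKER_URL, 'no test broker configured')
        def test_send_receive(self):
            with kombu.Connection(BROKER_URL) as conn:
                with conn.SimpleQueue('smoke') as q:
                    q.put({'ping': True})
                    msg = q.get(timeout=5)
                    self.assertEqual({'ping': True}, msg.payload)
                    msg.ack()

Anyone without a broker gets a clean skip rather than a failure, so 
the results stay reproducible across environments.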
>
>>>>> What should such scenarios look like? How do we measure
>>>>> performance (cProfile, etc.)?
>> I think we're interested in message rate, not CPU utilization.
>>> The problem here is that it's hard to find a bottleneck in the
>>> message rate without deeper analysis (CPU utilization, etc.).
> Tests are not there to show hotspots; they can be used to avoid
> performance regressions or to support claims about alleged
> performance gains added by a patch (or refactoring).
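Agreed, and the two can be combined: cProfile can simply be layered on 
top of the rate test from my earlier sketch (run_benchmark is the 
function assumed there):

    # Profile the benchmark run to find the CPU hotspots behind a
    # poor message rate; pure stdlib, no extra setup needed.
    import cProfile
    import pstats

    cProfile.run('run_benchmark()', 'bench.prof')
    stats = pstats.Stats('bench.prof')
    stats.sort_stats('cumulative').print_stats(20)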
>
>>>>> How do we collect results? How do we analyze results to find
>>>>> bottlenecks? etc.
>>>>>
>>>>> Another option would be to spend some of my free time
>>>>> implementing the mentioned refactoring (as I see it) and then
>>>>> show you the results of performance testing compared with the
>>>>> existing code.
>> This approach generally doesn't work beyond a PoC. OpenStack is a
>> complex project, and we need to stick to procedures - spec review,
>> then coding, all upstream, with no private branches outside the
>> common infrastructure.
>>> I agree with such an approach, but it also has many drawbacks. I
>>> don't know of a clean way to communicate design drafts and
>>> implementation details without actually writing the code.
> You may still have a PoC. You just should not consider it final
> code; it's there to support the spec's case.
>
>>> And if you already have working code, all this spec review
>>> becomes quite a useless burden.
> It adds context to your spec. And be prepared that lots of the code
> you write before or after doing the spec work *will* be rewritten. :)
>
>>> If you know a way to solve this problem (creating a high/mid
>>> level architecture design), please share it with me so we can
>>> use it.
>>>>> The only problem with such an approach is that my code won't
>>>>> be oslo.messaging and won't be accepted by the community. It
>>>>> might be a drop-in base for v2.0, but I'm afraid this won't
>>>>> be acceptable either.
>>>>>
>> Things don't happen here that way. If you want your work to be
>> consumed by the community, you need to work with the community.
>>> That's what I'm trying to do :)
> OK. BTW, you can also join the Oslo team at #openstack-oslo to
> discuss your case and whatnot.
>
> Cheers,
> /Ihar
>



