[openstack-dev] [oslo][monasca] Can we uncap python-kafka ?

Keen, Joe joe.keen at hpe.com
Mon Dec 5 04:03:13 UTC 2016



On 12/4/16, 7:36 PM, "Tony Breeds" <tony at bakeyournoodle.com> wrote:

>On Fri, Dec 02, 2016 at 06:18:39PM +0000, Keen, Joe wrote:
>> 
>> 
>> On 12/2/16, 1:29 AM, "Mehdi Abaakouk" <sileht at sileht.net> wrote:
>> 
>> >On Fri, Dec 02, 2016 at 03:29:59PM +1100, Tony Breeds wrote:
>> >>On Thu, Dec 01, 2016 at 04:52:52PM +0000, Keen, Joe wrote:
>> >>
>> >>> Unfortunately there’s nothing wrong on the Monasca side so far as we
>> >>> know.  We test new versions of the kafka-python library outside of
>> >>> Monasca before we bother to try integrating a new version.  Since
>> >>> 1.0 the kafka-python library has suffered from crashes and memory
>> >>> leaks severe enough that we’ve never attempted using it in Monasca
>> >>> itself.  We reported the bugs we found to the kafka-python project,
>> >>> but they were closed once they released a new version.
>> >>
>> >>So opening bugs isn't working.  What about writing code?
>> >
>> >The bug: https://github.com/dpkp/kafka-python/issues/55
>> >
>> >Reopening it would be the right solution here.
>> >
>> >I can't reproduce the segfault either, and I agree with dpkp that it
>> >looks like a ujson issue.
>> 
>> 
>> The bug I had was: https://github.com/dpkp/kafka-python/issues/551
>> 
>> In the case of that bug ujson was not an issue; the behaviour remained
>> even when using the standard json library.  The primary issue I found
>> was a memory leak over successive runs of the test script.  Eventually
>> the leak became so bad that the OOM killer killed the process, which
>> caused the segfault I was seeing.  The last version I tested was 1.2.1
>> and it still leaked badly.  I’ll need to let the benchmark script run
>> for a while and make sure it’s not still leaking.
>
>So you wrote that on Friday, so you should know by now whether it's still
>leaking.  Care to give us an update?

I wasn’t able to set a test up on Friday, and with all the other work I
have for the next few days I doubt I’ll be able to get to it much before
Wednesday.
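
For anyone who wants a head start, the rough shape of the test I have in
mind is below.  It’s only a sketch: the topic name, broker address, and
message counts are placeholders, not our actual benchmark script.

import resource
import time

from kafka import KafkaConsumer


def rss_kb():
    # Peak RSS of this process; ru_maxrss is kilobytes on Linux
    # (bytes on OS X).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def consume_round(topic, limit=100000):
    # Consume up to `limit` messages from the start of the topic and
    # return the observed throughput in messages per second.  With no
    # group_id and auto_offset_reset='earliest', each round re-reads
    # the same preloaded messages.
    consumer = KafkaConsumer(
        topic,
        bootstrap_servers='localhost:9092',
        auto_offset_reset='earliest',
        consumer_timeout_ms=5000,
    )
    count = 0
    start = time.time()
    for _ in consumer:
        count += 1
        if count >= limit:
            break
    consumer.close()
    elapsed = time.time() - start
    return count / elapsed if elapsed else 0.0


if __name__ == '__main__':
    # A peak RSS that keeps climbing round after round is the leak
    # signature we saw; a line that flattens out after warm-up is a pass.
    for i in range(20):
        rate = consume_round('test-topic')
        print('round %d: %.1f msg/s, peak rss %d kB' % (i, rate, rss_kb()))

Run that against a topic preloaded with a few million messages and watch
whether the RSS column ever stops growing.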

>
>> >And my bench seems to confirm the perf issue has been solved:
>> >(but not in the version pointed at...)
>> >
>> >$ pifpaf run kafka python kafka_test.py
>> >kafka-python version: 0.9.5
>> >...
>> >fetch size 179200 -> 45681.8728864 messages per second
>> >fetch size 204800 -> 47724.3810674 messages per second
>> >fetch size 230400 -> 47209.9841092 messages per second
>> >fetch size 256000 -> 48340.7719787 messages per second
>> >fetch size 281600 -> 49192.9896743 messages per second
>> >fetch size 307200 -> 50915.3291133 messages per second
>> >
>> >$ pifpaf run kafka python kafka_test.py
>> >kafka-python version: 1.0.2
>> >....
>> >fetch size 179200 -> 8546.77931323 messages per second
>> >fetch size 204800 -> 9213.30958314 messages per second
>> >fetch size 230400 -> 10316.668006 messages per second
>> >fetch size 256000 -> 11476.2285269 messages per second
>> >fetch size 281600 -> 12353.7254386 messages per second
>> >fetch size 307200 -> 13131.2367288 messages per second
>> >
>> >(1.1.1 and 1.2.5 also have the same issue)
>> >
>> >$ pifpaf run kafka python kafka_test.py
>> >kafka-python version: 1.3.1
>> >fetch size 179200 -> 44636.9371873 messages per second
>> >fetch size 204800 -> 44324.7085365 messages per second
>> >fetch size 230400 -> 45235.8283208 messages per second
>> >fetch size 256000 -> 45793.1044121 messages per second
>> >fetch size 281600 -> 44648.6357019 messages per second
>> >fetch size 307200 -> 44877.8445987 messages per second
>> >fetch size 332800 -> 47166.9176281 messages per second
>> >fetch size 358400 -> 47391.0057622 messages per second
>> >
>> >Looks like it works well now :)
>> 
>> It’s good that the performance problem has been fixed.  The remaining
>> issues on the Monasca side are verifying that the batch send method we
>> were using in 0.9.5 still works with the new async behaviour, seeing
>> whether our consumer auto balance still functions or converting to the
>> auto balance built into Kafka 0.10, and finding a way to do efficient
>> synchronous writes with the new async methods.
>
>Can you +1 https://review.openstack.org/404878 to make it clear we can
>proceed down this path?

I don’t know yet that we can.  Until we can answer the questions above,
I’m not sure that the new library will be performant and durable enough
for Monasca’s use cases.  I’m fairly confident that we can make it work,
but the performance issues with previous versions kept us from even
trying to integrate it, so it will take us some time.  If you need an
answer sooner than a week or so, and if anyone in the community is
willing, I can walk them through the testing I’d expect in order to
validate the new library.
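
To make that walk-through concrete, here is roughly what the two open
producer/consumer questions look like in code.  This is a sketch against
the post-1.0 kafka-python API as documented, not anything we’ve
validated; the broker address, topic, and group name are placeholders.

from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(bootstrap_servers='localhost:9092')

# send() is now asynchronous and returns a future.  Blocking on the
# future is the closest analogue to the old synchronous write:
future = producer.send('metrics', b'one measurement')
metadata = future.get(timeout=10)  # raises a KafkaError on failure
print('wrote partition %d offset %d'
      % (metadata.partition, metadata.offset))

# For batches: queue many sends, then flush() once so the whole batch
# is confirmed before we commit any state upstream.
for i in range(1000):
    producer.send('metrics', ('measurement %d' % i).encode())
producer.flush()

# Passing a group_id hands partition balancing to Kafka's own group
# coordinator (the internal auto balance mentioned above) instead of
# our auto-balance code:
consumer = KafkaConsumer(
    'metrics',
    group_id='monasca-persister',
    bootstrap_servers='localhost:9092',
)
for message in consumer:
    print(message.offset, message.value)

Whether the flush() path gets throughput anywhere near the old 0.9.5
batched send is exactly what the benchmark numbers above should tell us.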


