[openstack-dev] [neutron][all] switch from mysqldb to another eventlet aware mysql client
Ihar Hrachyshka
ihrachys at redhat.com
Tue Jul 15 22:30:37 UTC 2014
On 14/07/14 22:48, Vishvananda Ishaya wrote:
>
> On Jul 13, 2014, at 9:29 AM, Ihar Hrachyshka <ihrachys at redhat.com>
> wrote:
>
>> On 12/07/14 03:17, Mike Bayer wrote:
>>>
>>> On 7/11/14, 7:26 PM, Carl Baldwin wrote:
>>>>
>>>>
>>>> On Jul 11, 2014 5:32 PM, "Vishvananda Ishaya"
>>>> <vishvananda at gmail.com
>>> <mailto:vishvananda at gmail.com>> wrote:
>>>>>
>>>>> I have tried using pymysql in place of mysqldb and in real
>>>>> world
>>> concurrency
>>>>> tests against cinder and nova it performs slower. I was
>>>>> inspired by
>>> the mention
>>>>> of mysql-connector so I just tried that option instead.
>>> Mysql-connector seems
>>>>> to be slightly slower as well, which leads me to believe
>>>>> that the
>>> blocking inside of
>>>>
>>>> Do you have some numbers? "Seems to be slightly slower"
>>>> doesn't
>>> really stand up as an argument against the numbers that have
>>> been posted in this thread.
>
> Numbers are highly dependent on a number of other factors, but I
> was seeing 100 concurrent list commands against cinder going from
> an average of 400 ms to an average of around 600 ms with both
> mysql-connector and pymysql.
I've run my tests against neutron only, so it's possible that cinder
behaves differently.
But those numbers alone don't tell us much when considering the
switch. Do you have numbers for the mysqldb case?
>
> It is also worth mentioning that my test of 100 concurrent creates
> from the same project in cinder leads to average response times
> over 3 seconds. Note that creates return before the request is sent
> to the node for processing, so this is just the api creating the db
> record and sticking a message on the queue. A huge part of the
> slowdown is in quota reservation processing which does a row lock
> on the project id.
Again, are those 3 seconds better or worse than what we have for mysqldb?
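To illustrate the row lock you describe, here is a simplified sketch of
that reservation pattern; table and column names are made up for the
example, this is not cinder's actual code, but it shows why 100
concurrent creates in the same project end up serializing:

# Illustrative sketch: each reservation takes a SELECT ... FOR UPDATE
# row lock on the project's usage row, so concurrent reservations for
# the same project wait on each other until the transaction commits.
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()


class QuotaUsage(Base):
    __tablename__ = 'quota_usages'
    id = Column(Integer, primary_key=True)
    project_id = Column(String(255), index=True)
    resource = Column(String(255))
    in_use = Column(Integer, default=0)


engine = create_engine('mysql+mysqlconnector://user:secret@127.0.0.1/cinder')
Session = sessionmaker(bind=engine)


def reserve(project_id, resource, delta):
    session = Session()
    try:
        usage = (session.query(QuotaUsage)
                 .filter_by(project_id=project_id, resource=resource)
                 .with_for_update()  # row lock held until commit/rollback
                 .one())
        usage.in_use += delta
        session.commit()
    except Exception:
        session.rollback()
        raise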
>
> Before we are sure that an eventlet friendly backend “gets rid of
> all deadlocks”, I will mention that trying this test against
> connector leads to some requests timing out at our load balancer (5
> minute timeout), so we may actually be introducing deadlocks where
> the retry_on_deadlock operator is used.
Deadlocks != timeouts. I'm attempting to fix eventlet-triggered DB
deadlocks, not every possible deadlock you may envision, and not timeouts.
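For reference, the retry_on_deadlock wrapper mentioned above is
essentially a decorator that catches a DB deadlock error and re-runs
the transaction. A simplified sketch follows; it is illustrative only
and differs from the actual cinder/oslo.db helper:

# Illustrative retry-on-deadlock decorator; not the real helper.
import functools
import time

from oslo.db import exception as db_exc  # 'oslo_db' in later releases


def retry_on_deadlock(max_retries=5, interval=0.5):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except db_exc.DBDeadlock:
                    # Re-run the whole transaction; re-raise the
                    # deadlock error on the last attempt.
                    if attempt == max_retries - 1:
                        raise
                    time.sleep(interval)
        return wrapper
    return decorator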
>
> Consider the above anecdotal for the moment, since I can’t verify
> for sure that switching the sql driver didn’t introduce some other
> race or unrelated problem.
>
> Let me just caution that we can’t recommend replacing our mysql
> backend without real performance and load testing.
I agree. I'm not claiming the tests below are complete, but here is
what I've been working on for the last two days.
There is a nice OpenStack project called Rally that is designed to
make benchmarking OpenStack projects easy. It ships four scenarios for
neutron: networks, ports, routers, and subnets. Each scenario combines
create and list commands.
I've run each test with the following runner settings: times = 100,
concurrency = 10, meaning each scenario is executed 100 times with no
more than 10 iterations running in parallel. Then I repeated the same
for times = 100, concurrency = 20 (also setting max_pool_size to 20 so
that sqlalchemy can use that level of parallelism), and for times =
1000, concurrency = 100 (same note on sqlalchemy parallelism).
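A Rally task definition with those runner settings looks roughly like
the sketch below (the scenario name is one of the four real neutron
scenarios; the context section is just an illustration). The resulting
file is then passed to 'rally task start':

# Minimal sketch of a Rally task definition, written out as JSON.
import json

task = {
    "NeutronNetworks.create_and_list_networks": [
        {
            "runner": {
                "type": "constant",   # run the scenario a fixed number of times
                "times": 100,         # total iterations
                "concurrency": 10,    # at most 10 iterations in parallel
            },
            "context": {
                "users": {"tenants": 1, "users_per_tenant": 1},
            },
        }
    ]
}

with open("create-and-list-networks.json", "w") as f:
    json.dump(task, f, indent=4)

The other runs only change the times/concurrency numbers in the runner
section (plus the sqlalchemy max_pool_size adjustment mentioned above).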
You can find detailed HTML files with nice graphs at [1]. A brief
description of the results is below:
1. create_and_list_networks scenario: for 10 parallel workers the
change in average scenario time relative to mysqldb is -12.5%, for 20
workers -6.3%, and for 100 workers +9.4%, i.e. a slight increase in
time spent (this is the only case that showed a slight performance
regression; I'll rerun the test tomorrow to see whether some
discrepancy during that run influenced the result).
2. create_and_list_ports scenario: for 10 parallel workers the change
is -25.8%, for 20 workers -9.4%, and for 100 workers -12.6%.
3. create_and_list_routers scenario: for 10 parallel workers the
change is -46.6% (almost half the original time), for 20 workers
-51.7% (more than half), and for 100 workers -41.5%.
4. create_and_list_subnets scenario: for 10 parallel workers the
change is -26.4%, for 20 workers -51.1% (more than half the average
scenario time), and for 100 workers -31.7%.
I've tried to check how it scales up to 200 parallel workers, but was
hit by local open-file limits and the mysql max_connections setting. I
will retry my tests tomorrow with those limits raised to see how it
handles that heavier load.
Tomorrow I will also try to test the new library with multiple API workers.
Other than that, what are your suggestions on what to check/test?
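For completeness, the two setups under comparison differ essentially
in the SQLAlchemy dialect prefix of the database connection URL (in
the services this is the connection option in the [database] config
section, so no code changes are needed). A minimal sketch, with host
and credentials being illustrative:

# Driver selection via the SQLAlchemy connection URL.
from sqlalchemy import create_engine

# Default C-based MySQLdb driver (plain mysql:// dialect).
mysqldb_engine = create_engine(
    "mysql://neutron:secret@127.0.0.1/neutron", pool_size=10)

# Pure-Python, eventlet-friendly alternatives under test.
connector_engine = create_engine(
    "mysql+mysqlconnector://neutron:secret@127.0.0.1/neutron", pool_size=10)
pymysql_engine = create_engine(
    "mysql+pymysql://neutron:secret@127.0.0.1/neutron", pool_size=10)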
FYI: [1] contains the following directories:
mysqlconnector/
mysqldb/
Each of them contains the following directories:
10-10/ - 10 parallel workers, max_pool_size = 10 (default)
20-100/ - 20 parallel workers, max_pool_size = 100
100-100/ - 100 parallel workers, max_pool_size = 100
Happy analysis!
[1]: http://people.redhat.com/~ihrachys/
/Ihar
>
> Vish
>
>>>>
>>>>> sqlalchemy is not the main bottleneck across projects.
>>>>>
>>>>> Vish
>>>>>
>>>>> P.S. The performance in all cases was abysmal, so
>>>>> performance work
>>> definitely
>>>>> needs to be done, but just the guess that replacing our
>>>>> mysql
>>> library is going to
>>>>> solve all of our performance problems appears to be
>>>>> incorrect at
>>> first blush.
>>>>
>>>> The motivation is still mostly deadlock relief but more
>>>> performance
>>> work should be done. I agree with you there. I'm still
>>> hopeful for some improvement from this.
>>>
>>>
>>> To identify performance that's alleviated by async you have to
>>> establish up front that IO blocking is the issue, which would
>>> entail having code that's blazing fast until you start running
>>> it against concurrent connections, at which point you can
>>> identify via profiling that IO operations are being serialized.
>>> This is a very specific issue.
>>>
>>> In contrast, to identify why some arbitrary openstack app is
>>> slow, my bet is that async is often not the big issue. Every
>>> day I look at openstack code and talk to people working on
>>> things, I see many performance issues that have nothing to do
>>> with concurrency, and as I detailed in my wiki page at
>>> https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy there
>>> is a long road to cleaning up all the excessive queries,
>>> hundreds of unnecessary rows and columns being pulled over the
>>> network, unindexed lookups, subquery joins, hammering of
>>> Python-intensive operations (often due to the nature of OS apps
>>> as lots and lots of tiny API calls) that can be cached.
>>> There's a clear path to tons better performance documented
>>> there and most of it is not about async - which means that
>>> successful async isn't going to solve all those issues.
>>>
>>
>> Of course there is a long road to decent performance, and
>> switching a library won't magically fix all our issues. But if it
>> fixes deadlocks, gives a 30% to 150% performance boost for
>> different operations, and the switch itself is almost smooth, it
>> is something worth doing.