[openstack-dev] [Neutron] Issue with pymysql

Armando M. armamig at gmail.com
Wed Jul 8 21:44:28 UTC 2015


On 8 July 2015 at 14:30, Salvatore Orlando <sorlando at nicira.com> wrote:

> I agree and I would make the switch as soon as possible. The graphite
> graph you posted showed that since 6/28 the difference in failure rate
> isn't even statistically significant. However, spikes in failure
> rates of the unstable job also suggest that you're starting to chase a
> moving target, and we know how painful this is from the experience we had
> when enabling the neutron full job.
>

The spike was induced by an infrastructure failure, but generally speaking
I agree with you.


>
> Salvatore
>
>
>
> On 8 July 2015 at 20:21, Armando M. <armamig at gmail.com> wrote:
>
>> Hi,
>>
>> Another brief update on the matter:
>>
>> Failure rate trends [1] show that the unstable configuration (w/ multiple
>> API workers + the pymysql driver) and the stable one (w/o) are virtually
>> aligned, so I am proposing that it is time to drop the unstable infra
>> configuration [2,3] that allowed the team to triage/experiment and get to
>> a solution. I'll watch [1] a little longer before claiming that we're out
>> of the woods.
>>
>> Cheers,
>> Armando
>>
>> [1] http://goo.gl/YM7gUC
>> [2] https://review.openstack.org/#/c/199668/
>> [3] https://review.openstack.org/#/c/199672/
>>
>> On 22 June 2015 at 14:10, Armando M. <armamig at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> A brief update on the issue that sparked this thread:
>>>
>>> A little over a week ago, bug [1] was filed. The gist of that was that
>>> the switch to pymysql unveiled a number of latent race conditions that made
>>> Neutron unstable.
>>>
>>> To try and nip these in the bud, the Neutron team filed a number of
>>> patches [2] to create an unstable configuration that would allow them to
>>> troubleshoot and experiment with a solution while still keeping stability
>>> in check (a preliminary proposal for a fix is available in [4]).
>>>
>>> The latest failure rate trend is shown in [3]; as you can see, we're
>>> still gathering data, but it seems that the instability gap between the two
>>> jobs (stable vs unstable) has widened, and should give us plenty of data
>>> points to devise a resolution strategy.
>>>
>>> I have documented the most recurrent traces in the bug report [1].
>>>
>>> I will update once we manage to get the two curves to converge again,
>>> close to a more acceptable failure rate.
>>>
>>> Cheers,
>>> Armando
>>>
>>> [1] https://bugs.launchpad.net/neutron/+bug/1464612
>>> [2] https://review.openstack.org/#/q/topic:neutron-unstable,n,z
>>> [3] http://goo.gl/YM7gUC
>>> [4] https://review.openstack.org/#/c/191540/
>>>
>>>
>>> On 12 June 2015 at 11:13, Boris Pavlovic <bpavlovic at mirantis.com> wrote:
>>>
>>>> Sean,
>>>>
>>>> Thanks for quick fix/revert https://review.openstack.org/#/c/191010/
>>>> This unblocked Rally gates...
>>>>
>>>> Best regards,
>>>> Boris Pavlovic
>>>>
>>>> On Fri, Jun 12, 2015 at 8:56 PM, Clint Byrum <clint at fewbar.com> wrote:
>>>>
>>>>> Excerpts from Mike Bayer's message of 2015-06-12 09:42:42 -0700:
>>>>> >
>>>>> > On 6/12/15 11:37 AM, Mike Bayer wrote:
>>>>> > >
>>>>> > >
>>>>> > > On 6/11/15 9:32 PM, Eugene Nikanorov wrote:
>>>>> > >> Hi neutrons,
>>>>> > >>
>>>>> > >> I'd like to draw your attention to an issue discovered by the
>>>>> > >> rally gate job:
>>>>> > >>
>>>>> http://logs.openstack.org/96/190796/4/check/gate-rally-dsvm-neutron-rally/7a18e43/logs/screen-q-svc.txt.gz?level=TRACE
>>>>> > >>
>>>>> > >> I don't have bandwidth to take a deep look at it, but my first
>>>>> > >> impression is that it is some issue with nested transaction support
>>>>> > >> on either the SQLAlchemy or the PyMySQL side.
>>>>> > >> Also, besides errors with nested transactions, there are a lot of
>>>>> > >> Lock wait timeouts.
>>>>> > >>
>>>>> > >> I think it makes sense to start with reverting the patch that
>>>>> > >> moves to pymysql.
>>>>> > > My immediate reaction is that this is perhaps a concurrency-related
>>>>> > > issue; because PyMySQL is pure python and allows for full blown
>>>>> > > eventlet monkeypatching, I wonder if somehow the same PyMySQL
>>>>> > > connection is being used in multiple contexts. E.g. one greenlet
>>>>> > > starts up a savepoint, using identifier "_3" which is based on a
>>>>> > > counter that is local to the SQLAlchemy Connection, but then another
>>>>> > > greenlet shares that PyMySQL connection somehow with another
>>>>> > > SQLAlchemy Connection that uses the same identifier.
>>>>> >
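[The failure mode Mike describes can be reduced to a toy model. This sketch is illustrative only: `ToyConnection` and its savepoint bookkeeping are invented here to mimic, not reproduce, the real SQLAlchemy/PyMySQL internals; the counter-based naming loosely mirrors the `"_3"`-style identifiers mentioned above.]

```python
import itertools

class ToyConnection:
    """Toy model of one DBAPI connection whose savepoint names come from
    a per-connection counter, loosely mimicking the scheme described above."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._savepoints = set()

    def begin_nested(self):
        # Each nested transaction gets the next counter-derived name.
        name = "sa_savepoint_%d" % next(self._counter)
        self._savepoints.add(name)
        return name

    def release(self, name):
        # MySQL raises "SAVEPOINT ... does not exist" in this situation.
        if name not in self._savepoints:
            raise RuntimeError("SAVEPOINT %s does not exist" % name)
        self._savepoints.remove(name)

    def rollback(self):
        # A full ROLLBACK destroys every savepoint on the connection.
        self._savepoints.clear()

conn = ToyConnection()
sp_a = conn.begin_nested()   # greenlet A opens a nested transaction
sp_b = conn.begin_nested()   # greenlet B wrongly reuses the same connection
conn.rollback()              # B hits an error and rolls the whole txn back
try:
    conn.release(sp_a)       # A's savepoint is gone out from under it
except RuntimeError as exc:
    print(exc)               # SAVEPOINT sa_savepoint_1 does not exist
```

[If this is what is happening, the traces would show savepoint names that the server never had, or no longer has, at the time of RELEASE/ROLLBACK TO, which matches the hypothesis of two SQLAlchemy Connections multiplexed onto one monkeypatched PyMySQL socket.]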
>>>>> > Reading more of the log, it seems the main issue is just that
>>>>> > there's a deadlock on inserting into the securitygroups table. The
>>>>> > deadlock on insert can be because of an index being locked.
>>>>> >
>>>>> >
>>>>> > I'd be curious to know how many greenlets are running concurrently
>>>>> > here, and what the overall transaction looks like within the operation
>>>>> > that is failing here (e.g. does each transaction insert multiple rows
>>>>> > into securitygroups? that would make a deadlock seem more likely).
>>>>>
>>>>> This raises two questions:
>>>>>
>>>>> 1) Are we handling deadlocks with retries? It's important that we do
>>>>> that to be defensive.
>>>>>
>>>>> 2) Are we being careful to sort the table order in any multi-table
>>>>> transactions so that we minimize the chance of cross-table deadlocks?
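[On point (1), the defensive pattern is a retry wrapper around the whole transaction. This is a minimal, self-contained sketch with invented names (`DBDeadlock`, `retry_on_deadlock`); a real fix would hook the driver's deadlock error (MySQL error 1213) and would likely live in the shared DB layer rather than be hand-rolled per call site.]

```python
import random
import time

class DBDeadlock(Exception):
    """Stand-in for the driver's deadlock error (e.g. MySQL error 1213)."""

def retry_on_deadlock(max_retries=5, base_delay=0.01):
    """Re-run the wrapped transaction, with jittered exponential backoff,
    whenever it aborts with a deadlock."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return fn(*args, **kwargs)
                except DBDeadlock:
                    if attempt == max_retries - 1:
                        raise
                    # Jitter so concurrent retriers don't collide again.
                    time.sleep(base_delay * (2 ** attempt) * random.random())
        return wrapper
    return decorator

calls = []

@retry_on_deadlock()
def create_security_group():
    calls.append(1)
    if len(calls) < 3:
        raise DBDeadlock()   # simulate two deadlocked attempts
    return "created"

print(create_security_group())   # retries twice, then succeeds
print(len(calls))                # 3 attempts in total
```

[For point (2), the usual discipline is to touch tables, and rows within a multi-row insert, in one consistent order (e.g. sorted by primary key), so that two concurrent transactions can never wait on each other's locks in opposite order.]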
>>>>>
>>>>>
>>>>> __________________________________________________________________________
>>>>> OpenStack Development Mailing List (not for usage questions)
>>>>> Unsubscribe:
>>>>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
>
>

