[Openstack-operators] Openstack and mysql galera with haproxy

Jay Pipes jaypipes at gmail.com
Mon Sep 22 17:41:51 UTC 2014


Hi Peter, Sławek, answers inline...

On 09/22/2014 08:12 AM, Peter Boros wrote:
> Hi,
>
> StaleDataError is not given by MySQL, but rather SQLAlchemy. After a
> quick look, it seems like SQLAlchemy gets this, if the update updated
> different number of rows then it expected. I am not sure what is the
> expectation based on, perhaps soembody can chime in and we can put
> this together. What is the transaction isolation level you are running
> on?

The transaction isolation level is REPEATABLE_READ, unless Sławek has 
changed the defaults (unlikely).

> For the timeout setting in neutron: that's a good way to approach it
> too, you can even be more agressive and set it to a few seconds. In
> MySQL making connections is very cheap (at least compared to other
> databases), an idle timeout of a few seconds for a connection is
> typical.
>
> On Mon, Sep 22, 2014 at 12:35 PM, Sławek Kapłoński <slawek at kaplonski.pl> wrote:
>> Hello,
>>
>> Thanks for Your explanations. I thought so and now I decrease
>> "idle_connection_timeout" in neutron and nova. Now when master server is back
>> to cluster than in less than one minute all conections are again made to this
>> master node becuase old connections which was made to backup node are closed.
>> So for now it looks almost perfect but when I now testing cluster (with master
>> node active and all connections established to this node) in neutron I still
>> sometimes see errors like:
>> StaleDataError: UPDATE statement on table 'ports' expected to update 1 row(s);
>> 0 were matched.
>>
>> and also today I found errors like:
>> 2014-09-22 11:38:05.715 11474 INFO sqlalchemy.engine.base.Engine [-] ROLLBACK
>> 2014-09-22 11:38:05.784 11474 ERROR
>> neutron.openstack.common.db.sqlalchemy.session [-] DB exception wrapped.
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session Traceback (most recent call
>> last):
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session   File
>> "/usr/lib/python2.7/dist-
>> packages/neutron/openstack/common/db/sqlalchemy/session.py", line 524, in
>> _wrap
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session     return f(*args, **kwargs)
>> 2014-09-22 11:38:05.784 11474 TRACE

 From looking up the code, it looks like you are using Havana [1]. The 
code in the master branch of Neutron now uses oslo.db, not 
neutron.openstack.common.db, so this issue may have been resolved in 
later versions of Neutron.

[1] 
https://github.com/openstack/neutron/blob/stable/havana/neutron/openstack/common/db/sqlalchemy/session.py#L524

>> neutron.openstack.common.db.sqlalchemy.session   File
>> "/usr/lib/python2.7/dist-
>> packages/neutron/openstack/common/db/sqlalchemy/session.py", line 718, in flush
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session     return super(Session,
>> self).flush(*args, **kwargs)
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session   File
>> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1818, in
>> flush
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session     self._flush(objects)
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session   File
>> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1936, in
>> _flush
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session
>> transaction.rollback(_capture_exception=True)
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session   File
>> "/usr/lib/python2.7/dist-packages/sqlalchemy/util/langhelpers.py", line 58, in
>> __exit__
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session     compat.reraise(exc_type,
>> exc_value, exc_tb)
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session   File
>> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1900, in
>> _flush
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session     flush_context.execute()
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session   File
>> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/unitofwork.py", line 372, in
>> execute
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session     rec.execute(self)
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session   File
>> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/unitofwork.py", line 525, in
>> execute
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session     uow
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session   File
>> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 64, in
>> save_obj
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session     table, insert)
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session   File
>> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 541, in
>> _emit_insert_statements
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session     execute(statement,
>> multiparams)
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session   File
>> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 662, in
>> execute
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session     params)
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session   File
>> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 761, in
>> _execute_clauseelement
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session     compiled_sql,
>> distilled_params
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session   File
>> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 874, in
>> _execute_context
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session     context)
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session   File
>> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1027, in
>> _handle_dbapi_exception
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session     util.reraise(*exc_info)
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session   File
>> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 856, in
>> _execute_context
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session     context)
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session   File
>> "/usr/lib/python2.7/dist-packages/sqlalchemy/connectors/mysqldb.py", line 60,
>> in do_executemany
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session     rowcount =
>> cursor.executemany(statement, parameters)
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session   File
>> "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 206, in
>> executemany
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session     r = r + self.execute(query,
>> a)
>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session TypeError: unsupported operand
>> type(s) for +: 'int' and 'NoneType'

Hmm, this is just bad coding in the MySQLdb driver, frankly. It is 
assuming a call to Cursor.execute() will return an integer, but it can 
return None in some circumstances. See code here:

http://sourceforge.net/p/mysql-python/code/ci/8041cc6df636b9c42d52e01b727aa98b43f3632c/tree/MySQLdb/MySQLdb/cursors.py

Note that MySQLdb1 (the above, which is what is packaged in your Linux 
distro I believe) is old and buggy. The maintainer has moved on to 
MySQLdb2, which has a different call interface in this part of the code.

>> 2014-09-22 11:38:05.784 11474 TRACE
>> neutron.openstack.common.db.sqlalchemy.session
>>
>> And I have to investigate why it is happening becuase whith single mysql
>> server I have no such errors :/

Not sure, frankly. The code is executing many INSERT or UPDATE 
statements in a single block. The MySQL connection is clearly getting 
borked on one of those attempts and results in the traceback you see above.

best,
-jay

>> ---
>> Best regards
>> Sławek Kapłoński
>> slawek at kaplonski.pl
>>
>> Dnia poniedziałek, 22 września 2014 11:18:27 Peter Boros pisze:
>>> Hi,
>>>
>>> Let me answer this and one of your previous questions in one because
>>> they are related.
>>>
>>> Earlier you wrote:
>>>> I made such modifications today in my infra and generally it looks
>>>> better now. I don't see deadlocks. But I have one more problem with
>>>> that: generally it works fine when main node is active but in
>>>> situation when this node is down, haproxy connect to one of backup
>>>> nodes. Still all is ok but problem is when main node is up again - all
>>>> new connections are made to main node but active connections which was
>>>> made to backup node still are active and neutron (or nova) are using
>>>> connections to two servers and then there are problems with deadlock
>>>> again.
>>>> Do You know how to prevent such situation?
>>>
>>> This is because of how haproxy works. Haproxy's load balancing is TCP
>>> level, once the TCP connection is established, haproxy has nothing to
>>> do with it. If the MySQL application (neutron in this case), uses
>>> persistent connections, at the time of failing over, haproxy doesn't
>>> make an extra decision upon failover, because a connection is already
>>> established. This can be mitigated by using haproxy 1.5 and defining
>>> the backend with on-marked-down shutdown-sessions, this will kill the
>>> connections at the TCP level on the formerly active node. Or in case
>>> of graceful failover, include killing connections in the failover
>>> script on the formerly active node. The application in this case will
>>> get error 1 or 2 you described.
>>>
>>>  From your description error 1 and 2 are related to killing
>>> connections. Case 1 (MySQL server has gone away) happens when the
>>> connection was killed (but not at the MySQL protocol level) while it
>>> was idle, and the application is attempting to re-use it. In this case
>>> the correct behaviour would be re-establishing the connection. Error 2
>>> is the same thing, but while the connection was actually doing
>>> something, reconnecting and retrying is the correct behaviour. These
>>> errors are not avoidable, if the node dies non-gracefully. A server
>>> can for example lose power while doing the transaction, in this case
>>> the transaction will be aborted, and the application will get one of
>>> the errors described above. The application has to know that the data
>>> is not written, since it didn't do commit or database didn't
>>> acknowledge the commit.
>>>
>>> On Sat, Sep 20, 2014 at 12:11 AM, Sławek Kapłoński <slawek at kaplonski.pl>
>> wrote:
>>>> Hello,
>>>>
>>>> New questions below
>>>>
>>>> ---
>>>> Best regards
>>>> Sławek Kapłoński
>>>> slawek at kaplonski.pl
>>>>
>>>> Dnia czwartek, 18 września 2014 09:45:21 Clint Byrum pisze:
>>>>> Excerpts from Sławek Kapłoński's message of 2014-09-18 09:29:27 -0700:
>>>>>> Hello,
>>>>>>
>>>>>> Is anyone here using openstack with mysql galera and haproxy? Have You
>>>>>> got
>>>>>> any problems with that?
>>>>>> I was today installed such ha infra for database (two mysql servers in
>>>>>> galera cluster and haproxy on controller and neutron node, this haproxy
>>>>>> is connecting to one of galera servers with round robin algorithm).
>>>>>> Generally all is working fine but I have few problems:
>>>>>> 1. I have a lot of messages like:
>>>>>> WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql
>>>>>> server
>>>>>> has gone away: (2006, 'MySQL server has gone away')
>>>>>> 2. I have (most on neutron) many errors like:
>>>>>> OperationalError: (OperationalError) (2013, 'Lost connection to MySQL
>>>>>> server during query') 'UPDATE ml2_port_bindings SET vif_type=%s,
>>>>>> driver=%s, segment=%s WHERE ml2_port_bindings.port_id =
>>>>>
>>>>> 1 and 2 look like timeout issues. Check haproxy's timeouts. They need
>>>>> to be just a little longer than MySQL's connection timeouts.
>>>>
>>>> After I made ACTIVE/PASSIVE cluster and change sql_idle_timeout in neutron
>>>> and nova problem 1 looks that is solver. Unfortunatelly I found that when
>>>> I'm deleting port from neutron I still sometimes have got errors like in
>>>> 2. I don't check exactly nova logs yet so I'm not sure is it only in
>>>> neutron or in both.
>>>> Do You maybe know why it happens in neutron? It not happend when I have
>>>> single mysql node without haproxy and galera so I suppose that haproxy or
>>>> galera is responsible for that problem :/
>>>>
>>>>>> 3. Also errors:
>>>>>> StaleDataError: UPDATE statement on table 'ports' expected to update 1
>>>>>> row(s); 0 were matched.
>>>>>> 4. and errors:
>>>>>> DBDeadlock: (OperationalError) (1213, 'Deadlock found when trying to
>>>>>> get
>>>>>> lock; try restarting transaction') 'UPDATE ipavailabilityranges SET
>>>>>> first_ip=%s WHERE ipavailabilityranges.allocation_pool_id =
>>>>>
>>>>> 3 and 4 are a known issue. Our code doesn't always retry transactions,
>>>>> which is required to use Galera ACTIVE/ACTIVE. Basically, that doesn't
>>>>> work.
>>>>>
>>>>> You can use ACTIVE/PASSIVE, and even do vertical partitioning where
>>>>> one of the servers is ACTIVE for Nova, but another one is ACTIVE for
>>>>> Neutron. But AFAIK, ACTIVE/ACTIVE isn't being tested and the work hasn't
>>>>> been done to make the concurrent transactions work properly.
>>>>>
>>>>> _______________________________________________
>>>>> OpenStack-operators mailing list
>>>>> OpenStack-operators at lists.openstack.org
>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>>>
>>>> _______________________________________________
>>>> OpenStack-operators mailing list
>>>> OpenStack-operators at lists.openstack.org
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>
>> _______________________________________________
>> OpenStack-operators mailing list
>> OpenStack-operators at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>
>
>
>



More information about the OpenStack-operators mailing list