[Openstack-operators] Openstack and mysql galera with haproxy

Sławek Kapłoński slawek at kaplonski.pl
Mon Sep 22 20:02:26 UTC 2014


Hello,

Answears below

---
Best regards
Sławek Kapłoński
slawek at kaplonski.pl

Dnia poniedziałek, 22 września 2014 13:41:51 Jay Pipes pisze:
> Hi Peter, Sławek, answers inline...
> 
> On 09/22/2014 08:12 AM, Peter Boros wrote:
> > Hi,
> > 
> > StaleDataError is not given by MySQL, but rather SQLAlchemy. After a
> > quick look, it seems like SQLAlchemy gets this, if the update updated
> > different number of rows then it expected. I am not sure what is the
> > expectation based on, perhaps soembody can chime in and we can put
> > this together. What is the transaction isolation level you are running
> > on?
> 
> The transaction isolation level is REPEATABLE_READ, unless Sławek has
> changed the defaults (unlikely).
For sure I didn't change it
> 
> > For the timeout setting in neutron: that's a good way to approach it
> > too, you can even be more agressive and set it to a few seconds. In
> > MySQL making connections is very cheap (at least compared to other
> > databases), an idle timeout of a few seconds for a connection is
> > typical.
> > 
> > On Mon, Sep 22, 2014 at 12:35 PM, Sławek Kapłoński <slawek at kaplonski.pl> 
wrote:
> >> Hello,
> >> 
> >> Thanks for Your explanations. I thought so and now I decrease
> >> "idle_connection_timeout" in neutron and nova. Now when master server is
> >> back to cluster than in less than one minute all conections are again
> >> made to this master node becuase old connections which was made to
> >> backup node are closed. So for now it looks almost perfect but when I
> >> now testing cluster (with master node active and all connections
> >> established to this node) in neutron I still sometimes see errors like:
> >> StaleDataError: UPDATE statement on table 'ports' expected to update 1
> >> row(s); 0 were matched.
> >> 
> >> and also today I found errors like:
> >> 2014-09-22 11:38:05.715 11474 INFO sqlalchemy.engine.base.Engine [-]
> >> ROLLBACK 2014-09-22 11:38:05.784 11474 ERROR
> >> neutron.openstack.common.db.sqlalchemy.session [-] DB exception wrapped.
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session Traceback (most recent
> >> call
> >> last):
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-
> >> packages/neutron/openstack/common/db/sqlalchemy/session.py", line 524, in
> >> _wrap
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     return f(*args,
> >> **kwargs) 2014-09-22 11:38:05.784 11474 TRACE
> 
>  From looking up the code, it looks like you are using Havana [1]. The
> code in the master branch of Neutron now uses oslo.db, not
> neutron.openstack.common.db, so this issue may have been resolved in
> later versions of Neutron.
Yes, I'm using havana and I have now no possibility to upgrade it fast to 
icehouse (about master branch I even don't want to think :)). Do You want to 
tell me that this problem will be existing in havana and this can't be fixed in 
that release?
> 
> [1]
> https://github.com/openstack/neutron/blob/stable/havana/neutron/openstack/co
> mmon/db/sqlalchemy/session.py#L524
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-
> >> packages/neutron/openstack/common/db/sqlalchemy/session.py", line 718, in
> >> flush 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     return super(Session,
> >> self).flush(*args, **kwargs)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1818,
> >> in
> >> flush
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     self._flush(objects)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1936,
> >> in
> >> _flush
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session
> >> transaction.rollback(_capture_exception=True)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/util/langhelpers.py", line
> >> 58, in __exit__
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session    
> >> compat.reraise(exc_type,
> >> exc_value, exc_tb)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1900,
> >> in
> >> _flush
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session    
> >> flush_context.execute()
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/unitofwork.py", line
> >> 372, in execute
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     rec.execute(self)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/unitofwork.py", line
> >> 525, in execute
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     uow
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line
> >> 64, in save_obj
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     table, insert)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line
> >> 541, in _emit_insert_statements
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     execute(statement,
> >> multiparams)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 662,
> >> in
> >> execute
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     params)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 761,
> >> in
> >> _execute_clauseelement
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     compiled_sql,
> >> distilled_params
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 874,
> >> in
> >> _execute_context
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     context)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1027,
> >> in
> >> _handle_dbapi_exception
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session    
> >> util.reraise(*exc_info)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 856,
> >> in
> >> _execute_context
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     context)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/connectors/mysqldb.py", line
> >> 60, in do_executemany
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     rowcount =
> >> cursor.executemany(statement, parameters)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 206, in
> >> executemany
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     r = r +
> >> self.execute(query, a)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session TypeError: unsupported
> >> operand type(s) for +: 'int' and 'NoneType'
> 
> Hmm, this is just bad coding in the MySQLdb driver, frankly. It is
> assuming a call to Cursor.execute() will return an integer, but it can
> return None in some circumstances. See code here:
> 
> http://sourceforge.net/p/mysql-python/code/ci/8041cc6df636b9c42d52e01b727aa9
> 8b43f3632c/tree/MySQLdb/MySQLdb/cursors.py
> 
> Note that MySQLdb1 (the above, which is what is packaged in your Linux
> distro I believe) is old and buggy. The maintainer has moved on to
> MySQLdb2, which has a different call interface in this part of the code.

Ok, I checked that code and I found that in fact sometimes method "execute" 
returns None and when it is called from executemany than there is problem. 
Simple change in executemany method to:

    if not m:
	r = 0
            for a in args:
                iter_r = self.execute(query, a)
                if iter_r:
                    r = r + iter_r
            return r

for me now in that tests which I made but I don't know what be result of that 
change in longer time period and why execute method returns that None value :/

You wrote that MySQLdb2 has different call interface in this part of code - 
should change to MySQLdb2 fix that problem or not? If yes can You explain me 
how I can change it? Should I install other package just or maybe I should set 
in somewhere in config? (I'm using ubuntu 12.04)

> 
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session
> >> 
> >> And I have to investigate why it is happening becuase whith single mysql
> >> server I have no such errors :/
> 
> Not sure, frankly. The code is executing many INSERT or UPDATE
> statements in a single block. The MySQL connection is clearly getting
> borked on one of those attempts and results in the traceback you see above.

I'm not sure in 100% but on much bigger cluster with one database server there 
was no such problem even once so I supposed that it is related somehow to 
galera which I now want to use.

> 
> best,
> -jay
> 
> >> ---
> >> Best regards
> >> Sławek Kapłoński
> >> slawek at kaplonski.pl
> >> 
> >> Dnia poniedziałek, 22 września 2014 11:18:27 Peter Boros pisze:
> >>> Hi,
> >>> 
> >>> Let me answer this and one of your previous questions in one because
> >>> they are related.
> >>> 
> >>> Earlier you wrote:
> >>>> I made such modifications today in my infra and generally it looks
> >>>> better now. I don't see deadlocks. But I have one more problem with
> >>>> that: generally it works fine when main node is active but in
> >>>> situation when this node is down, haproxy connect to one of backup
> >>>> nodes. Still all is ok but problem is when main node is up again - all
> >>>> new connections are made to main node but active connections which was
> >>>> made to backup node still are active and neutron (or nova) are using
> >>>> connections to two servers and then there are problems with deadlock
> >>>> again.
> >>>> Do You know how to prevent such situation?
> >>> 
> >>> This is because of how haproxy works. Haproxy's load balancing is TCP
> >>> level, once the TCP connection is established, haproxy has nothing to
> >>> do with it. If the MySQL application (neutron in this case), uses
> >>> persistent connections, at the time of failing over, haproxy doesn't
> >>> make an extra decision upon failover, because a connection is already
> >>> established. This can be mitigated by using haproxy 1.5 and defining
> >>> the backend with on-marked-down shutdown-sessions, this will kill the
> >>> connections at the TCP level on the formerly active node. Or in case
> >>> of graceful failover, include killing connections in the failover
> >>> script on the formerly active node. The application in this case will
> >>> get error 1 or 2 you described.
> >>> 
> >>>  From your description error 1 and 2 are related to killing
> >>> 
> >>> connections. Case 1 (MySQL server has gone away) happens when the
> >>> connection was killed (but not at the MySQL protocol level) while it
> >>> was idle, and the application is attempting to re-use it. In this case
> >>> the correct behaviour would be re-establishing the connection. Error 2
> >>> is the same thing, but while the connection was actually doing
> >>> something, reconnecting and retrying is the correct behaviour. These
> >>> errors are not avoidable, if the node dies non-gracefully. A server
> >>> can for example lose power while doing the transaction, in this case
> >>> the transaction will be aborted, and the application will get one of
> >>> the errors described above. The application has to know that the data
> >>> is not written, since it didn't do commit or database didn't
> >>> acknowledge the commit.
> >>> 
> >>> On Sat, Sep 20, 2014 at 12:11 AM, Sławek Kapłoński <slawek at kaplonski.pl>
> >> 
> >> wrote:
> >>>> Hello,
> >>>> 
> >>>> New questions below
> >>>> 
> >>>> ---
> >>>> Best regards
> >>>> Sławek Kapłoński
> >>>> slawek at kaplonski.pl
> >>>> 
> >>>> Dnia czwartek, 18 września 2014 09:45:21 Clint Byrum pisze:
> >>>>> Excerpts from Sławek Kapłoński's message of 2014-09-18 09:29:27 -0700:
> >>>>>> Hello,
> >>>>>> 
> >>>>>> Is anyone here using openstack with mysql galera and haproxy? Have
> >>>>>> You
> >>>>>> got
> >>>>>> any problems with that?
> >>>>>> I was today installed such ha infra for database (two mysql servers
> >>>>>> in
> >>>>>> galera cluster and haproxy on controller and neutron node, this
> >>>>>> haproxy
> >>>>>> is connecting to one of galera servers with round robin algorithm).
> >>>>>> Generally all is working fine but I have few problems:
> >>>>>> 1. I have a lot of messages like:
> >>>>>> WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql
> >>>>>> server
> >>>>>> has gone away: (2006, 'MySQL server has gone away')
> >>>>>> 2. I have (most on neutron) many errors like:
> >>>>>> OperationalError: (OperationalError) (2013, 'Lost connection to MySQL
> >>>>>> server during query') 'UPDATE ml2_port_bindings SET vif_type=%s,
> >>>>>> driver=%s, segment=%s WHERE ml2_port_bindings.port_id =
> >>>>> 
> >>>>> 1 and 2 look like timeout issues. Check haproxy's timeouts. They need
> >>>>> to be just a little longer than MySQL's connection timeouts.
> >>>> 
> >>>> After I made ACTIVE/PASSIVE cluster and change sql_idle_timeout in
> >>>> neutron
> >>>> and nova problem 1 looks that is solver. Unfortunatelly I found that
> >>>> when
> >>>> I'm deleting port from neutron I still sometimes have got errors like
> >>>> in
> >>>> 2. I don't check exactly nova logs yet so I'm not sure is it only in
> >>>> neutron or in both.
> >>>> Do You maybe know why it happens in neutron? It not happend when I have
> >>>> single mysql node without haproxy and galera so I suppose that haproxy
> >>>> or
> >>>> galera is responsible for that problem :/
> >>>> 
> >>>>>> 3. Also errors:
> >>>>>> StaleDataError: UPDATE statement on table 'ports' expected to update
> >>>>>> 1
> >>>>>> row(s); 0 were matched.
> >>>>>> 4. and errors:
> >>>>>> DBDeadlock: (OperationalError) (1213, 'Deadlock found when trying to
> >>>>>> get
> >>>>>> lock; try restarting transaction') 'UPDATE ipavailabilityranges SET
> >>>>>> first_ip=%s WHERE ipavailabilityranges.allocation_pool_id =
> >>>>> 
> >>>>> 3 and 4 are a known issue. Our code doesn't always retry transactions,
> >>>>> which is required to use Galera ACTIVE/ACTIVE. Basically, that doesn't
> >>>>> work.
> >>>>> 
> >>>>> You can use ACTIVE/PASSIVE, and even do vertical partitioning where
> >>>>> one of the servers is ACTIVE for Nova, but another one is ACTIVE for
> >>>>> Neutron. But AFAIK, ACTIVE/ACTIVE isn't being tested and the work
> >>>>> hasn't
> >>>>> been done to make the concurrent transactions work properly.
> >>>>> 
> >>>>> _______________________________________________
> >>>>> OpenStack-operators mailing list
> >>>>> OpenStack-operators at lists.openstack.org
> >>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operator
> >>>>> s
> >>>> 
> >>>> _______________________________________________
> >>>> OpenStack-operators mailing list
> >>>> OpenStack-operators at lists.openstack.org
> >>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
> >> 
> >> _______________________________________________
> >> OpenStack-operators mailing list
> >> OpenStack-operators at lists.openstack.org
> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
> 
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part.
URL: <http://lists.openstack.org/pipermail/openstack-operators/attachments/20140922/f40827cb/attachment.pgp>


More information about the OpenStack-operators mailing list