[Openstack-operators] Openstack and mysql galera with haproxy

Emrah Aslan e.aslan at logicom.com.tr
Tue Sep 23 08:49:06 UTC 2014


Hi all, 

This is a common problem with SQL injections. Don't forget to clear the cache after SELECT *.* injections; even though you clear the cache, I'm not quite sure it works properly after the process. In particular, SELECT *.* injections wait too long. There is still a gap for those who don't have a high-end server such as the Cisco C240 M4 series. I am confident in the new-generation C240 M4, and we are testing the Cisco Mini series B200-VP. It looks great for app/web developers.

I am interested in creating a group for those interested in "vGPU" (application virtualization) for software such as AutoCAD/Autodesk and rendering programs. Please contact me if you are interested in vGPU.

Kind Regards

Emrah ASLAN
Cisco/Citrix System Engineer

-----Original Message-----
From: Sławek Kapłoński [mailto:slawek at kaplonski.pl] 
Sent: Monday, September 22, 2014 11:02 PM
To: openstack-operators at lists.openstack.org
Subject: Re: [Openstack-operators] Openstack and mysql galera with haproxy

Hello,

Answers below

---
Best regards
Sławek Kapłoński
slawek at kaplonski.pl

On Monday, 22 September 2014 at 13:41:51, Jay Pipes wrote:
> Hi Peter, Sławek, answers inline...
> 
> On 09/22/2014 08:12 AM, Peter Boros wrote:
> > Hi,
> > 
> > StaleDataError is not given by MySQL, but rather SQLAlchemy. After a 
> > quick look, it seems like SQLAlchemy gets this if the update 
> > updated a different number of rows than it expected. I am not sure 
> > what the expectation is based on; perhaps somebody can chime in and 
> > we can put this together. What is the transaction isolation level 
> > you are running on?
> 
> The transaction isolation level is REPEATABLE_READ, unless Sławek has 
> changed the defaults (unlikely).
For sure I didn't change it.
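
To double-check what the services actually get, I could query it directly; a minimal sketch with the MySQLdb driver (the connection parameters are just placeholders, and the variable names assume a MySQL 5.5/5.6-era server):

    import MySQLdb

    # session and global transaction isolation level as MySQL reports them
    conn = MySQLdb.connect(host="127.0.0.1", user="neutron",
                           passwd="secret", db="neutron")
    cur = conn.cursor()
    cur.execute("SELECT @@tx_isolation, @@global.tx_isolation")
    print(cur.fetchone())  # expected: ('REPEATABLE-READ', 'REPEATABLE-READ')
    conn.close()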
> 
> > For the timeout setting in neutron: that's a good way to approach it 
> > too; you can even be more aggressive and set it to a few seconds. In 
> > MySQL, making connections is very cheap (at least compared to other 
> > databases), so an idle timeout of a few seconds for a connection is 
> > typical.
> > 
> > On Mon, Sep 22, 2014 at 12:35 PM, Sławek Kapłoński
> > <slawek at kaplonski.pl> wrote:
> >> Hello,
> >> 
> >> Thanks for your explanations. I thought so, and I have now decreased 
> >> "idle_connection_timeout" in neutron and nova. Now, when the master 
> >> server is back in the cluster, in less than one minute all 
> >> connections are again made to this master node, because the old 
> >> connections which were made to the backup node are closed. So for now it 
> >> looks almost perfect, but when I now test the cluster (with the master 
> >> node active and all connections established to this node), in neutron I still sometimes see errors like:
> >> StaleDataError: UPDATE statement on table 'ports' expected to 
> >> update 1 row(s); 0 were matched.
> >> 
> >> and also today I found errors like:
> >> 2014-09-22 11:38:05.715 11474 INFO sqlalchemy.engine.base.Engine [-] ROLLBACK
> >> 2014-09-22 11:38:05.784 11474 ERROR 
> >> neutron.openstack.common.db.sqlalchemy.session [-] DB exception wrapped.
> >> 2014-09-22 11:38:05.784 11474 TRACE 
> >> neutron.openstack.common.db.sqlalchemy.session Traceback (most 
> >> recent call
> >> last):
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-
> >> packages/neutron/openstack/common/db/sqlalchemy/session.py", line 
> >> 524, in _wrap
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     return f(*args, **kwargs)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> 
> From looking up the code, it looks like you are using Havana [1]. The 
> code in the master branch of Neutron now uses oslo.db, not 
> neutron.openstack.common.db, so this issue may have been resolved in 
> later versions of Neutron.
Yes, I'm using Havana and I have no possibility to upgrade it quickly to Icehouse (I don't even want to think about the master branch :)). Do you mean that this problem will remain in Havana and can't be fixed in that release?
> 
> [1]
> https://github.com/openstack/neutron/blob/stable/havana/neutron/openstack/common/db/sqlalchemy/session.py#L524
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-
> >> packages/neutron/openstack/common/db/sqlalchemy/session.py", line 
> >> 718, in flush
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     return super(Session,
> >> self).flush(*args, **kwargs)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 
> >> 1818, in flush
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     self._flush(objects)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 
> >> 1936, in _flush
> >> 2014-09-22 11:38:05.784 11474 TRACE 
> >> neutron.openstack.common.db.sqlalchemy.session
> >> transaction.rollback(_capture_exception=True)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/util/langhelpers.py", 
> >> line 58, in __exit__
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session    
> >> compat.reraise(exc_type,
> >> exc_value, exc_tb)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 
> >> 1900, in _flush
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session    
> >> flush_context.execute()
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/unitofwork.py", 
> >> line 372, in execute
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     rec.execute(self)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/unitofwork.py", 
> >> line 525, in execute
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     uow
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", 
> >> line 64, in save_obj
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     table, insert)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", 
> >> line 541, in _emit_insert_statements
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     execute(statement,
> >> multiparams)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 
> >> 662, in execute
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     params)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 
> >> 761, in _execute_clauseelement
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     compiled_sql,
> >> distilled_params
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 
> >> 874, in _execute_context
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     context)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 
> >> 1027, in _handle_dbapi_exception
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session    
> >> util.reraise(*exc_info)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 
> >> 856, in _execute_context
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     context)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/sqlalchemy/connectors/mysqldb.py"
> >> , line 60, in do_executemany
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     rowcount =
> >> cursor.executemany(statement, parameters)
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session   File
> >> "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 206, in 
> >> executemany
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session     r = r +
> >> self.execute(query, a)
> >> 2014-09-22 11:38:05.784 11474 TRACE 
> >> neutron.openstack.common.db.sqlalchemy.session TypeError: 
> >> unsupported operand type(s) for +: 'int' and 'NoneType'
> 
> Hmm, this is just bad coding in the MySQLdb driver, frankly. It is 
> assuming a call to Cursor.execute() will return an integer, but it can 
> return None in some circumstances. See code here:
> 
> http://sourceforge.net/p/mysql-python/code/ci/8041cc6df636b9c42d52e01b727aa98b43f3632c/tree/MySQLdb/MySQLdb/cursors.py
> 
> Note that MySQLdb1 (the above, which is what is packaged in your Linux 
> distro I believe) is old and buggy. The maintainer has moved on to 
> MySQLdb2, which has a different call interface in this part of the code.

Ok, I checked that code and found that in fact the "execute" method sometimes returns None, and when it is called from executemany there is a problem. A simple change in the executemany method to:

    if not m:
        r = 0
        for a in args:
            iter_r = self.execute(query, a)
            # execute() can return None here; only add real row counts
            if iter_r:
                r = r + iter_r
        return r

works for me now in the tests which I made, but I don't know what the result of that change will be over a longer period of time, or why the execute method returns that None value :/

You wrote that MySQLdb2 has a different call interface in this part of the code - should a change to MySQLdb2 fix that problem or not? If yes, can you explain to me how I can change it? Should I just install another package, or should I set something somewhere in the config? (I'm using Ubuntu 12.04)
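
For reference, my current understanding is that SQLAlchemy picks the DBAPI driver from the scheme of the connection URL, so switching drivers would mostly be a matter of installing the package and changing the URL that neutron/nova use; a rough sketch (driver name, credentials and host are just placeholders, and I have not checked whether this avoids the executemany issue):

    from sqlalchemy import create_engine

    # default on my setup: MySQLdb ("mysql://" is shorthand for "mysql+mysqldb://")
    engine_default = create_engine("mysql://neutron:secret@127.0.0.1/neutron")

    # hypothetical alternative driver, installed separately (e.g. pip install pymysql)
    engine_pymysql = create_engine("mysql+pymysql://neutron:secret@127.0.0.1/neutron")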

> 
> >> 2014-09-22 11:38:05.784 11474 TRACE
> >> neutron.openstack.common.db.sqlalchemy.session
> >> 
> >> And I have to investigate why it is happening, because with a single mysql
> >> server I have no such errors :/
> 
> Not sure, frankly. The code is executing many INSERT or UPDATE
> statements in a single block. The MySQL connection is clearly getting
> borked on one of those attempts and results in the traceback you see above.

I'm not 100% sure, but on a much bigger cluster with one database server this problem did not occur even once, so I suppose that it is somehow related to the galera setup which I now want to use.

> 
> best,
> -jay
> 
> >> ---
> >> Best regards
> >> Sławek Kapłoński
> >> slawek at kaplonski.pl
> >> 
> >> On Monday, 22 September 2014 at 11:18:27, Peter Boros wrote:
> >>> Hi,
> >>> 
> >>> Let me answer this and one of your previous questions in one because
> >>> they are related.
> >>> 
> >>> Earlier you wrote:
> >>>> I made such modifications today in my infra and generally it looks
> >>>> better now. I don't see deadlocks. But I have one more problem with
> >>>> that: generally it works fine when the main node is active, but when
> >>>> this node is down, haproxy connects to one of the backup
> >>>> nodes. Everything is still ok, but the problem comes when the main node is up again - all
> >>>> new connections are made to the main node, but the active connections which were
> >>>> made to the backup node are still active, so neutron (or nova) is using
> >>>> connections to two servers, and then there are problems with deadlocks
> >>>> again.
> >>>> Do you know how to prevent such a situation?
> >>> 
> >>> This is because of how haproxy works. Haproxy's load balancing is TCP
> >>> level, once the TCP connection is established, haproxy has nothing to
> >>> do with it. If the MySQL application (neutron in this case), uses
> >>> persistent connections, at the time of failing over, haproxy doesn't
> >>> make an extra decision upon failover, because a connection is already
> >>> established. This can be mitigated by using haproxy 1.5 and defining
> >>> the backend with on-marked-down shutdown-sessions, this will kill the
> >>> connections at the TCP level on the formerly active node. Or in case
> >>> of graceful failover, include killing connections in the failover
> >>> script on the formerly active node. The application in this case will
> >>> get error 1 or 2 you described.
> >>> 
> >>> From your description, errors 1 and 2 are related to killing
> >>> connections. Case 1 (MySQL server has gone away) happens when the
> >>> connection was killed (but not at the MySQL protocol level) while it
> >>> was idle, and the application is attempting to re-use it. In this case
> >>> the correct behaviour would be re-establishing the connection. Error 2
> >>> is the same thing, but the connection was killed while it was actually
> >>> doing something; reconnecting and retrying is the correct behaviour. These
> >>> errors are not avoidable if the node dies non-gracefully. A server
> >>> can, for example, lose power while doing the transaction; in this case
> >>> the transaction will be aborted, and the application will get one of
> >>> the errors described above. The application has to know that the data
> >>> is not written, since it either didn't commit or the database didn't
> >>> acknowledge the commit.
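
Just to check my understanding of the "reconnect and retry" part, I imagine something along these lines (a rough sketch only, not what neutron actually does; the 2006/2013 error codes are the ones from my logs):

    from sqlalchemy import create_engine
    from sqlalchemy.exc import OperationalError

    engine = create_engine("mysql://neutron:secret@127.0.0.1/neutron",
                           pool_recycle=3600)

    def run_with_retry(statement, retries=3):
        # if the connection died mid-query or while idle, try again on a
        # fresh connection from the pool instead of failing outright
        for attempt in range(retries):
            try:
                with engine.begin() as conn:  # commits or rolls back for us
                    return conn.execute(statement)
            except OperationalError as exc:
                # 2006: MySQL server has gone away, 2013: lost connection during query
                if exc.orig.args[0] in (2006, 2013) and attempt < retries - 1:
                    continue
                raise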
> >>> 
> >>> On Sat, Sep 20, 2014 at 12:11 AM, Sławek Kapłoński <slawek at kaplonski.pl>
> >>> wrote:
> >>>> Hello,
> >>>> 
> >>>> New questions below
> >>>> 
> >>>> ---
> >>>> Best regards
> >>>> Sławek Kapłoński
> >>>> slawek at kaplonski.pl
> >>>> 
> >>>> On Thursday, 18 September 2014 at 09:45:21, Clint Byrum wrote:
> >>>>> Excerpts from Sławek Kapłoński's message of 2014-09-18 09:29:27 -0700:
> >>>>>> Hello,
> >>>>>> 
> >>>>>> Is anyone here using openstack with mysql galera and haproxy? Have
> >>>>>> you got any problems with that?
> >>>>>> Today I installed such an HA infra for the database (two mysql
> >>>>>> servers in a galera cluster, and haproxy on the controller and
> >>>>>> neutron node; this haproxy connects to one of the galera servers
> >>>>>> with a round robin algorithm).
> >>>>>> Generally all is working fine but I have a few problems:
> >>>>>> 1. I have a lot of messages like:
> >>>>>> WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql
> >>>>>> server
> >>>>>> has gone away: (2006, 'MySQL server has gone away')
> >>>>>> 2. I have (mostly in neutron) many errors like:
> >>>>>> OperationalError: (OperationalError) (2013, 'Lost connection to MySQL
> >>>>>> server during query') 'UPDATE ml2_port_bindings SET vif_type=%s,
> >>>>>> driver=%s, segment=%s WHERE ml2_port_bindings.port_id =
> >>>>> 
> >>>>> 1 and 2 look like timeout issues. Check haproxy's timeouts. They need
> >>>>> to be just a little longer than MySQL's connection timeouts.
> >>>> 
> >>>> After I made the cluster ACTIVE/PASSIVE and changed sql_idle_timeout in
> >>>> neutron and nova, problem 1 looks like it is solved. Unfortunately I
> >>>> found that when I'm deleting a port from neutron, I still sometimes get
> >>>> errors like in 2. I haven't checked the nova logs closely yet, so I'm
> >>>> not sure whether it is only in neutron or in both.
> >>>> Do you maybe know why it happens in neutron? It did not happen when I
> >>>> had a single mysql node without haproxy and galera, so I suppose that
> >>>> haproxy or galera is responsible for that problem :/
> >>>> 
> >>>>>> 3. Also errors:
> >>>>>> StaleDataError: UPDATE statement on table 'ports' expected to update
> >>>>>> 1
> >>>>>> row(s); 0 were matched.
> >>>>>> 4. and errors:
> >>>>>> DBDeadlock: (OperationalError) (1213, 'Deadlock found when trying to
> >>>>>> get
> >>>>>> lock; try restarting transaction') 'UPDATE ipavailabilityranges SET
> >>>>>> first_ip=%s WHERE ipavailabilityranges.allocation_pool_id =
> >>>>> 
> >>>>> 3 and 4 are a known issue. Our code doesn't always retry transactions,
> >>>>> which is required to use Galera ACTIVE/ACTIVE. Basically, that doesn't
> >>>>> work.
> >>>>> 
> >>>>> You can use ACTIVE/PASSIVE, and even do vertical partitioning where
> >>>>> one of the servers is ACTIVE for Nova, but another one is ACTIVE for
> >>>>> Neutron. But AFAIK, ACTIVE/ACTIVE isn't being tested and the work
> >>>>> hasn't
> >>>>> been done to make the concurrent transactions work properly.
> >>>>> 
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

