[Openstack-operators] Openstack and mysql galera with haproxy

Sławek Kapłoński slawek at kaplonski.pl
Mon Sep 22 10:35:33 UTC 2014


Hello,

Thanks for Your explanations. I thought so and now I decrease 
"idle_connection_timeout" in neutron and nova. Now when master server is back 
to cluster than in less than one minute all conections are again made to this 
master node becuase old connections which was made to backup node are closed.
So for now it looks almost perfect but when I now testing cluster (with master 
node active and all connections established to this node) in neutron I still 
sometimes see errors like:
StaleDataError: UPDATE statement on table 'ports' expected to update 1 row(s); 
0 were matched.

and also today I found errors like:
2014-09-22 11:38:05.715 11474 INFO sqlalchemy.engine.base.Engine [-] ROLLBACK
2014-09-22 11:38:05.784 11474 ERROR 
neutron.openstack.common.db.sqlalchemy.session [-] DB exception wrapped.
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session Traceback (most recent call 
last):
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session   File 
"/usr/lib/python2.7/dist-
packages/neutron/openstack/common/db/sqlalchemy/session.py", line 524, in 
_wrap
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session     return f(*args, **kwargs)
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session   File 
"/usr/lib/python2.7/dist-
packages/neutron/openstack/common/db/sqlalchemy/session.py", line 718, in flush
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session     return super(Session, 
self).flush(*args, **kwargs)
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session   File 
"/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1818, in 
flush
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session     self._flush(objects)
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session   File 
"/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1936, in 
_flush
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session     
transaction.rollback(_capture_exception=True)
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session   File 
"/usr/lib/python2.7/dist-packages/sqlalchemy/util/langhelpers.py", line 58, in 
__exit__
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session     compat.reraise(exc_type, 
exc_value, exc_tb)
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session   File 
"/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1900, in 
_flush
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session     flush_context.execute()
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session   File 
"/usr/lib/python2.7/dist-packages/sqlalchemy/orm/unitofwork.py", line 372, in 
execute
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session     rec.execute(self)
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session   File 
"/usr/lib/python2.7/dist-packages/sqlalchemy/orm/unitofwork.py", line 525, in 
execute
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session     uow
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session   File 
"/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 64, in 
save_obj
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session     table, insert)
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session   File 
"/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 541, in 
_emit_insert_statements
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session     execute(statement, 
multiparams)
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session   File 
"/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 662, in 
execute
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session     params)
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session   File 
"/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 761, in 
_execute_clauseelement
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session     compiled_sql, 
distilled_params
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session   File 
"/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 874, in 
_execute_context
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session     context)
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session   File 
"/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1027, in 
_handle_dbapi_exception
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session     util.reraise(*exc_info)
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session   File 
"/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 856, in 
_execute_context
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session     context)
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session   File 
"/usr/lib/python2.7/dist-packages/sqlalchemy/connectors/mysqldb.py", line 60, 
in do_executemany
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session     rowcount = 
cursor.executemany(statement, parameters)
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session   File 
"/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 206, in 
executemany
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session     r = r + self.execute(query, 
a)
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session TypeError: unsupported operand 
type(s) for +: 'int' and 'NoneType'
2014-09-22 11:38:05.784 11474 TRACE 
neutron.openstack.common.db.sqlalchemy.session 

And I have to investigate why it is happening becuase whith single mysql 
server I have no such errors :/

---
Best regards
Sławek Kapłoński
slawek at kaplonski.pl

Dnia poniedziałek, 22 września 2014 11:18:27 Peter Boros pisze:
> Hi,
> 
> Let me answer this and one of your previous questions in one because
> they are related.
> 
> Earlier you wrote:
> > I made such modifications today in my infra and generally it looks
> > better now. I don't see deadlocks. But I have one more problem with
> > that: generally it works fine when main node is active but in
> > situation when this node is down, haproxy connect to one of backup
> > nodes. Still all is ok but problem is when main node is up again - all
> > new connections are made to main node but active connections which was
> > made to backup node still are active and neutron (or nova) are using
> > connections to two servers and then there are problems with deadlock
> > again.
> > Do You know how to prevent such situation?
> 
> This is because of how haproxy works. Haproxy's load balancing is TCP
> level, once the TCP connection is established, haproxy has nothing to
> do with it. If the MySQL application (neutron in this case), uses
> persistent connections, at the time of failing over, haproxy doesn't
> make an extra decision upon failover, because a connection is already
> established. This can be mitigated by using haproxy 1.5 and defining
> the backend with on-marked-down shutdown-sessions, this will kill the
> connections at the TCP level on the formerly active node. Or in case
> of graceful failover, include killing connections in the failover
> script on the formerly active node. The application in this case will
> get error 1 or 2 you described.
> 
> From your description error 1 and 2 are related to killing
> connections. Case 1 (MySQL server has gone away) happens when the
> connection was killed (but not at the MySQL protocol level) while it
> was idle, and the application is attempting to re-use it. In this case
> the correct behaviour would be re-establishing the connection. Error 2
> is the same thing, but while the connection was actually doing
> something, reconnecting and retrying is the correct behaviour. These
> errors are not avoidable, if the node dies non-gracefully. A server
> can for example lose power while doing the transaction, in this case
> the transaction will be aborted, and the application will get one of
> the errors described above. The application has to know that the data
> is not written, since it didn't do commit or database didn't
> acknowledge the commit.
> 
> On Sat, Sep 20, 2014 at 12:11 AM, Sławek Kapłoński <slawek at kaplonski.pl> 
wrote:
> > Hello,
> > 
> > New questions below
> > 
> > ---
> > Best regards
> > Sławek Kapłoński
> > slawek at kaplonski.pl
> > 
> > Dnia czwartek, 18 września 2014 09:45:21 Clint Byrum pisze:
> >> Excerpts from Sławek Kapłoński's message of 2014-09-18 09:29:27 -0700:
> >> > Hello,
> >> > 
> >> > Is anyone here using openstack with mysql galera and haproxy? Have You
> >> > got
> >> > any problems with that?
> >> > I was today installed such ha infra for database (two mysql servers in
> >> > galera cluster and haproxy on controller and neutron node, this haproxy
> >> > is connecting to one of galera servers with round robin algorithm).
> >> > Generally all is working fine but I have few problems:
> >> > 1. I have a lot of messages like:
> >> > WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql
> >> > server
> >> > has gone away: (2006, 'MySQL server has gone away')
> >> > 2. I have (most on neutron) many errors like:
> >> > OperationalError: (OperationalError) (2013, 'Lost connection to MySQL
> >> > server during query') 'UPDATE ml2_port_bindings SET vif_type=%s,
> >> > driver=%s, segment=%s WHERE ml2_port_bindings.port_id =
> >> 
> >> 1 and 2 look like timeout issues. Check haproxy's timeouts. They need
> >> to be just a little longer than MySQL's connection timeouts.
> > 
> > After I made ACTIVE/PASSIVE cluster and change sql_idle_timeout in neutron
> > and nova problem 1 looks that is solver. Unfortunatelly I found that when
> > I'm deleting port from neutron I still sometimes have got errors like in
> > 2. I don't check exactly nova logs yet so I'm not sure is it only in
> > neutron or in both.
> > Do You maybe know why it happens in neutron? It not happend when I have
> > single mysql node without haproxy and galera so I suppose that haproxy or
> > galera is responsible for that problem :/
> > 
> >> > 3. Also errors:
> >> > StaleDataError: UPDATE statement on table 'ports' expected to update 1
> >> > row(s); 0 were matched.
> >> > 4. and errors:
> >> > DBDeadlock: (OperationalError) (1213, 'Deadlock found when trying to
> >> > get
> >> > lock; try restarting transaction') 'UPDATE ipavailabilityranges SET
> >> > first_ip=%s WHERE ipavailabilityranges.allocation_pool_id =
> >> 
> >> 3 and 4 are a known issue. Our code doesn't always retry transactions,
> >> which is required to use Galera ACTIVE/ACTIVE. Basically, that doesn't
> >> work.
> >> 
> >> You can use ACTIVE/PASSIVE, and even do vertical partitioning where
> >> one of the servers is ACTIVE for Nova, but another one is ACTIVE for
> >> Neutron. But AFAIK, ACTIVE/ACTIVE isn't being tested and the work hasn't
> >> been done to make the concurrent transactions work properly.
> >> 
> >> _______________________________________________
> >> OpenStack-operators mailing list
> >> OpenStack-operators at lists.openstack.org
> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
> > 
> > _______________________________________________
> > OpenStack-operators mailing list
> > OpenStack-operators at lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part.
URL: <http://lists.openstack.org/pipermail/openstack-operators/attachments/20140922/5b3b4199/attachment.pgp>


More information about the OpenStack-operators mailing list