[Openstack-operators] Openstack and mysql galera with haproxy

Peter Boros peter.boros at percona.com
Mon Sep 22 12:12:21 UTC 2014


Hi,

StaleDataError is not raised by MySQL, but rather by SQLAlchemy. After a
quick look, it seems SQLAlchemy raises this when an UPDATE affected a
different number of rows than it expected. I am not sure what that
expectation is based on; perhaps somebody can chime in and we can put
this together. What transaction isolation level are you running
on?
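As a rough illustration of the mechanism (using sqlite3 in place of MySQL, and plain SQL in place of the ORM, so this is only a sketch, not neutron's actual code), the check boils down to comparing a statement's rowcount against the number of rows the ORM expected to touch; when another session has changed the row underneath, the stale WHERE clause matches 0 rows:

```python
# Sketch of the rowcount check behind StaleDataError, reproduced with
# sqlite3 instead of MySQL/SQLAlchemy. The table and column names are
# illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ports (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO ports VALUES (1, 'DOWN')")
conn.commit()

# Another session changes the row after we read status = 'DOWN'.
conn.execute("UPDATE ports SET status = 'ACTIVE' WHERE id = 1")
conn.commit()

# We now update based on our stale view of the row.
cur = conn.execute(
    "UPDATE ports SET status = 'BUILD' WHERE id = 1 AND status = 'DOWN'"
)
expected, matched = 1, cur.rowcount
if matched != expected:
    # This is the condition SQLAlchemy reports as:
    # StaleDataError: UPDATE ... expected to update 1 row(s); 0 were matched.
    print("expected %d row(s); %d were matched" % (expected, matched))
```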

For the timeout setting in neutron: that's a good way to approach it
too; you can even be more aggressive and set it to a few seconds. In
MySQL, making connections is very cheap (at least compared to other
databases), so an idle timeout of a few seconds for a connection is
typical.
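For reference, an aggressive idle timeout would look something like the following in neutron.conf / nova.conf. This is only a sketch: the exact option name and section vary by release (Sławek's thread mentions sql_idle_timeout; later oslo.db releases moved it under [database] as idle_timeout), so check your release's documentation before applying it.

```ini
[database]
# Recycle connections that have been idle for more than a few seconds,
# so a failed-back node sheds stale connections quickly.
idle_timeout = 3
```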

On Mon, Sep 22, 2014 at 12:35 PM, Sławek Kapłoński <slawek at kaplonski.pl> wrote:
> Hello,
>
> Thanks for your explanations. I thought so, and I have now decreased
> "idle_connection_timeout" in neutron and nova. Now when the master server is
> back in the cluster, in less than one minute all connections are made to this
> master node again, because the old connections which were made to the backup
> node are closed. So for now it looks almost perfect, but when I now test the
> cluster (with the master node active and all connections established to this
> node), in neutron I still sometimes see errors like:
> StaleDataError: UPDATE statement on table 'ports' expected to update 1 row(s);
> 0 were matched.
>
> and also today I found errors like:
> 2014-09-22 11:38:05.715 11474 INFO sqlalchemy.engine.base.Engine [-] ROLLBACK
> 2014-09-22 11:38:05.784 11474 ERROR
> neutron.openstack.common.db.sqlalchemy.session [-] DB exception wrapped.
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session Traceback (most recent call
> last):
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session   File
> "/usr/lib/python2.7/dist-
> packages/neutron/openstack/common/db/sqlalchemy/session.py", line 524, in
> _wrap
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session     return f(*args, **kwargs)
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session   File
> "/usr/lib/python2.7/dist-
> packages/neutron/openstack/common/db/sqlalchemy/session.py", line 718, in flush
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session     return super(Session,
> self).flush(*args, **kwargs)
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session   File
> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1818, in
> flush
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session     self._flush(objects)
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session   File
> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1936, in
> _flush
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session
> transaction.rollback(_capture_exception=True)
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session   File
> "/usr/lib/python2.7/dist-packages/sqlalchemy/util/langhelpers.py", line 58, in
> __exit__
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session     compat.reraise(exc_type,
> exc_value, exc_tb)
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session   File
> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1900, in
> _flush
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session     flush_context.execute()
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session   File
> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/unitofwork.py", line 372, in
> execute
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session     rec.execute(self)
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session   File
> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/unitofwork.py", line 525, in
> execute
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session     uow
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session   File
> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 64, in
> save_obj
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session     table, insert)
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session   File
> "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 541, in
> _emit_insert_statements
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session     execute(statement,
> multiparams)
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session   File
> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 662, in
> execute
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session     params)
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session   File
> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 761, in
> _execute_clauseelement
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session     compiled_sql,
> distilled_params
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session   File
> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 874, in
> _execute_context
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session     context)
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session   File
> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1027, in
> _handle_dbapi_exception
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session     util.reraise(*exc_info)
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session   File
> "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 856, in
> _execute_context
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session     context)
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session   File
> "/usr/lib/python2.7/dist-packages/sqlalchemy/connectors/mysqldb.py", line 60,
> in do_executemany
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session     rowcount =
> cursor.executemany(statement, parameters)
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session   File
> "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 206, in
> executemany
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session     r = r + self.execute(query,
> a)
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session TypeError: unsupported operand
> type(s) for +: 'int' and 'NoneType'
> 2014-09-22 11:38:05.784 11474 TRACE
> neutron.openstack.common.db.sqlalchemy.session
>
> And I have to investigate why it is happening, because with a single MySQL
> server I have no such errors :/
>
> ---
> Best regards
> Sławek Kapłoński
> slawek at kaplonski.pl
>
> On Monday, 22 September 2014 at 11:18:27, Peter Boros wrote:
>> Hi,
>>
>> Let me answer this and one of your previous questions in one because
>> they are related.
>>
>> Earlier you wrote:
>> > I made such modifications today in my infra and generally it looks
>> > better now. I don't see deadlocks. But I have one more problem with
>> > that: generally it works fine when the main node is active, but when
>> > this node is down, haproxy connects to one of the backup nodes. Still
>> > all is OK, but the problem is when the main node is up again - all
>> > new connections are made to the main node, but the active connections
>> > which were made to the backup node are still open, and neutron (or
>> > nova) are using connections to two servers, and then there are
>> > problems with deadlocks again.
>> > Do you know how to prevent such a situation?
>>
>> This is because of how haproxy works. Haproxy's load balancing is at the
>> TCP level; once the TCP connection is established, haproxy has nothing
>> more to do with it. If the MySQL application (neutron in this case) uses
>> persistent connections, haproxy doesn't make an extra decision upon
>> failover, because the connection is already established. This can be
>> mitigated by using haproxy 1.5 and defining the backend with
>> on-marked-down shutdown-sessions, which will kill the connections at the
>> TCP level on the formerly active node. Or, in case of a graceful
>> failover, include killing connections in the failover script on the
>> formerly active node. The application in this case will get error 1 or 2
>> you described.
>>
>> From your description, errors 1 and 2 are related to killed
>> connections. Case 1 (MySQL server has gone away) happens when the
>> connection was killed (but not at the MySQL protocol level) while it
>> was idle, and the application is attempting to re-use it. In this case
>> the correct behaviour is to re-establish the connection. Error 2 is the
>> same thing, except the connection was actually doing something at the
>> time; reconnecting and retrying is the correct behaviour. These errors
>> are not avoidable if the node dies non-gracefully. A server can, for
>> example, lose power while doing a transaction; in this case the
>> transaction will be aborted, and the application will get one of the
>> errors described above. The application has to know that the data is
>> not written, since it either didn't commit or the database didn't
>> acknowledge the commit.
>>
>> On Sat, Sep 20, 2014 at 12:11 AM, Sławek Kapłoński <slawek at kaplonski.pl>
> wrote:
>> > Hello,
>> >
>> > New questions below
>> >
>> > ---
>> > Best regards
>> > Sławek Kapłoński
>> > slawek at kaplonski.pl
>> >
>> > On Thursday, 18 September 2014 at 09:45:21, Clint Byrum wrote:
>> >> Excerpts from Sławek Kapłoński's message of 2014-09-18 09:29:27 -0700:
>> >> > Hello,
>> >> >
>> >> > Is anyone here using OpenStack with MySQL Galera and haproxy? Have
>> >> > you had any problems with that?
>> >> > Today I installed such an HA infra for the database (two MySQL
>> >> > servers in a Galera cluster, and haproxy on the controller and
>> >> > neutron nodes; this haproxy connects to one of the Galera servers
>> >> > with a round robin algorithm).
>> >> > Generally all is working fine, but I have a few problems:
>> >> > 1. I have a lot of messages like:
>> >> > WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql
>> >> > server
>> >> > has gone away: (2006, 'MySQL server has gone away')
>> >> > 2. I have (most on neutron) many errors like:
>> >> > OperationalError: (OperationalError) (2013, 'Lost connection to MySQL
>> >> > server during query') 'UPDATE ml2_port_bindings SET vif_type=%s,
>> >> > driver=%s, segment=%s WHERE ml2_port_bindings.port_id =
>> >>
>> >> 1 and 2 look like timeout issues. Check haproxy's timeouts. They need
>> >> to be just a little longer than MySQL's connection timeouts.
>> >
>> > After I made an ACTIVE/PASSIVE cluster and changed sql_idle_timeout in
>> > neutron and nova, problem 1 looks to be solved. Unfortunately, I found
>> > that when I'm deleting a port from neutron I still sometimes get errors
>> > like in 2. I haven't checked the nova logs thoroughly yet, so I'm not
>> > sure whether it is only in neutron or in both.
>> > Do you maybe know why it happens in neutron? It did not happen when I
>> > had a single MySQL node without haproxy and Galera, so I suppose that
>> > haproxy or Galera is responsible for that problem :/
>> >
>> >> > 3. Also errors:
>> >> > StaleDataError: UPDATE statement on table 'ports' expected to update 1
>> >> > row(s); 0 were matched.
>> >> > 4. and errors:
>> >> > DBDeadlock: (OperationalError) (1213, 'Deadlock found when trying to
>> >> > get
>> >> > lock; try restarting transaction') 'UPDATE ipavailabilityranges SET
>> >> > first_ip=%s WHERE ipavailabilityranges.allocation_pool_id =
>> >>
>> >> 3 and 4 are a known issue. Our code doesn't always retry transactions,
>> >> which is required to use Galera ACTIVE/ACTIVE. Basically, that doesn't
>> >> work.
>> >>
>> >> You can use ACTIVE/PASSIVE, and even do vertical partitioning where
>> >> one of the servers is ACTIVE for Nova, but another one is ACTIVE for
>> >> Neutron. But AFAIK, ACTIVE/ACTIVE isn't being tested and the work hasn't
>> >> been done to make the concurrent transactions work properly.
>> >>
>> >> _______________________________________________
>> >> OpenStack-operators mailing list
>> >> OpenStack-operators at lists.openstack.org
>> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>> >
>



-- 
Peter Boros, Principal Architect, Percona
Telephone: +1 888 401 3401 ext 546
Emergency: +1 888 401 3401 ext 911
Skype: percona.pboros


