[Openstack-operators] Openstack and mysql galera with haproxy

Peter Boros peter.boros at percona.com
Mon Sep 22 09:18:27 UTC 2014


Hi,

Let me answer this and one of your previous questions in one because
they are related.
Earlier you wrote:

> I made such modifications today in my infra and generally it looks
> better now. I don't see deadlocks. But I have one more problem with
> that: generally it works fine when main node is active but in
> situation when this node is down, haproxy connect to one of backup
> nodes. Still all is ok but problem is when main node is up again - all
> new connections are made to main node but active connections which was
> made to backup node still are active and neutron (or nova) are using
> connections to two servers and then there are problems with deadlock
> again.
> Do You know how to prevent such situation?

This is because of how haproxy works. Haproxy's load balancing is TCP
level, once the TCP connection is established, haproxy has nothing to
do with it. If the MySQL application (neutron in this case), uses
persistent connections, at the time of failing over, haproxy doesn't
make an extra decision upon failover, because a connection is already
established. This can be mitigated by using haproxy 1.5 and defining
the backend with on-marked-down shutdown-sessions, this will kill the
connections at the TCP level on the formerly active node. Or in case
of graceful failover, include killing connections in the failover
script on the formerly active node. The application in this case will
get error 1 or 2 you described.

>From your description error 1 and 2 are related to killing
connections. Case 1 (MySQL server has gone away) happens when the
connection was killed (but not at the MySQL protocol level) while it
was idle, and the application is attempting to re-use it. In this case
the correct behaviour would be re-establishing the connection. Error 2
is the same thing, but while the connection was actually doing
something, reconnecting and retrying is the correct behaviour. These
errors are not avoidable, if the node dies non-gracefully. A server
can for example lose power while doing the transaction, in this case
the transaction will be aborted, and the application will get one of
the errors described above. The application has to know that the data
is not written, since it didn't do commit or database didn't
acknowledge the commit.


On Sat, Sep 20, 2014 at 12:11 AM, Sławek Kapłoński <slawek at kaplonski.pl> wrote:
> Hello,
>
> New questions below
>
> ---
> Best regards
> Sławek Kapłoński
> slawek at kaplonski.pl
>
> Dnia czwartek, 18 września 2014 09:45:21 Clint Byrum pisze:
>> Excerpts from Sławek Kapłoński's message of 2014-09-18 09:29:27 -0700:
>> > Hello,
>> >
>> > Is anyone here using openstack with mysql galera and haproxy? Have You got
>> > any problems with that?
>> > I was today installed such ha infra for database (two mysql servers in
>> > galera cluster and haproxy on controller and neutron node, this haproxy
>> > is connecting to one of galera servers with round robin algorithm).
>> > Generally all is working fine but I have few problems:
>> > 1. I have a lot of messages like:
>> > WARNING neutron.openstack.common.db.sqlalchemy.session [-] Got mysql
>> > server
>> > has gone away: (2006, 'MySQL server has gone away')
>> > 2. I have (most on neutron) many errors like:
>> > OperationalError: (OperationalError) (2013, 'Lost connection to MySQL
>> > server during query') 'UPDATE ml2_port_bindings SET vif_type=%s,
>> > driver=%s, segment=%s WHERE ml2_port_bindings.port_id =
>>
>> 1 and 2 look like timeout issues. Check haproxy's timeouts. They need
>> to be just a little longer than MySQL's connection timeouts.
>
> After I made ACTIVE/PASSIVE cluster and change sql_idle_timeout in neutron and
> nova problem 1 looks that is solver. Unfortunatelly I found that when I'm
> deleting port from neutron I still sometimes have got errors like in 2. I
> don't check exactly nova logs yet so I'm not sure is it only in neutron or in
> both.
> Do You maybe know why it happens in neutron? It not happend when I have single
> mysql node without haproxy and galera so I suppose that haproxy or galera is
> responsible for that problem :/
>
>>
>> > 3. Also errors:
>> > StaleDataError: UPDATE statement on table 'ports' expected to update 1
>> > row(s); 0 were matched.
>> > 4. and errors:
>> > DBDeadlock: (OperationalError) (1213, 'Deadlock found when trying to get
>> > lock; try restarting transaction') 'UPDATE ipavailabilityranges SET
>> > first_ip=%s WHERE ipavailabilityranges.allocation_pool_id =
>>
>> 3 and 4 are a known issue. Our code doesn't always retry transactions,
>> which is required to use Galera ACTIVE/ACTIVE. Basically, that doesn't
>> work.
>>
>> You can use ACTIVE/PASSIVE, and even do vertical partitioning where
>> one of the servers is ACTIVE for Nova, but another one is ACTIVE for
>> Neutron. But AFAIK, ACTIVE/ACTIVE isn't being tested and the work hasn't
>> been done to make the concurrent transactions work properly.
>>
>> _______________________________________________
>> OpenStack-operators mailing list
>> OpenStack-operators at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>



-- 
Peter Boros, Principal Architect, Percona
Telephone: +1 888 401 3401 ext 546
Emergency: +1 888 401 3401 ext 911
Skype: percona.pboros



More information about the OpenStack-operators mailing list