[openstack-dev] [kolla] add service discovery, proxysql, vault, fabio and FQDN endpoints

Jay Pipes jaypipes at gmail.com
Tue Oct 9 11:23:03 UTC 2018


On 10/09/2018 06:34 AM, Florian Engelmann wrote:
> On 10/9/18 at 11:41 AM, Jay Pipes wrote:
>> On 10/09/2018 04:34 AM, Christian Berendt wrote:
>>>
>>>
>>>> On 8. Oct 2018, at 19:48, Jay Pipes <jaypipes at gmail.com> wrote:
>>>>
>>>> Why not send all read and all write traffic to a single haproxy 
>>>> endpoint and just have haproxy spread all traffic across each Galera 
>>>> node?
>>>>
>>>> Galera, after all, is multi-master synchronous replication... so it 
>>>> shouldn't matter which node in the Galera cluster you send traffic to.
>>>
>>> Probably because of MySQL deadlocks in Galera:
>>>
>>> —snip—
>>> Galera cluster has known limitations, one of them is that it uses 
>>> cluster-wide optimistic locking. This may cause some transactions to 
>>> rollback. With an increasing number of writeable masters, the 
>>> transaction rollback rate may increase, especially if there is write 
>>> contention on the same dataset. It is of course possible to retry the 
>>> transaction and perhaps it will COMMIT in the retries, but this will 
>>> add to the transaction latency. However, some designs are deadlock 
>>> prone, e.g. sequence tables.
>>> —snap—
>>>
>>> Source: 
>>> https://severalnines.com/resources/tutorials/mysql-load-balancing-haproxy-tutorial 
>>>
>>
>> Have you seen the above in production?
> 
> Yes of course. Just depends on the application and how high the workload 
> gets.
> 
> Please read about deadlocks and nova in the following report by Intel:
> 
> http://galeracluster.com/wp-content/uploads/2017/06/performance_analysis_and_tuning_in_china_mobiles_openstack_production_cloud_2.pdf 

I have read the above. It's a synthetic workload analysis, which is why 
I asked if you'd seen this in production.

For the record, we addressed much of the contention/races mentioned in 
the above around scheduler resource consumption in the Ocata and Pike 
releases of Nova.

I'm aware that the report above identifies the quota handling code in 
Nova as the primary culprit of the deadlock issues but again, it's a 
synthetic workload that is designed to find breaking points. It doesn't 
represent a realistic production workload.

You can read about the deadlock issue in depth on my blog here:

http://www.joinfu.com/2015/01/understanding-reservations-concurrency-locking-in-nova/

That explains where the source of the problem comes from (it's the use 
of SELECT FOR UPDATE, which has been removed from Nova's quota-handling 
code in the Rocky release).
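
To make the alternative to SELECT FOR UPDATE concrete, here is a minimal 
sketch of a compare-and-update loop. It is not Nova's actual code: the 
quota_usages table, its generation column and the consume() helper are 
hypothetical names, and sqlite3 stands in for MySQL/Galera only so the 
example runs on its own. The idea is to read the row without a lock, then 
issue an UPDATE that only matches if nobody changed the row in between, 
and retry when it matches nothing.

```python
import sqlite3


def consume(conn, resource, amount, limit, max_retries=5):
    """Claim `amount` of `resource` without SELECT ... FOR UPDATE.

    Hypothetical helper: returns True on success, False if the claim
    would exceed `limit`.
    """
    for _ in range(max_retries):
        # Plain read, no row lock taken.
        in_use, generation = conn.execute(
            "SELECT in_use, generation FROM quota_usages WHERE resource = ?",
            (resource,),
        ).fetchone()
        if in_use + amount > limit:
            return False
        # The UPDATE only matches if the generation is still the one we read.
        cur = conn.execute(
            "UPDATE quota_usages SET in_use = ?, generation = ? "
            "WHERE resource = ? AND generation = ?",
            (in_use + amount, generation + 1, resource, generation),
        )
        conn.commit()
        if cur.rowcount == 1:
            return True
        # Zero rows updated: another writer got there first, so re-read.
    raise RuntimeError("too many concurrent updates, giving up")


# Throwaway in-memory table so the sketch is runnable as-is.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quota_usages ("
    "resource TEXT PRIMARY KEY, in_use INTEGER, generation INTEGER)"
)
conn.execute("INSERT INTO quota_usages VALUES ('instances', 0, 0)")
conn.commit()
```

A writer that loses the race sees zero rows updated and simply re-reads, 
instead of holding a row lock across the transaction the way SELECT FOR 
UPDATE does.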

> If just Nova is affected we could also create an additional HAProxy 
> listener using all Galera nodes with round-robin for all other services?

I fail to see the point of using Galera with a single writer. At that 
point, why bother with Galera at all? Just use a single database node 
with a single slave for backup purposes.

> Anyway - proxySQL would be a great extension.

I don't disagree that proxySQL is a good extension. However, it adds yet 
another service to the mesh that needs to be deployed, configured and 
maintained.

Best,
-jay


