[Openstack] nova-compute and cinder-scheduler HA

Сергей Мотовиловец motovilovets.sergey at gmail.com
Thu May 15 11:13:10 UTC 2014


OK, here's the follow-up.
I think I was wrong to say that the "InvalidBDM: Block Device Mapping is
Invalid" problem with multiple cinder-scheduler instances is caused by
deadlocks. Here's a thread about the same problem, still without a
proper answer:
http://www.gossamer-threads.com/lists/openstack/dev/24998
That thread is about Folsom. Is it really possible that no one has found a
solution since then?

BTW, I managed to get multiple nova-conductor instances running by binding
MySQL to the "management" interface on each controller node and pointing
nova.conf at 127.0.0.1, where haproxy listens on each node and forwards
queries to those management IPs. The second controller's IP is configured
as a backup, so all requests from both controllers go to the first one
while it is alive; if it fails, haproxy switches to the second controller.
This is a quick and dirty workaround though, with zero scalability, so the
question is still open.
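
For reference, the haproxy part looks roughly like the sketch below. This is
an illustration only, not a copy of my actual config: the listener name, the
server names, and the 192.0.2.x management addresses are placeholders, and
the nova.conf line in the comment assumes the Icehouse [database] section
with a made-up password.

```
# /etc/haproxy/haproxy.cfg fragment, identical on both controllers.
# MySQL itself is bound to the management IP, so 127.0.0.1:3306 is free
# for haproxy to listen on.
listen mysql-galera
    bind 127.0.0.1:3306
    mode tcp
    option tcplog
    # First controller takes all traffic while its health check passes.
    server ctrl1 192.0.2.11:3306 check
    # "backup" means this server is used only when ctrl1 is marked down.
    server ctrl2 192.0.2.12:3306 check backup

# nova.conf on each controller then points at the local haproxy, e.g.:
#   [database]
#   connection = mysql://nova:NOVA_DBPASS@127.0.0.1/nova
```

Because of the "backup" keyword, only one Galera node receives writes at any
time, which avoids the deadlocks but also explains why this does not scale.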




2014-05-15 8:07 GMT+03:00 Chu Duc Minh <chu.ducminh at gmail.com>:

> I have the same setup and hit the same problem as Sergey.
> I haven't figured out a good solution for this yet (other than
> Pacemaker/Corosync).
>
>
> On Thu, May 15, 2014 at 1:49 AM, Сергей Мотовиловец <
> motovilovets.sergey at gmail.com> wrote:
>
>>  Hello everyone!
>>
>> I'm having some trouble with nova and cinder here.
>>
>> I have 2 control nodes (active/active) in my testing environment with
>> Percona XtraDB cluster (Galera+xtrabackup) + garbd on a separate node (to
>> avoid split-brain)  + OpenStack Icehouse, latest from Ubuntu 14.04 main
>> repo.
>>
>> The problem is horizontal scalability of the nova-conductor and
>> cinder-scheduler services. It seems that all active instances of these
>> services try to execute the same MySQL queries they get from Rabbit,
>> which leads to numerous deadlocks in a setup with Galera.
>>
>> When multiple nova-conductor services are running (each using the
>> MySQL instance on its own control node), this shows up in the log as
>> "Deadlock found when trying to get lock; try restarting transaction".
>> With cinder-scheduler it leads to "InvalidBDM: Block Device Mapping is
>> Invalid."
>>
>>  Is there any way to run multiple instances of these services
>> simultaneously without duplicating queries?
>> (I don't really like the idea of handling this with Heartbeat+Pacemaker
>> or similar tools, mostly because I want equal load distribution across
>> control nodes, but in this case it seems to have the opposite effect,
>> multiplying the load on MySQL.)
>>
>> Another thing that is extremely annoying: if an instance gets stuck in
>> the ERROR state because of a deadlock during its termination, it can no
>> longer be terminated in Horizon, only via nova-api with reset-state.
>> How can this be handled?
>>
>> I'd really appreciate any help, advice, or thoughts regarding these
>> problems.
>>
>>
>> Best regards,
>> Motovilovets Sergey
>> Software Operation Engineer
>>
>