[Openstack-operators] About Live Migration and Best practices

David Bayle dbayle.mon at globo.tech
Fri May 27 18:19:17 UTC 2016


Greetings,

First, thanks a lot for all the information provided regarding OpenStack, 
and thank you for your huge amount of work on this topic (live migration).

We are operating an OpenStack setup running Kilo + Ceph + 
(ceph_patch_for_snapshot).

We are still experimenting with live migration on Kilo and have a couple 
of questions about it:

- First, when we ask for a live migration from compute1 to compute2, does 
it reserve on compute2 exactly the same amount of RAM that the instance 
uses on compute1, or is there some overhead?
- Second, does the NOSTATE power state in OpenStack mean that the KVM 
power state has been lost, or does it mean that there was a problem 
copying the instance's RAM from one compute node to the other? (The 
commands we use to check both points are sketched below.)
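
For reference, this is roughly how we have been checking both points 
(just a sketch of our checks; compute2 and the instance UUID are examples 
from our setup, and the libvirt domain name below is only a placeholder):

  # Memory claimed on the destination hypervisor before/after the migration
  nova hypervisor-show compute2 | grep -E 'memory_mb|running_vms'

  # Power/task state as Nova records them for the instance
  nova show 98b793a1-61fb-45c6-95b7-6c2bca10d6de | \
      grep -E 'power_state|task_state|vm_state|status'

  # What libvirt itself reports on the source and destination compute nodes
  # (the real domain name is in OS-EXT-SRV-ATTR:instance_name from "nova show";
  # instance-0000001a is only a placeholder)
  virsh list --all
  virsh domstate instance-0000001a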

We ran into problems when trying host-evacuate-live, and even when live 
migrating more than 4 or 5 instances at the same time: most of the time 
the live migration breaks and the VMs end up with a NOSTATE power state 
in OpenStack. This is quite disturbing, because the only way we know of 
to recover is to restart the instance (we could also edit the MySQL 
database, as suggested on the community IRC channel).
Live migrating the instances one by one causes no issues; anything more 
than that can result in live migration failures and NOSTATE power states 
in OpenStack, so for now we serialize the migrations ourselves (see the 
sketch below).
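
The workaround loop looks roughly like this (only a sketch, not what we 
run verbatim: compute1 is an example, the scheduler picks the target, and 
the polling is naive):

  #!/bin/bash
  # Drain one compute node by live migrating its instances one at a time,
  # waiting for each migration to finish before starting the next.
  SOURCE=compute1

  for uuid in $(nova list --host "$SOURCE" --minimal \
                | grep -oE '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}'); do
      echo "Live migrating ${uuid} off ${SOURCE}"
      nova live-migration "${uuid}"

      # Poll until the instance leaves the "migrating" task state.
      while nova show "${uuid}" | grep -qi 'migrating'; do
          sleep 10
      done
  done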

Is there anything we are doing wrong? We have seen host-evacuate-live 
work once or twice, but with around 15 VMs on a compute node the behavior 
is completely different. It does not look like we are maxing out any 
resource, although the network could be the limit, since we only have a 
1 Gb/s management network (see the virsh domjobinfo snippet below).
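
This is how we try to see whether the 1 Gb/s link is the limit while a 
migration is in flight (again only a sketch; the domain name is a 
placeholder and the fields shown vary with the libvirt version):

  # On the source compute node, during the migration
  watch -n 2 virsh domjobinfo instance-0000001a

  # "Data processed" / "Data remaining" moving at roughly 100-115 MB/s would
  # mean the 1 GbE management network is saturated. As far as we understand,
  # nova.conf can also cap the rate via [libvirt]/live_migration_bandwidth
  # (in MiB/s, 0 = hypervisor default).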

Here is an example of the issue we hit with a host-evacuate-live; we get 
this on the source compute node:

2016-05-26 16:49:54.080 3963 WARNING nova.virt.libvirt.driver [-] [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de] Error monitoring migration: internal error: received hangup / error event on socket
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de] Traceback (most recent call last):
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5689, in _live_migration
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]     dom, finish_event)
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5521, in _live_migration_monitor
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]     info = host.DomainJobInfo.for_domain(dom)
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/host.py", line 157, in for_domain
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]     stats = dom.jobStats()
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 183, in doit
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]     result = proxy_call(self._autowrap, f, *args, **kwargs)
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 141, in proxy_call
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]     rv = execute(f, *args, **kwargs)
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 122, in execute
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]     six.reraise(c, e, tb)
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 80, in tworker
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]     rv = meth(*args, **kwargs)
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]   File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1133, in jobStats
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]     if ret is None: raise libvirtError ('virDomainGetJobStats() failed', dom=self)
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de] libvirtError: internal error: received hangup / error event on socket
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]

The first instance was migrated successfully, but then all the others 
crashed and went to NOSTATE.

Again, thank you for your help.

Best regards,
David.

-- 
David Bayle
System Administrator
GloboTech Communications
Phone: 1-514-907-0050
Toll Free: 1-(888)-GTCOMM1
Fax: 1-(514)-907-0750
support at globo.tech
http://www.globo.tech
