[Openstack-operators] About Live Migration and Best Practices
David Bayle
dbayle.mon at globo.tech
Fri May 27 18:19:17 UTC 2016
Greetings,
First, thanks a lot for all the information provided regarding OpenStack,
and thanks for your huge work on this topic (live migration).
We are operating an OpenStack setup running Kilo + Ceph +
(ceph_patch_for_snapshot).
We are still experimenting with live migration on Kilo and we have some
questions about it:
- First, when we request a live migration from compute1 to compute2, does
it take the exact same amount of RAM used on compute1 and reserve it on
compute2, or is there some small overhead?
- Second, does the NOSTATE power state in OpenStack mean that the KVM
power state has been lost, or does it mean that there was an issue copying
the RAM of the instance from one compute node to the other? (See the quick
check sketched just below.)
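For reference, a minimal sketch of comparing Nova's view of the power state
with what libvirt itself reports. It assumes admin credentials in the usual
OS_* environment variables, Kilo-era python-novaclient and libvirt-python,
and that it runs on the compute node currently hosting the instance; the
UUID is just an example:

# compare_power_state.py -- minimal sketch, assumptions as described above.
import os
import libvirt
from novaclient import client

nova = client.Client('2',
                     os.environ['OS_USERNAME'],
                     os.environ['OS_PASSWORD'],
                     os.environ['OS_TENANT_NAME'],
                     os.environ['OS_AUTH_URL'])

server = nova.servers.get('98b793a1-61fb-45c6-95b7-6c2bca10d6de')  # example UUID
print('nova status:      %s' % server.status)
print('nova power_state: %s  (0 == NOSTATE)'
      % getattr(server, 'OS-EXT-STS:power_state'))

# What libvirt thinks about the same domain on this host.
conn = libvirt.open('qemu:///system')
dom = conn.lookupByName(getattr(server, 'OS-EXT-SRV-ATTR:instance_name'))
state, reason = dom.state()
print('libvirt state:    %s  (1 == VIR_DOMAIN_RUNNING)' % state)

If libvirt still reports the domain as running while Nova shows NOSTATE,
the domain itself survived and only Nova's view of it is stale.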
We faced some issues while trying host-live-evacuate, or even when we live
migrate more than 4 to 5 instances at the same time: most of the time the
live migration breaks and the VMs end up with NOSTATE as their power state
in OpenStack. This is very disturbing, because the only way we know to
solve it is to restart the instance (we could also edit the MySQL database,
as proposed on the community IRC channel).
Live migrating each instance one by one gives no issue; anything more than
that can result in live migration failures and NOSTATE power states in
OpenStack (see the serial-migration sketch below).
Is there anything we are doing wrong? We have seen host-live-evacuate work
once or twice, but with around 15 VMs on a compute node the behavior is
totally different. It does not seem that we are maxing out any resources,
although the network could be the bottleneck, as we are using a 1 Gb/s
management network (see the migration-progress sketch below).
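One way to see whether the 1 Gb/s link is the limiting factor is to watch
the migration job from the source compute node with libvirt-python; a
minimal sketch follows. The domain name is a placeholder, and the exact set
of keys returned depends on the libvirt version:

# watch_migration.py -- run on the source compute node while a live
# migration is in flight; minimal sketch, domain name is a placeholder.
import time
import libvirt

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('instance-0000002a')  # libvirt name, not the Nova UUID

while True:
    stats = dom.jobStats()  # same call the Nova driver makes in the traceback below
    if stats.get('type', libvirt.VIR_DOMAIN_JOB_NONE) == libvirt.VIR_DOMAIN_JOB_NONE:
        print('no active migration job')
        break
    # Key names vary slightly between libvirt versions, hence the .get().
    print('remaining %s / total %s bytes'
          % (stats.get('memory_remaining'), stats.get('memory_total')))
    time.sleep(2)

If memory_remaining barely shrinks while several migrations run in
parallel, the management network is likely saturated.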
Here is an example of the issue faced with host-live-evacuate; we get this
on the source compute node:
2016-05-26 16:49:54.080 3963 WARNING nova.virt.libvirt.driver [-] [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de] Error monitoring migration: internal error: received hangup / error event on socket
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de] Traceback (most recent call last):
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5689, in _live_migration
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]     dom, finish_event)
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5521, in _live_migration_monitor
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]     info = host.DomainJobInfo.for_domain(dom)
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/host.py", line 157, in for_domain
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]     stats = dom.jobStats()
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 183, in doit
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]     result = proxy_call(self._autowrap, f, *args, **kwargs)
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 141, in proxy_call
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]     rv = execute(f, *args, **kwargs)
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 122, in execute
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]     six.reraise(c, e, tb)
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]   File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 80, in tworker
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]     rv = meth(*args, **kwargs)
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]   File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1133, in jobStats
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]     if ret is None: raise libvirtError ('virDomainGetJobStats() failed', dom=self)
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de] libvirtError: internal error: received hangup / error event on socket
2016-05-26 16:49:54.080 3963 TRACE nova.virt.libvirt.driver [instance: 98b793a1-61fb-45c6-95b7-6c2bca10d6de]

The first instance was migrated successfully, but then all the others
crashed and went to NOSTATE.
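For what it is worth, the "received hangup / error event on socket" in that
traceback appears to be libvirt reporting that the socket it was using for
the domain was closed underneath it mid-job, which is why dom.jobStats()
raised. A minimal sketch of a post-mortem check from the source compute
node, with the domain name as a placeholder:

# check_domain.py -- quick check after a failed migration; minimal sketch.
import libvirt

conn = libvirt.open('qemu:///system')
print('libvirtd reachable: %s' % (conn.isAlive() == 1))

try:
    dom = conn.lookupByName('instance-0000002a')  # placeholder domain name
except libvirt.libvirtError as exc:
    print('domain not defined on this host: %s' % exc)
else:
    print('domain active: %s' % (dom.isActive() == 1))
    print('job info: %s' % dom.jobInfo())  # first element is job type, 0 == no job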
Again, thank you for your help.

Best regards,

David.
--
David Bayle
System Administrator
GloboTech Communications
Phone: 1-514-907-0050
Toll Free: 1-(888)-GTCOMM1
Fax: 1-(514)-907-0750
support at globo.tech
http://www.globo.tech