<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
On 05/04/16 16:33, Daniel P. Berrange wrote:<br>
<blockquote cite="mid:20160405153328.GC5891@redhat.com" type="cite">
<pre wrap="">On Tue, Apr 05, 2016 at 05:17:41PM +0200, Luis Tomas wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Hi,
We are working on the possibility of including post-copy live migration into
Nova (<a class="moz-txt-link-freetext" href="https://review.openstack.org/#/c/301509/">https://review.openstack.org/#/c/301509/</a>)
At the libvirt level, post-copy live migration works as follows:
- Start live migration with a post-copy enabler flag
(VIR_MIGRATE_POSTCOPY). Note this does not mean the migration is performed
in post-copy mode, just that you can switch it to post-copy at any given
time.
- Change the migration from pre-copy to post-copy mode.
However, we are not sure what's the most convenient way of providing this
functionality at Nova level.
The current spec proposes adding an optional flag to the live migration
API so that VIR_MIGRATE_POSTCOPY is set when the live migration is
started. We then propose a second API to actually switch the migration
from pre-copy to post-copy mode, similar to how it is done in libvirt. This
is also similar to how the new "force-migrate" option works to ensure
migration completion. In fact, this method could be an extension of
force-migrate: it would switch to post-copy if the migration was started
with the VIR_MIGRATE_POSTCOPY libvirt flag, or pause the VM otherwise.
The downside of this approach is that it exposes too specific a mechanism
through the API. To alleviate this, we could remove the "switch" API and
automate the switch based on data transferred, available bandwidth or
other related metrics. However, we would still need the extension to the
live-migration API to set the libvirt post-copy flag.
</pre>
</blockquote>
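A minimal Python sketch of the two libvirt steps described above, using the libvirt-python bindings' `virDomain.migrateToURI3()` and `virDomain.migrateStartPostCopy()` (the latter needs libvirt 1.3.3 or later). The flag values are copied from libvirt's `virDomainMigrateFlags` enum so the sketch runs without the bindings installed; real code should use the `libvirt.VIR_MIGRATE_*` constants. The function names, the destination URI and the choice of `VIR_MIGRATE_PEER2PEER` are assumptions for illustration.

```python
# Values mirror libvirt's virDomainMigrateFlags enum. With libvirt-python
# installed, prefer libvirt.VIR_MIGRATE_LIVE, libvirt.VIR_MIGRATE_PEER2PEER
# and libvirt.VIR_MIGRATE_POSTCOPY.
VIR_MIGRATE_LIVE = 1 << 0
VIR_MIGRATE_PEER2PEER = 1 << 1
VIR_MIGRATE_POSTCOPY = 1 << 15


def start_migration(dom, dest_uri, allow_postcopy=True):
    """Step 1 (hypothetical helper): start a live migration that may later
    be switched to post-copy.

    The migration still begins in pre-copy mode; VIR_MIGRATE_POSTCOPY only
    makes the later switch possible. migrateToURI3() blocks until the
    migration finishes, so a caller would normally run this in its own thread.
    """
    flags = VIR_MIGRATE_LIVE | VIR_MIGRATE_PEER2PEER
    if allow_postcopy:
        flags |= VIR_MIGRATE_POSTCOPY
    # params is libvirt's typed-parameter dict; left empty to keep this small.
    return dom.migrateToURI3(dest_uri, {}, flags)


def switch_to_postcopy(dom):
    """Step 2 (hypothetical helper): move the running migration to post-copy.

    Only valid if step 1 was started with VIR_MIGRATE_POSTCOPY.
    """
    return dom.migrateStartPostCopy(0)
```

With the real bindings, `dom` would come from something like `libvirt.open("qemu:///system").lookupByName(name)`, and `switch_to_postcopy()` would be called from a second thread while `start_migration()` is still blocked.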
<pre wrap="">
No, we absolutely don't want to expose that in the API as a concept, as it
is a private technical implementation detail of the KVM migration code.
</pre>
<blockquote type="cite">
<pre wrap="">The other solution is to start all the migrations with the
VIR_MIGRATE_POSTCOPY mode, and therefore no new APIs would be needed. The
system could automatically detect the migration is taking too long (or is
dirting memory faster than the sending rate), and automatically switch to
post-copy.
</pre>
</blockquote>
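The automatic switch could be driven by a check like the one below. This is a hypothetical sketch: the `MigrationProgress` fields, the `max_duration_s` limit and the helper name are illustrations of the two criteria mentioned above, not an existing Nova or libvirt API. In practice the rates would have to be derived from libvirt's migration job statistics.

```python
from dataclasses import dataclass


@dataclass
class MigrationProgress:
    """Hypothetical snapshot of a running pre-copy migration."""
    elapsed_s: float           # seconds since the migration started
    dirty_rate_bps: float      # bytes/s of guest memory being dirtied
    transfer_rate_bps: float   # bytes/s actually sent to the destination


def should_switch_to_postcopy(progress: MigrationProgress,
                              max_duration_s: float) -> bool:
    """Return True when pre-copy looks unlikely to converge on its own."""
    # Criterion 1: the migration has simply been running for too long.
    if progress.elapsed_s > max_duration_s:
        return True
    # Criterion 2: the guest dirties memory at least as fast as it can be
    # sent, so the amount of memory still to copy never shrinks.
    return progress.dirty_rate_bps >= progress.transfer_rate_bps
```

For example, a guest dirtying 2 GB/s over a 1 GB/s link would trigger the switch even early in the migration, while a mostly idle guest would only switch once `max_duration_s` is exceeded.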
<pre wrap="">
Yes this is what we should be doing as default behaviour with new enough
QEMU IMHO.
</pre>
<blockquote type="cite">
<pre wrap="">The cons of this is that including the VIR_MIGRATE_POSTCOPY flag has an
overhead, and it will not be desirable to included for all migrations,
specially is they can be nicely migrated with pre-copy mode. In addition, if
the migration fails after the switching, the VM will be lost. Therefore,
admins may want to ensure that post-copy is not used for some specific VMs.
</pre>
</blockquote>
<pre wrap="">
We shouldn't be trying to run before we can walk. Even if post-copy
hurts some guests, it'll still be a net win overall because it will
give a guarantee that migration can complete without needing to stop
guest CPUs entirely. All we need to start with is a nova.conf setting
to let the admin turn off use of post-copy for the host for cases where
we want to prioritize performance over the ability to migrate successfully.
Any plan wrt changing migration behaviour on a per-VM basis needs to
consider a much broader set of features than just post-copy. For example,
compression, autoconverge and max-downtime settings all have an overhead
or impact on the guest too. We don't want to end up exposing API flags to
turn any of these on/off individually. So any solution to this will have
to look at a combination of usage context and some kind of SLA marker on
the guest. E.g. if the migration is in the context of host-evacuate, which
absolutely must always complete in finite time, we should always use
post-copy. If the migration is in the context of load-balancing workloads
across hosts, then some aspect of guest SLA must inform whether Nova chooses
to use post-copy, or compression or auto-converge, etc.
Regards,
Daniel
</pre>
</blockquote>
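One way to read the policy proposed above as code: a host-level opt-out (standing in for the suggested nova.conf setting) is checked first, host evacuation always uses post-copy, and other contexts defer to a per-guest SLA signal. All names here (`MigrationContext`, `use_postcopy`, `postcopy_permitted`, `guest_tolerates_postcopy`) are hypothetical, and letting the host setting override even the evacuation rule is an assumption.

```python
from enum import Enum


class MigrationContext(Enum):
    """Hypothetical reasons for triggering a live migration."""
    HOST_EVACUATE = "host-evacuate"
    LOAD_BALANCE = "load-balance"


def use_postcopy(context: MigrationContext,
                 postcopy_permitted: bool,
                 guest_tolerates_postcopy: bool) -> bool:
    # Host-level opt-out (stand-in for a nova.conf option): admins who value
    # guest performance over guaranteed completion can disable post-copy.
    if not postcopy_permitted:
        return False
    # Evacuation must finish in finite time, so always allow post-copy.
    if context is MigrationContext.HOST_EVACUATE:
        return True
    # Otherwise let some per-guest SLA marker decide.
    return guest_tolerates_postcopy
```

The same shape would extend to the other knobs mentioned (compression, auto-converge, max downtime) by returning a set of techniques instead of a single boolean.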
<font face="Times New Roman, Times, serif"><font face="Helvetica,
Arial, sans-serif">We talked about the SLA issue at the mid
cycle. I seem to recall saying<br>
I'd propose a spec for Newton so I should probably get to that.<br>
<br>
The idea discussed then was to define instances as Cattle, Pets
and<br>
Pandas where cattle are expendable, Pets are less so and Pandas
are high<br>
value instances.<br>
<br>
I also believe we need to know how important the migration is.
For<br>
example if the operator is trying to empty a node due because
they are<br>
concerned it is likely to fail then they set the migration as a
high<br>
importance task. On the other hand if they are moving instances
as<br>
part of a monthly maintenance task they may be more relaxed about
the<br>
outcome. If the migration is part of a de-fragmentation
exercise the<br>
operator might be fine with some instances not being able to be
moved. <br>
<br>
So my suggestion is we have add a flag to the live-migration
operation<br>
to allow the operator to specify high, medium or low importance.
When<br>
the migration is in progress the compute manager can use this
setting<br>
in conjunction with the instance SLA to determine how aggressive
it <br>
should be in trying to get the migration completed. <br>
</font><br>
</font>
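A rough sketch of how the compute manager might combine the two inputs described above. The `Importance` and `SLAClass` enums, the subtraction-based score and the three escalation levels are all hypothetical; in particular, treating a higher-value instance as warranting gentler intervention is an assumption, not something the proposal specifies.

```python
from enum import IntEnum


class Importance(IntEnum):
    """Operator-supplied importance of the migration (hypothetical flag)."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2


class SLAClass(IntEnum):
    """Instance value class from the Cattle/Pets/Pandas idea."""
    CATTLE = 0  # expendable
    PET = 1     # less expendable
    PANDA = 2   # high-value


def escalation_level(importance: Importance, sla: SLAClass) -> str:
    """Return how aggressively to push a migration to completion.

    The score runs from -2 (low importance, Panda) to 2 (high importance,
    Cattle); higher means more aggressive.
    """
    score = int(importance) - int(sla)
    if score >= 1:
        # e.g. allow post-copy or briefly pausing the guest
        return "force-complete"
    if score == 0:
        # e.g. try auto-converge or compression, give up after a timeout
        return "throttle"
    # The operator is happy to leave this instance where it is.
    return "abort-if-not-converging"
```

A 3x3 lookup table would work just as well; the score form simply makes the trade-off monotonic in both inputs.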
<pre class="moz-signature" cols="72">Paul Carlton
Software Engineer
Cloud Services
Hewlett Packard
BUK03:T242
Longdown Avenue
Stoke Gifford
Bristol BS34 8QZ
Mobile: +44 (0)7768 994283
Email: <a class="moz-txt-link-freetext" href="mailto:paul.carlton2@hpe.com">mailto:paul.carlton2@hpe.com</a></pre>
<br>
</body>
</html>