<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    On 05/04/16 16:33, Daniel P. Berrange wrote:<br>

    <blockquote cite="mid:20160405153328.GC5891@redhat.com" type="cite">

      <pre wrap="">On Tue, Apr 05, 2016 at 05:17:41PM +0200, Luis Tomas wrote:

</pre>

      <blockquote type="cite">

        <pre wrap="">Hi,

We are working on the possibility of including post-copy live migration into

Nova (<a class="moz-txt-link-freetext" href="https://review.openstack.org/#/c/301509/">https://review.openstack.org/#/c/301509/</a>)

At the libvirt level, post-copy live migration works as follows:

    - Start live migration with a post-copy enabler flag

(VIR_MIGRATE_POSTCOPY). Note this does not mean the migration is performed

in post-copy mode, just that you can switch it to post-copy at any given

time.

    - Change the migration from pre-copy to post-copy mode.

However, we are not sure what's the most convenient way of providing this

functionality at Nova level.

The current specs propose to include an optional flag at the live migration

API to include the VIR_MIGRATE_POSTCOPY flag when starting the live

migration. Then we propose a second API to actually switch the migration

from pre-copy to post-copy mode similarly to how it is done in LibVirt. This

is also similar to how the new "force-migrate" option works to ensure

migrations completion. In fact, this method could be an extension of the

force-migrate, by switching to postcopy if the migration was started with

the VIR_MIGRATE_POSTCOPY libvirt flag, or pause it otherwise.

The downside of this approach is that we expose an overly specific mechanism

through the API. To alleviate this, we could remove the "switch" API, and

automate the switch based on data transferred, available bandwidth or

other related metrics. However we will still need the extension to the

live-migration API to include the proper libvirt postcopy flag.

</pre>

      </blockquote>

      <pre wrap="">

No, we absolutely don't want to expose that in the API as a concept, as it

is a private technical implementation detail of the KVM migration code.

</pre>

      <blockquote type="cite">

        <pre wrap="">The other solution is to start all the migrations with the

VIR_MIGRATE_POSTCOPY mode, and therefore no new APIs would be needed. The

system could automatically detect the migration is taking too long (or is

dirtying memory faster than the sending rate), and automatically switch to

post-copy.

</pre>

      </blockquote>

      <pre wrap="">

Yes, this is what we should be doing as the default behaviour with new enough

QEMU IMHO.

</pre>

      <blockquote type="cite">

        <pre wrap="">The downside of this is that including the VIR_MIGRATE_POSTCOPY flag has an

overhead, and it will not be desirable to include it for all migrations,

especially if they can be nicely migrated with pre-copy mode. In addition, if

the migration fails after the switching, the VM will be lost. Therefore,

admins may want to ensure that post-copy is not used for some specific VMs.

</pre>

      </blockquote>

      <pre wrap="">

We shouldn't be trying to run before we can walk. Even if post-copy

hurts some guests, it'll still be a net win overall because it will

give a guarantee that migration can complete without needing to stop

guest CPUs entirely. All we need to start with is a nova.conf setting

to let admin turn off use of post-copy for the host for cases where

we want to prioritize performance over the ability to migrate successfully.

Any plan wrt changing migration behaviour on a per-VM basis needs to

consider a much broader set of features than just post-copy. For example,

compression, autoconverge and max-downtime settings all have an overhead

or impact on the guest too. We don't want to end up exposing API flags to

turn any of these on/off individually. So any solution to this will have

to look at a combination of usage context and some kind of SLA marker on

the guest. For example, if the migration is in the context of host-evacuate, which

absolutely must always complete in finite time, we should always use

post-copy. If the migration is in the context of load-balancing workloads

across hosts, then some aspect of guest SLA must inform whether Nova chooses

to use post-copy, or compression or auto-converge, etc.

Regards,

Daniel

</pre>

    </blockquote>

    <font face="Times New Roman, Times, serif"><font face="Helvetica,

        Arial, sans-serif">We talked about the SLA issue at the

        mid-cycle.  I seem to recall saying<br>

        I'd propose a spec for Newton so I should probably get to that.<br>

        <br>

        The idea discussed then was to define instances as Cattle, Pets

        and<br>

        Pandas, where Cattle are expendable, Pets are less so and Pandas

        are<br>

        high-value instances.<br>

        <br>

        I also believe we need to know how important the migration is.

        For<br>

        example, if the operator is trying to empty a node because

        they are<br>

        concerned it is likely to fail, they would set the migration as a

        high<br>

        importance task.  On the other hand, if they are moving instances

        as<br>

        part of a monthly maintenance task, they may be more relaxed about

        the<br>

        outcome.  If the migration is part of a de-fragmentation

        exercise, the<br>

        operator might be fine with some instances not being

        moved. <br>

        <br>

        So my suggestion is that we add a flag to the live-migration

        operation<br>

        to allow the operator to specify high, medium or low importance. 

        When<br>

        the migration is in progress, the compute manager can use this

        setting<br>

        in conjunction with the instance SLA to determine how aggressive

        it <br>

        should be in trying to get the migration completed.  <br>

      </font><br>

    </font>
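    <br>
    As a rough sketch only, the compute manager could combine the two inputs along these lines.<br>
    Nothing here exists in Nova today: the class names, the three levels on each axis, the<br>
    scoring rule and the mechanism labels are all made up for illustration, and a real spec<br>
    would need to settle each of them.<br>
    <pre>
```python
from enum import IntEnum


class InstanceClass(IntEnum):
    """Hypothetical instance SLA marker (not an existing Nova field)."""
    CATTLE = 0  # expendable
    PET = 1     # less expendable
    PANDA = 2   # high-value


class Importance(IntEnum):
    """Hypothetical operator-supplied importance of the migration."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2


def allowed_mechanisms(instance_class, importance):
    """Return the mechanisms the compute manager may escalate through.

    Assumed rule: aggressiveness grows with the importance of the
    migration and shrinks with the value of the instance.
    """
    score = int(importance) - int(instance_class)
    mechanisms = ["pre-copy"]
    if score >= -1:
        mechanisms.append("auto-converge")
    if score >= 0:
        # Post-copy risks losing the VM if the migration fails after the
        # switch, so it is only allowed when importance matches or
        # outweighs the instance class.
        mechanisms.append("post-copy")
    return mechanisms


if __name__ == "__main__":
    for cls in InstanceClass:
        for imp in Importance:
            print(cls.name, imp.name, allowed_mechanisms(cls, imp))
```
</pre>
    With this assumed rule, a Panda moved at low importance stays on plain pre-copy, while the<br>
    same Panda moved at high importance (for example off a node that is expected to fail) is<br>
    allowed to switch to post-copy.<br>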

    <pre class="moz-signature" cols="72">Paul Carlton

Software Engineer

Cloud Services 

Hewlett Packard

BUK03:T242

Longdown Avenue

Stoke Gifford

Bristol BS34 8QZ

Mobile:    +44 (0)7768 994283

Email:    <a class="moz-txt-link-freetext" href="mailto:paul.carlton2@hpe.com">mailto:paul.carlton2@hpe.com</a></pre>

    <br>

  </body>

</html>