[openstack-dev] [Nova] Migration state machine proposal.

Tang Chen tangchen at cn.fujitsu.com
Tue Oct 27 08:27:13 UTC 2015


Hi Jay, Timofei,

Thank you for the info.

On 10/27/2015 08:02 AM, Jay Pipes wrote:
> On 10/22/2015 11:13 AM, Tang Chen wrote:
>> On 10/22/2015 05:17 AM, Joshua Harlow wrote:
>>> Overall I'm very much inclined to have three state machines (one
>>> for each type), vs the mix-mash of all three into one state machine
>>> (which causes the confusion around states in the first diagram in
>>> that paste).
>>
>> That is an idea. But I would prefer to have one single state machine
>> for migration, because resize and evacuate are reusing migration.
>> They can be in one state machine.
>
> Evacuate does *not* migrate/move anything. Evacuate *rebuilds* VMs 
> from their original source image.

Well, I just dug into the source code. I think "evacuate" means
different things on the nova server side and the client side. In nova
compute, the evacuate API does call the rebuild process, as you said. But
novaclient has a command, "nova host-evacuate-live", which live-migrates
all running VMs on a host, and that led me to believe that evacuate also
migrates VMs. Please refer to:

https://github.com/openstack/python-novaclient/blob/master/novaclient/v2/contrib/host_evacuate_live.py#L72

I think this is also one reason why I keep getting confused by all
these concepts: cold-migrate, evacuate, evacuate-live, rebuild, resize.
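To make the distinction concrete, here is a minimal Python sketch of what a client-side "evacuate-live" loop amounts to. Server, FakeComputeClient, list_servers_on_host() and live_migrate() are hypothetical stand-ins, not the novaclient API. The point is only that the client sends one live-migrate request per running VM and never calls the server-side evacuate (rebuild) API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Server:
    # Hypothetical stand-in for a server record; not a novaclient class.
    uuid: str
    host: str
    status: str


@dataclass
class FakeComputeClient:
    # Hypothetical client that only records the requests it receives.
    servers: List[Server]
    live_migration_requests: List[str] = field(default_factory=list)

    def list_servers_on_host(self, host: str) -> List[Server]:
        return [s for s in self.servers if s.host == host]

    def live_migrate(self, server: Server, target_host: Optional[str] = None) -> None:
        # target_host=None: let the scheduler choose the destination.
        self.live_migration_requests.append(server.uuid)


def host_evacuate_live(client: FakeComputeClient, host: str) -> List[str]:
    """Live-migrate every running server on `host`, one request per server.

    This is purely a client-side loop; no evacuate (rebuild) request is sent.
    """
    migrated = []
    for server in client.list_servers_on_host(host):
        if server.status == "ACTIVE":  # only running VMs are live-migrated
            client.live_migrate(server)
            migrated.append(server.uuid)
    return migrated
```

For example, with server "a" (ACTIVE) and server "b" (SHUTOFF) on node1, host_evacuate_live(client, "node1") returns ["a"].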


About the migration type, I can see that Timofei has tried to split 
live-migration into 3 types:
1. block_live_migrate
2. live_migrate_file_level_storage
3. live_migrate_block_storage

I think this split is at the driver level, not the user level, since
it depends on the type of storage the VM is using. So I think the
migration type should be multi-level.
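A rough sketch of what "driver level, not user level" could mean: the user only asks for a live migration, and the driver picks one of the three strategies from facts about the storage. The enum values mirror the three names listed above; the two boolean inputs and the mapping rules are assumptions for illustration, not the logic in Timofei's blueprint.

```python
from enum import Enum


class LiveMigrationType(Enum):
    # Values mirror the three types listed above; the selection rules in
    # choose_live_migration_type() are an illustrative assumption.
    BLOCK_LIVE_MIGRATE = "block_live_migrate"
    FILE_LEVEL_STORAGE = "live_migrate_file_level_storage"
    BLOCK_STORAGE = "live_migrate_block_storage"


def choose_live_migration_type(shared_instance_path: bool,
                               shared_block_storage: bool) -> LiveMigrationType:
    """Pick a driver-level strategy from storage facts only.

    The user just asks for "live-migrate"; this choice never reaches the API.
    """
    if shared_block_storage:
        # Disks are on shared block devices (e.g. volumes): nothing to copy.
        return LiveMigrationType.BLOCK_STORAGE
    if shared_instance_path:
        # Instance directory is on a shared file system (e.g. NFS).
        return LiveMigrationType.FILE_LEVEL_STORAGE
    # No shared storage: local disks must be copied during the migration.
    return LiveMigrationType.BLOCK_LIVE_MIGRATE
```

In this reading the API never needs to expose these three names; they only matter inside the driver.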

Since I'm still a little confused by all the types of migration, I'd
like to share my understanding. If it is correct, I think we can
improve things like this.

1. OpenStack now supports resizing a VM onto another compute node. If
we set "allow_resize_to_same_host" to True, it also supports resizing on
the same host. Unless we use memory/CPU hotplug, a resize results in a
shutdown and reconfiguration of the VM.
So there should be 2 types of resize: live (using hotplug) and cold
(often resizing the primary disk).

2. Evacuate also has 2 types: live (equivalent to live-migrate) and cold
(rebuild). But evacuate itself does nothing: there is no actual process
called evacuate. evacuate() is just an API that calls
rebuild_instance().

This is from the user level.
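The resize part of item 1 can be sketched as two small decisions, assuming the resize request already says whether hotplug will be used (that flag and the function names are hypothetical). The host filter mirrors what allow_resize_to_same_host is meant to control: when it is off, the source host is not a candidate.

```python
from enum import Enum
from typing import List


class ResizeMode(Enum):
    LIVE = "live"  # memory/CPU hotplug, the VM keeps running
    COLD = "cold"  # shut down, reconfigure (often the primary disk), restart


def resize_mode(uses_hotplug: bool) -> ResizeMode:
    # Without hotplug, a resize always means a shutdown and reconfiguration.
    return ResizeMode.LIVE if uses_hotplug else ResizeMode.COLD


def resize_candidate_hosts(source: str, hosts: List[str],
                           allow_resize_to_same_host: bool) -> List[str]:
    # With allow_resize_to_same_host disabled, the source host is filtered
    # out, so every resize moves the VM to another compute node.
    if allow_resize_to_same_host:
        return list(hosts)
    return [h for h in hosts if h != source]
```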

So finally, the migration type would be like this:

       user compute                                    driver

   live-migrate
   live-evacuate                     live-migrate
   live-resize                  memory/CPU hotplug

   cold-migrate           storage type, etc
   cold-evacuate                   cold-migrate
   cold-resize                      (to self or not)

     rebuild                               rebuild
                                   (this is not a migration)

I mean maybe we should handle different things at different levels. In
compute, if the flow is too complex, we can define more helper
functions to make the main flow easier to understand.
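As a sketch of the helper-function idea, assuming a hypothetical MoveFlow class: the main flow is just an ordered list of named steps, and the live/cold difference, along with any driver-level detail such as storage type or hotplug, is hidden inside one helper. The step names are illustrative, not Nova's actual compute manager methods.

```python
from typing import Callable, List


class MoveFlow:
    """Main move flow written as a short list of named helper steps.

    Each helper only appends to `trace` here so the order is visible; the
    class and step names are hypothetical.
    """

    def __init__(self, live: bool) -> None:
        self.live = live
        self.trace: List[str] = []

    def run(self) -> List[str]:
        steps: List[Callable[[], None]] = [
            self._check_source,
            self._prepare_destination,
            self._transfer,
            self._finish_on_destination,
            self._cleanup_source,
        ]
        for step in steps:
            step()
        return self.trace

    def _check_source(self) -> None:
        self.trace.append("check source")

    def _prepare_destination(self) -> None:
        self.trace.append("prepare destination")

    def _transfer(self) -> None:
        # The only live/cold branch; driver details (storage type,
        # hotplug, ...) would stay behind this one call.
        self.trace.append("live transfer" if self.live else "power off and copy")

    def _finish_on_destination(self) -> None:
        self.trace.append("finish on destination")

    def _cleanup_source(self) -> None:
        self.trace.append("clean up source")
```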

>
> I support Nikola in that I believe the different migration types 
> should have different state machines entirely (but be as consistent as 
> possible in the naming of terminal states like "finished" vs "done" etc)

OK, agreed. Maybe we should also introduce state machines for
task_state and vm_state.
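One machine per migration type could look like the sketch below. The transition tables and intermediate state names are hypothetical, not Nova's current migration statuses. What the sketch tries to show is that each type gets its own graph while all of them end in the same terminal states, which terminal_states() can check.

```python
from typing import Dict, FrozenSet, List

# Terminal states shared by every migration type, so callers never have to
# guess whether a given type ends in "finished", "done" or "completed".
TERMINAL: FrozenSet[str] = frozenset({"finished", "error"})

# Hypothetical transition tables, one per migration type.
LIVE_MIGRATION: Dict[str, FrozenSet[str]] = {
    "queued": frozenset({"preparing", "error"}),
    "preparing": frozenset({"running", "error"}),
    "running": frozenset({"post-migrating", "error"}),
    "post-migrating": frozenset({"finished", "error"}),
}

COLD_MIGRATION: Dict[str, FrozenSet[str]] = {
    "queued": frozenset({"pre-migrating", "error"}),
    "pre-migrating": frozenset({"migrating", "error"}),
    "migrating": frozenset({"post-migrating", "error"}),
    "post-migrating": frozenset({"finished", "error"}),
}


class MigrationFSM:
    def __init__(self, transitions: Dict[str, FrozenSet[str]],
                 initial: str = "queued") -> None:
        self.transitions = transitions
        self.state = initial
        self.history: List[str] = [initial]

    def advance(self, new_state: str) -> None:
        # Reject moves out of a terminal state and moves not in the table.
        if self.state in TERMINAL:
            raise ValueError(f"{self.state!r} is terminal")
        if new_state not in self.transitions.get(self.state, frozenset()):
            raise ValueError(f"illegal transition {self.state!r} -> {new_state!r}")
        self.state = new_state
        self.history.append(new_state)


def terminal_states(transitions: Dict[str, FrozenSet[str]]) -> FrozenSet[str]:
    """States that appear as targets but have no outgoing transitions."""
    targets = frozenset().union(*transitions.values())
    return frozenset(t for t in targets if t not in transitions)
```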

>
>> It would be very helpful if the designer of the migration process
>> could share his idea. But if it is just some code modified by many
>> people many times, I think we should remove the confusing states and
>> give an easier, better state machine.
>
> There isn't a designer of the migration process :( The original (crap, 
> IMHO) API from Rackspace Cloud Servers API was used for the resize 
> functionality in the compute API and it's been a source of confusion 
> and frustration ever since. Relying on a manual confirmation or revert 
> input from the user was and continues to be a horrible idea.

Agreed.
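By default, a resize leaves the server in VERIFY_RESIZE until the user confirms or reverts it. A simple check like the one below, run over each state graph, could enforce "no wait-for-user states". The state names and tables are simplified, hypothetical examples; only the confirm/revert step is taken from the current resize behaviour.

```python
from typing import Dict, FrozenSet

# States in which the flow stops until a user acts (confirm/revert).
WAIT_FOR_USER: FrozenSet[str] = frozenset({"verify_resize"})

# Simplified, hypothetical graphs: the current style vs. a fully automatic one.
RESIZE_TODAY: Dict[str, FrozenSet[str]] = {
    "migrating": frozenset({"verify_resize", "error"}),
    "verify_resize": frozenset({"confirmed", "reverted"}),  # needs user input
}

RESIZE_AUTOMATIC: Dict[str, FrozenSet[str]] = {
    "migrating": frozenset({"finished", "error"}),
}


def manual_states(transitions: Dict[str, FrozenSet[str]]) -> FrozenSet[str]:
    """Return every state in the graph that waits for user input."""
    states = set(transitions)
    for targets in transitions.values():
        states |= targets
    return frozenset(states & WAIT_FOR_USER)
```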

>
> I believe strongly that we should deprecate the existing migrate, 
> resize, and live-migrate APIs in favor of a single consolidated,
> consistent "move" REST API that would have the following characteristics:
>
> * No manual or wait-input states in any FSM graph

Yes.

> * Removal of the term "resize" from the API entirely (the target 
> resource sizing is an attribute of the move operation, not a different 
> type of API operation in and of itself)

Maybe we can define it at a different level, as I said above. Not sure.

> * Transition to a task-based API for poll-state requests. This means 
> that in order for a caller to determine the state of a VM the caller 
> would call something like GET /servers/<UUID>/tasks/<UUID> in order to 
> see the history of state changes or subtask operations for a 
> particular request to move a VM

Yes.
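A minimal sketch of the consolidated move API with task-based polling, assuming hypothetical MoveRequest, Task and TaskStore classes (nothing here is an existing Nova API). Resizing is just an optional target_flavor on the move request, and TaskStore.get() returns roughly what a GET /servers/<UUID>/tasks/<UUID> call could return: the current state plus the full history of state changes.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


@dataclass
class MoveRequest:
    # Hypothetical request body for a single "move" operation. A resize is
    # just a move with a target_flavor; there is no separate resize call.
    server_id: str
    live: bool
    target_host: Optional[str] = None    # None: let the scheduler choose
    target_flavor: Optional[str] = None  # None: keep the current size


@dataclass
class Task:
    # Hypothetical task record: one per move request, with its state history.
    request: MoveRequest
    task_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    history: List[Dict[str, str]] = field(default_factory=list)

    def record(self, state: str) -> None:
        self.history.append(
            {"state": state, "at": datetime.now(timezone.utc).isoformat()})

    @property
    def state(self) -> Optional[str]:
        return self.history[-1]["state"] if self.history else None


class TaskStore:
    def __init__(self) -> None:
        self._tasks: Dict[Tuple[str, str], Task] = {}

    def create(self, request: MoveRequest) -> Task:
        task = Task(request=request)
        task.record("queued")
        self._tasks[(request.server_id, task.task_id)] = task
        return task

    def get(self, server_id: str, task_id: str) -> dict:
        # Roughly what GET /servers/<server_id>/tasks/<task_id> could return.
        task = self._tasks[(server_id, task_id)]
        return {
            "id": task.task_id,
            "state": task.state,
            "target_flavor": task.request.target_flavor,
            "history": list(task.history),
        }
```

The caller never has to confirm anything: it creates the task and then only polls it.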

>
> Timofei Durakov (cc'd) has a blueprint for splitting the 
> live-migration types into separate task classes here:
>
> https://review.openstack.org/#/c/225910/
>
> I think there's a lot of good ideas in that proposal. Please do have a 
> look at it.

Thanks very much.

