[Openstack-operators] [nova][cinder][neutron] Cross-cell cold migration

Jay Pipes jaypipes at gmail.com
Wed Aug 29 14:22:16 UTC 2018


Sorry for the delayed response. I was on PTO when this came out. 
Comments inline...

On 08/22/2018 09:23 PM, Matt Riedemann wrote:
> Hi everyone,
> 
> I have started an etherpad for cells topics at the Stein PTG [1]. The 
> main issue in there right now is dealing with cross-cell cold migration 
> in nova.
> 
> At a high level, I am going off these requirements:
> 
> * Cells can shard across flavors (and hardware type) so operators would 
> like to move users off the old flavors/hardware (old cell) to new 
> flavors in a new cell.

So cell migrations are kind of the new release upgrade dance. Got it.

> * There is network isolation between compute hosts in different cells, 
> so no ssh'ing the disk around like we do today. But the image service is 
> global to all cells.
> 
> Based on this, for the initial support for cross-cell cold migration, I 
> am proposing that we leverage something like shelve offload/unshelve 
> masquerading as resize. We shelve offload from the source cell and 
> unshelve in the target cell. This should work for both volume-backed and 
> non-volume-backed servers (we use snapshots for shelved offloaded 
> non-volume-backed servers).

Shelve was, and continues to be, a hack that lets users keep an IPv4 
address while not consuming compute resources for some amount of 
time. [1]
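
For context, the shelve offload + unshelve flow being proposed looks 
roughly like the following through the compute API. This is only a 
sketch using openstacksdk; the cloud name, server UUID and wait values 
are placeholders, not anything from Matt's proposal:

    import openstack

    conn = openstack.connect(cloud='mycloud')
    server = conn.compute.get_server('SERVER_UUID')

    # Shelve: nova snapshots the root disk of an image-backed server
    # and, once the instance is offloaded (per the cloud's
    # shelved_offload_time setting), frees its compute resources.
    conn.compute.shelve_server(server)
    conn.compute.wait_for_server(server, status='SHELVED_OFFLOADED',
                                 wait=600)

    # Unshelve: the scheduler picks a host again and the guest is
    # rebuilt from the snapshot, keeping the instance ID and its ports
    # (and therefore its IPs).
    conn.compute.unshelve_server(server)
    conn.compute.wait_for_server(server, status='ACTIVE', wait=600)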

If cross-cell cold migration is similarly just about the user being able 
to keep their instance's IPv4 address while allowing an admin to move an 
instance's storage to another physical location, then my firm belief is 
that this kind of activity needs to be coordinated *externally to Nova*.

Each deployment is going to be different, and in all cases of cross-cell 
migration, the admins doing these move operations are going to need to 
understand various network, storage and failure domains that are 
particular to that deployment (and not something we have the ability to 
discover in any automated fashion).

Since we're not talking about live migration (thank all that is holy), 
I believe the safest and most effective way to perform such a 
cross-cell "migration" would be the following basic steps (a rough 
sketch of steps 1-4 in code follows the list):

0. Ensure that each compute node is associated with at least one nova 
host aggregate that is *only* in a single cell.
1. Shut down the instance (optionally snapshotting required local disk 
changes if the user is unfortunately using their root disk for 
application data).
2. "Save" the instance's IP address by manually creating a port in 
Neutron and assigning that IP address to it. This will of course be 
deployment-dependent, since you will need to hope the saved IP address 
for the migrating instance is in a subnet range that is available in 
the target cell.
3. Migrate the volume manually. This will be entirely deployment- and 
backend-dependent, as smcginnis alluded to in a response to this thread.
4. Have the admin boot the instance in a host aggregate that is known 
to be in the target cell, passing --network port_id=$SAVED_PORT_WITH_IP 
and --volume $MIGRATED_VOLUME_UUID arguments as needed. The admin would 
need to do this because users don't know about host aggregates and, 
frankly, the user shouldn't know about host aggregates, cells, or any 
of this.
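
Here is that rough sketch, using openstacksdk. Every name and UUID in 
it is a placeholder for something deployment-specific (the cloud name, 
network name, flavor name, the "cell2-az" availability zone standing 
in for the target cell's aggregate, the volume UUID), and step 3 is 
omitted entirely because the volume move is backend-specific:

    import openstack

    conn = openstack.connect(cloud='mycloud')
    server = conn.compute.get_server('SERVER_UUID')

    # 1. Shut the instance down (and optionally snapshot the root disk
    #    if the user kept application data on it).
    conn.compute.stop_server(server)
    conn.compute.wait_for_server(server, status='SHUTOFF', wait=600)
    conn.compute.create_server_image(server, name='pre-migration-snap')

    # 2. "Save" the IP by creating a standalone port that claims it, on
    #    a network reachable from the target cell whose subnet covers
    #    the address. Assumes the instance has a single port.
    old_port = next(conn.network.ports(device_id=server.id))
    saved_ip = old_port.fixed_ips[0]['ip_address']
    target_net = conn.network.find_network('target-cell-net')
    saved_port = conn.network.create_port(
        network_id=target_net.id,
        fixed_ips=[{'ip_address': saved_ip}],
    )

    # 3. Move the volume out of band -- entirely backend-specific, so
    #    not shown here.
    migrated_volume_id = 'MIGRATED_VOLUME_UUID'

    # 4. Boot the replacement server into the target cell, assuming the
    #    target cell's aggregate is exposed as availability zone
    #    "cell2-az". Deleting the original server (and its old port) in
    #    the source cell afterwards is also up to the admin.
    new_flavor = conn.get_flavor('new-cell-flavor')
    new_server = conn.create_server(
        name=server.name,
        flavor=new_flavor,
        boot_volume=migrated_volume_id,
        nics=[{'port-id': saved_port.id}],
        availability_zone='cell2-az',
        wait=True,
    )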

Best,
-jay

[1] OK, shelve also lets a user keep their instance ID. I don't care 
much about that.


