[Openstack-operators] [nova][cinder][neutron] Cross-cell cold migration
Jay Pipes
jaypipes at gmail.com
Wed Aug 29 14:22:16 UTC 2018
Sorry for the delayed response. I was on PTO when this came out. Comments
inline...
On 08/22/2018 09:23 PM, Matt Riedemann wrote:
> Hi everyone,
>
> I have started an etherpad for cells topics at the Stein PTG [1]. The
> main issue in there right now is dealing with cross-cell cold migration
> in nova.
>
> At a high level, I am going off these requirements:
>
> * Cells can shard across flavors (and hardware type) so operators would
> like to move users off the old flavors/hardware (old cell) to new
> flavors in a new cell.
So cell migrations are kind of the new release upgrade dance. Got it.
> * There is network isolation between compute hosts in different cells,
> so no ssh'ing the disk around like we do today. But the image service is
> global to all cells.
>
> Based on this, for the initial support for cross-cell cold migration, I
> am proposing that we leverage something like shelve offload/unshelve
> masquerading as resize. We shelve offload from the source cell and
> unshelve in the target cell. This should work for both volume-backed and
> non-volume-backed servers (we use snapshots for shelved offloaded
> non-volume-backed servers).
shelve was and continues to be a hack that lets users keep an IPv4
address while not consuming compute resources for some amount of time. [1]
If cross-cell cold migration is similarly just about the user being able
to keep their instance's IPv4 address while allowing an admin to move an
instance's storage to another physical location, then my firm belief is
that this kind of activity needs to be coordinated *externally to Nova*.
Each deployment is going to be different, and in all cases of cross-cell
migration, the admins doing these move operations are going to need to
understand various network, storage and failure domains that are
particular to that deployment (and not something we have the ability to
discover in any automated fashion).
Since we're not talking about live migration (thank all that is holy), I
believe the safest and most effective way to perform such a cross-cell
"migration" would be the following basic steps:
0. ensure that each compute node is associated with at least one nova
host aggregate that is *only* in a single cell (first sketch below)
1. shut down the instance (optionally snapshotting required local disk
changes if the user is unfortunately using their root disk for
application data); see the second sketch below
2. "save" the instance's IP address by manually creating a port in
Neutron and assigning the IP address manually to that port. this of
course will be deployment-dependent since you will need to hope the
saved IP address for the migrating instance is in a subnet range that is
available in the target cell
3. migrate the volume manually. this will be entirely deployment- and
backend-dependent, as smcginnis alluded to in a response to this thread
4. have the admin boot the instance in a host aggregate that is known to
be in the target cell, passing --network port_id=$SAVED_PORT_WITH_IP and
--volume $MIGRATED_VOLUME_UUID arguments as needed (fourth sketch below).
the admin would need to do this because users don't know about host
aggregates and, frankly, the user shouldn't know about host aggregates,
cells, or any of this.
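To make the above a bit more concrete, here are rough, untested sketches
using openstacksdk. Every cloud name, host name, UUID and address below is
a placeholder, and exact parameter names may vary by SDK version. First,
for step 0, one aggregate (and availability zone) per cell:

import openstack

# "admin" is whatever your admin clouds.yaml entry is called
conn = openstack.connect(cloud='admin')

# an aggregate/AZ containing only the target cell's compute hosts
agg = conn.compute.create_aggregate(name='cell2-hosts',
                                    availability_zone='cell2-az')
for host in ('cell2-compute-01', 'cell2-compute-02'):
    conn.compute.add_host_to_aggregate(agg, host)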
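For step 1, stopping the instance and (optionally) snapshotting it:

import openstack

conn = openstack.connect(cloud='admin')

server = conn.compute.find_server('instance-to-migrate')
conn.compute.stop_server(server)
conn.compute.wait_for_server(server, status='SHUTOFF')

# only needed if the user keeps application data on the root disk
snap = conn.compute.create_server_image(server,
                                        name='pre-migration-snapshot',
                                        wait=True)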
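For step 2, pinning the IP address by pre-creating the port. The network
and subnet UUIDs and the address are obviously deployment-specific:

import openstack

conn = openstack.connect(cloud='admin')

# create a port that holds the instance's current fixed IP
saved_port = conn.network.create_port(
    network_id='NETWORK_UUID',
    fixed_ips=[{'subnet_id': 'SUBNET_UUID',
                'ip_address': '203.0.113.25'}],
    name='saved-ip-for-migrating-instance')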
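And for step 4, the admin boot in the target cell, assuming the aggregate
from the first sketch is exposed as the 'cell2-az' availability zone and
the instance is volume-backed (for an image-backed instance you would
pass the snapshot image instead of a boot volume):

import openstack

conn = openstack.connect(cloud='admin')

server = conn.compute.create_server(
    name='migrated-instance',
    flavor_id='FLAVOR_UUID',
    availability_zone='cell2-az',
    networks=[{'port': 'SAVED_PORT_UUID'}],
    # boot from the manually migrated volume
    block_device_mapping=[{
        'uuid': 'MIGRATED_VOLUME_UUID',
        'source_type': 'volume',
        'destination_type': 'volume',
        'boot_index': 0,
        'delete_on_termination': False}])
conn.compute.wait_for_server(server)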
Best,
-jay
[1] ok, shelve also lets a user keep their instance ID. I don't care
much about that.