[Openstack-operators] [nova][cinder][neutron] Cross-cell cold migration
Sam Morrison
sorrison at gmail.com
Thu Aug 23 02:14:28 UTC 2018
I think in our case we’d only migrate between cells if we know the network and storage are accessible, and we’d never do it if they’re not.
I’m thinking of moving from old to new hardware at a cell level.
If the storage and network aren’t available, ideally it would fail at the API request.
There are also Ceph-backed instances to take into account, which nova would be responsible for.
I’ll be in Denver so we can discuss more there too.
Cheers,
Sam
> On 23 Aug 2018, at 11:23 am, Matt Riedemann <mriedemos at gmail.com> wrote:
>
> Hi everyone,
>
> I have started an etherpad for cells topics at the Stein PTG [1]. The main issue in there right now is dealing with cross-cell cold migration in nova.
>
> At a high level, I am going off these requirements:
>
> * Cells can shard across flavors (and hardware type), so operators would like to move users off the old flavors/hardware (old cell) and onto new flavors in a new cell.
>
> * There is network isolation between compute hosts in different cells, so no ssh'ing the disk around like we do today. But the image service is global to all cells.
>
> Based on this, for the initial support for cross-cell cold migration, I am proposing that we leverage something like shelve offload/unshelve masquerading as resize. We shelve offload from the source cell and unshelve in the target cell. This should work for both volume-backed and non-volume-backed servers (we use snapshots for shelved offloaded non-volume-backed servers).
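>
> For illustration, the manual equivalent of that flow with openstacksdk looks roughly like the sketch below (the cloud and server names are placeholders, and it assumes shelved_offload_time=0 so the shelve is offloaded immediately); the eventual nova implementation would of course drive this internally as part of the resize:
>
>     import openstack
>
>     conn = openstack.connect(cloud='mycloud')          # placeholder clouds.yaml entry
>     server = conn.compute.find_server('my-server')     # placeholder server name
>
>     # Shelve: for a non-volume-backed server nova snapshots it to glance,
>     # which is global to all cells, then offloads it from the source host.
>     conn.compute.shelve_server(server)
>     conn.compute.wait_for_server(server, status='SHELVED_OFFLOADED', wait=600)
>
>     # Unshelve: the scheduler picks a new host, which for cross-cell cold
>     # migration would need to be constrained to the target cell.
>     conn.compute.unshelve_server(server)
>     conn.compute.wait_for_server(server, status='ACTIVE', wait=600)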
>
> There are, of course, some complications. The main ones that I need help with right now are what happens with volumes and ports attached to the server. Today we detach from the source and attach at the target, but that's assuming the storage backend and network are available to both hosts involved in the move of the server. Will that be the case across cells? I am assuming that depends on the network topology (are routed networks being used?) and storage backend (routed storage?). If the network and/or storage backend are not available across cells, how do we migrate volumes and ports? Cinder has a volume migrate API for admins but I do not know how nova would know the proper affinity per-cell to migrate the volume to the proper host (cinder does not have a routed storage concept like routed provider networks in neutron, correct?). And as far as I know, there is no such thing as port migration in Neutron.
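>
> On the volume side, the admin migrate API mentioned above is exposed through python-cinderclient; a rough sketch follows (the auth values, the volume UUID and the 'dest-host@backend#pool' target are all made-up placeholders, and picking that target per cell is exactly the open question):
>
>     from cinderclient import client as cinder_client
>     from keystoneauth1 import loading, session
>
>     # Build an admin session; all of these auth values are placeholders.
>     loader = loading.get_plugin_loader('password')
>     auth = loader.load_from_options(auth_url='https://keystone.example.com/v3',
>                                     username='admin', password='secret',
>                                     project_name='admin',
>                                     user_domain_id='default',
>                                     project_domain_id='default')
>     cinder = cinder_client.Client('3', session=session.Session(auth=auth))
>
>     volume_id = 'VOLUME_UUID'  # placeholder volume UUID
>
>     # Migrate the volume to another backend host; the host string format is
>     # host@backend#pool and this particular target is made up.
>     cinder.volumes.migrate_volume(volume_id, 'dest-host@backend#pool',
>                                   force_host_copy=False, lock_volume=True)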
>
> Could Placement help with the volume/port migration stuff? Neutron routed provider networks rely on placement aggregates to schedule the VM to a compute host in the same network segment as the port used to create the VM; however, if that segment does not span cells, we are kind of stuck, correct?
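>
> For what it's worth, the segment layout is at least inspectable today; a small openstacksdk sketch (the port UUID and cloud name are placeholders) that lists the segments of the network a port is on, though whether any of those segments reach compute hosts in the target cell is the part only the deployment knows:
>
>     import openstack
>
>     conn = openstack.connect(cloud='mycloud')   # placeholder clouds.yaml entry
>     port_id = 'PORT_UUID'                       # placeholder port UUID
>
>     # With routed networks, each segment of the port's network maps to a
>     # placement aggregate of the compute hosts that can reach that segment.
>     port = conn.network.get_port(port_id)
>     for segment in conn.network.segments(network_id=port.network_id):
>         print(segment.id, segment.network_type, segment.physical_network)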
>
> To summarize the issues as I see them (today):
>
> * How to deal with the targeted cell during scheduling? This is so we can even get out of the source cell in nova.
>
> * How does the API deal with the same instance being in two DBs at the same time during the move?
>
> * How to handle revert resize?
>
> * How are volumes and ports handled?
>
> I can get feedback from my company's operators based on what their deployment will look like for this, but that does not mean it will work for others, so I need as much feedback from operators, especially those running with multiple cells today, as possible. Thanks in advance.
>
> [1] https://etherpad.openstack.org/p/nova-ptg-stein-cells
>
> --
>
> Thanks,
>
> Matt