[Openstack-operators] [nova][cinder][neutron] Cross-cell cold migration
Matt Riedemann
mriedemos at gmail.com
Wed Nov 7 00:35:04 UTC 2018
After hacking on the PoC for a while [1] I have finally pushed up a spec
[2]. Behold it in all its dark glory!
[1] https://review.openstack.org/#/c/603930/
[2] https://review.openstack.org/#/c/616037/
On 8/22/2018 8:23 PM, Matt Riedemann wrote:
> Hi everyone,
>
> I have started an etherpad for cells topics at the Stein PTG [1]. The
> main issue in there right now is dealing with cross-cell cold migration
> in nova.
>
> At a high level, I am going off these requirements:
>
> * Cells can shard across flavors (and hardware type) so operators would
> like to move users off the old flavors/hardware (old cell) to new
> flavors in a new cell.
>
> * There is network isolation between compute hosts in different cells,
> so no ssh'ing the disk around like we do today. But the image service is
> global to all cells.
>
> Based on this, for the initial support for cross-cell cold migration, I
> am proposing that we leverage something like shelve offload/unshelve
> masquerading as resize. We shelve offload from the source cell and
> unshelve in the target cell. This should work for both volume-backed and
> non-volume-backed servers (we use snapshots for shelved offloaded
> non-volume-backed servers).
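> 
> For illustration, this is roughly what that flow looks like if driven by
> hand with openstacksdk today (just a sketch: the cloud name and timeouts
> are made up, and it assumes the deployment offloads shelved servers
> immediately, i.e. shelved_offload_time=0; in the proposal nova conductor
> would drive the same steps internally and present them as a resize):
> 
>     import openstack
> 
>     conn = openstack.connect(cloud='mycloud')  # example cloud name
>     server = conn.compute.find_server('my-server')
> 
>     # Shelve (offload) in the source cell: for a non-volume-backed server
>     # this snapshots the root disk to the global image service and frees
>     # the source host.
>     conn.compute.shelve_server(server)
>     conn.compute.wait_for_server(server, status='SHELVED_OFFLOADED', wait=600)
> 
>     # Unshelve: the scheduler picks a new host, which with cross-cell
>     # support could be in the target cell; the guest is spawned from the
>     # snapshot (or volumes are re-attached for volume-backed servers).
>     conn.compute.unshelve_server(server)
>     conn.compute.wait_for_server(server, status='ACTIVE', wait=600)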
>
> There are, of course, some complications. The main ones that I need help
> with right now are what happens with volumes and ports attached to the
> server. Today we detach from the source and attach at the target, but
> that's assuming the storage backend and network are available to both
> hosts involved in the move of the server. Will that be the case across
> cells? I am assuming that depends on the network topology (are routed
> networks being used?) and storage backend (routed storage?). If the
> network and/or storage backend are not available across cells, how do we
> migrate volumes and ports? Cinder has a volume migrate API for admins
> but I do not know how nova would determine the per-cell affinity needed to
> migrate the volume to the right host (cinder does not have a routed
> storage concept like routed provider networks in neutron, correct?). And
> as far as I know, there is no such thing as port migration in Neutron.
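> 
> For reference, the admin volume migration API I mean is the existing
> os-migrate_volume action; with python-cinderclient that is roughly the
> following (a sketch only; the destination backend host string is
> hypothetical, and knowing which host to pass per target cell is exactly
> the open question):
> 
>     from cinderclient import client as cinder_client
> 
>     # 'session' is an authenticated keystoneauth1 session, set up elsewhere.
>     cinder = cinder_client.Client('3', session=session)
> 
>     # Admin-only: move the volume's data to a backend reachable from the
>     # target cell. Nova would somehow have to know the right host here.
>     cinder.volumes.migrate_volume(
>         '<volume-uuid>',
>         host='dest-host@backend#pool',  # hypothetical backend in the target cell
>         force_host_copy=False,
>         lock_volume=True)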
>
> Could Placement help with the volume/port migration stuff? Neutron
> routed provider networks rely on placement aggregates to schedule the VM
> to a compute host in the same network segment as the port used to create
> the VM; however, if that segment does not span cells, we are kind of
> stuck, correct?
>
> To summarize the issues as I see them (today):
>
> * How to deal with the targeted cell during scheduling? This is so we
> can even get out of the source cell in nova.
>
> * How does the API deal with the same instance being in two DBs at the
> same time during the move?
>
> * How to handle revert resize?
>
> * How are volumes and ports handled?
>
> I can get feedback from my company's operators based on what their
> deployment will look like for this, but that does not mean it will work
> for others, so I need as much feedback as possible from operators,
> especially those running multiple cells today. Thanks in advance.
>
> [1] https://etherpad.openstack.org/p/nova-ptg-stein-cells
>
--
Thanks,
Matt