[Openstack-operators] [nova][cinder][neutron] Cross-cell cold migration

Matt Riedemann mriedemos at gmail.com
Wed Nov 7 00:35:04 UTC 2018

After hacking on the PoC for a while [1], I have finally pushed up a spec
[2]. Behold it in all its dark glory!

[1] https://review.openstack.org/#/c/603930/
[2] https://review.openstack.org/#/c/616037/

On 8/22/2018 8:23 PM, Matt Riedemann wrote:
> Hi everyone,
>
> I have started an etherpad for cells topics at the Stein PTG [1]. The
> main issue in there right now is dealing with cross-cell cold migration
> in nova.
>
> At a high level, I am going off these requirements:
>
> * Cells can shard across flavors (and hardware type) so operators would
> like to move users off the old flavors/hardware (old cell) to new
> flavors in a new cell.
>
> * There is network isolation between compute hosts in different cells,
> so no ssh'ing the disk around like we do today. But the image service is
> global to all cells.
>
> Based on this, for the initial support for cross-cell cold migration, I
> am proposing that we leverage something like shelve offload/unshelve
> masquerading as resize. We shelve offload from the source cell and
> unshelve in the target cell. This should work for both volume-backed and
> non-volume-backed servers (we use snapshots for shelved offloaded
> non-volume-backed servers).
>
> There are, of course, some complications. The main ones that I need help
> with right now are what happens with volumes and ports attached to the
> server. Today we detach from the source and attach at the target, but
> that's assuming the storage backend and network are available to both
> hosts involved in the move of the server. Will that be the case across
> cells? I am assuming that depends on the network topology (are routed
> networks being used?) and storage backend (routed storage?). If the
> network and/or storage backend are not available across cells, how do we
> migrate volumes and ports? Cinder has a volume migrate API for admins
> but I do not know how nova would know the proper affinity per-cell to
> migrate the volume to the proper host (cinder does not have a routed
> storage concept like routed provider networks in neutron, correct?). And
> as far as I know, there is no such thing as port migration in Neutron.
>
> Could Placement help with the volume/port migration stuff? Neutron
> routed provider networks rely on placement aggregates to schedule the VM
> to a compute host in the same network segment as the port used to create
> the VM, however, if that segment does not span cells we are kind of
> stuck, correct?
>
> To summarize the issues as I see them (today):
>
> * How to deal with the targeted cell during scheduling? This is so we
> can even get out of the source cell in nova.
>
> * How does the API deal with the same instance being in two DBs at the
> same time during the move?
>
> * How to handle revert resize?
>
> * How are volumes and ports handled?
>
> I can get feedback from my company's operators based on what their
> deployment will look like for this, but that does not mean it will work
> for others, so I need as much feedback from operators, especially those
> running with multiple cells today, as possible. Thanks in advance.
>
> [1] https://etherpad.openstack.org/p/nova-ptg-stein-cells
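
For anyone trying to picture the "shelve offload/unshelve masquerading as
resize" idea from the quoted message, here is a rough toy model in Python.
It is only a sketch and is not nova code: the Server and ImageService
classes, the function names, and the cell/host/flavor values are all made
up for illustration. The only things taken from the proposal are the
ordering (shelve offload in the source cell, unshelve in the target cell),
the snapshot for non-volume-backed servers, and the assumption that the
image service is global to all cells.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Server:
    """Hypothetical stand-in for an instance record (not a nova object)."""
    uuid: str
    flavor: str
    volume_backed: bool
    cell: Optional[str] = None         # cell whose database owns the record
    host: Optional[str] = None         # None while shelved offloaded
    status: str = "ACTIVE"
    snapshot_id: Optional[str] = None  # set only for non-volume-backed servers


@dataclass
class ImageService:
    """Hypothetical image store, assumed reachable from every cell."""
    images: Dict[str, str] = field(default_factory=dict)

    def snapshot(self, server: Server) -> str:
        image_id = f"snap-{server.uuid}"
        self.images[image_id] = server.uuid
        return image_id


def shelve_offload(server: Server, images: ImageService) -> None:
    """Free the source host; snapshot the root disk if it is not a volume."""
    if not server.volume_backed:
        server.snapshot_id = images.snapshot(server)
    server.host = None
    server.status = "SHELVED_OFFLOADED"


def unshelve(server: Server, images: ImageService, target_cell: str,
             target_host: str, new_flavor: str) -> None:
    """Bring the server back up in the target cell with the new flavor."""
    if server.status != "SHELVED_OFFLOADED":
        raise ValueError("server must be shelved offloaded first")
    if not server.volume_backed and server.snapshot_id not in images.images:
        raise LookupError("snapshot is not visible from the target cell")
    server.cell = target_cell
    server.host = target_host
    server.flavor = new_flavor
    # "Masquerading as resize": the user is left to confirm or revert.
    server.status = "VERIFY_RESIZE"


def cross_cell_cold_migrate(server: Server, images: ImageService,
                            target_cell: str, target_host: str,
                            new_flavor: str) -> Server:
    if target_cell == server.cell:
        raise ValueError("target cell must differ from the source cell")
    shelve_offload(server, images)
    unshelve(server, images, target_cell, target_host, new_flavor)
    return server
```

Calling cross_cell_cold_migrate() on a non-volume-backed Server in "cell1"
with a target of "cell2" leaves it on the new host and flavor in a
VERIFY_RESIZE state, with its snapshot held by the shared image store. The
sketch deliberately says nothing about volumes, ports, revert, or the
instance living in two databases during the move, since those are the open
questions listed in the quoted message.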




