[Openstack-operators] [nova][cinder][neutron] Cross-cell cold migration
Matt Riedemann
mriedemos at gmail.com
Thu Aug 23 01:23:41 UTC 2018
Hi everyone,
I have started an etherpad for cells topics at the Stein PTG [1]. The
main issue in there right now is dealing with cross-cell cold migration
in nova.
At a high level, I am going off these requirements:
* Cells can be sharded by flavor (and hardware type), so operators would
like to move users off the old flavors/hardware (old cell) onto new
flavors in a new cell.
* There is network isolation between compute hosts in different cells,
so no ssh'ing the disk around like we do today. But the image service is
global to all cells.
Based on this, for the initial support for cross-cell cold migration, I
am proposing that we leverage something like shelve offload/unshelve
masquerading as resize. We shelve offload from the source cell and
unshelve in the target cell. This should work for both volume-backed and
non-volume-backed servers (we use snapshots for shelved offloaded
non-volume-backed servers).
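To make the flow concrete, here is a rough sketch of the equivalent
operations driven through python-novaclient. This is just to illustrate
the semantics; nova would do all of this internally via conductor, and
being able to target the destination cell on unshelve is exactly the
missing piece. The session and UUID variables are placeholders, and I
have elided waiting for the SHELVED_OFFLOADED status between steps:

    from novaclient import client

    # 'sess' is an authenticated keystoneauth1 session (setup elided)
    nova = client.Client('2.latest', session=sess)

    server = nova.servers.get(server_uuid)  # server_uuid is a placeholder

    # Shelve and offload in the source cell; for a non-volume-backed
    # server this snapshots the root disk to the image service, which
    # is global to all cells.
    server.shelve()
    server.shelve_offload()

    # Unshelve rebuilds the guest from the snapshot. Today the
    # scheduler can pick any enabled host; constraining this to the
    # target cell is the new work.
    server.unshelve()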
There are, of course, some complications. The main ones that I need help
with right now are what happens with volumes and ports attached to the
server. Today we detach from the source and attach at the target, but
that's assuming the storage backend and network are available to both
hosts involved in the move of the server. Will that be the case across
cells? I am assuming that depends on the network topology (are routed
networks being used?) and storage backend (routed storage?). If the
network and/or storage backend are not available across cells, how do we
migrate volumes and ports? Cinder has a volume migrate API for admins,
but I do not know how nova would determine the per-cell affinity needed
to migrate the volume to an appropriate host (cinder does not have a
routed storage concept like routed provider networks in neutron,
correct?). And as far as I know, there is no such thing as port
migration in Neutron.
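For reference, the admin-only cinder call I am referring to looks
roughly like this via python-cinderclient; the backend host string is
hypothetical, and picking it per cell is exactly the part nova has no
answer for today:

    from cinderclient import client

    # 'sess' is an authenticated keystoneauth1 session (setup elided)
    cinder = client.Client('3', session=sess)

    # Admin-only: move the volume's data to a different backend.
    # How nova would pick this host so that it is reachable from
    # the target cell is the open question.
    cinder.volumes.migrate_volume(
        volume_uuid,                      # placeholder
        host='cell2-storage@lvm#pool-1',  # hypothetical backend
        force_host_copy=False,
        lock_volume=True)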
Could Placement help with the volume/port migration stuff? Neutron
routed provider networks rely on placement aggregates to schedule the VM
to a compute host in the same network segment as the port used to create
the VM; however, if that segment does not span cells, we are kind of
stuck, correct?
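To illustrate what placement can already tell us here: with routed
networks, neutron mirrors each segment as a placement aggregate whose
UUID is the segment's UUID, so an admin can ask placement which compute
nodes can reach a given segment. A minimal sketch with keystoneauth1
(credentials and segment_uuid are placeholders; the member_of query
needs placement microversion 1.3):

    from keystoneauth1.identity import v3
    from keystoneauth1 import session

    auth = v3.Password(auth_url='http://controller:5000/v3',  # placeholder
                       username='admin', password='secret',
                       project_name='admin',
                       user_domain_id='default',
                       project_domain_id='default')
    sess = session.Session(auth=auth)

    # Resource providers (compute nodes) in the aggregate that
    # mirrors the network segment.
    resp = sess.get('/resource_providers',
                    params={'member_of': segment_uuid},
                    endpoint_filter={'service_type': 'placement'},
                    headers={'OpenStack-API-Version': 'placement 1.3'})
    providers = resp.json()['resource_providers']

    # If none of these providers is in the target cell, the segment
    # does not span cells and we are stuck as noted above.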
To summarize the issues as I see them (today):
* How to deal with the targeted cell during scheduling? This is so we
can even get out of the source cell in nova.
* How does the API deal with the same instance being in two DBs at the
same time during the move? (see the mapping sketch after this list)
* How to handle revert resize?
* How are volumes and ports handled?
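On the two-DBs question, what the API has to work with is the
instance_mappings table in the nova_api database, which points an
instance at its cell. A rough illustration of the lookup (not nova
code; the connection URL and UUID are placeholders); flipping this
pointer at the right moment during the move is the tricky part:

    from sqlalchemy import create_engine, text

    # Placeholder connection string for the nova_api database
    engine = create_engine('mysql+pymysql://nova:secret@db/nova_api')

    with engine.connect() as conn:
        # instance_mappings says which cell's copy of the instance
        # is authoritative; during a cross-cell move the instance
        # row exists in both cell DBs and this pointer is what the
        # API would have to flip at the right point.
        row = conn.execute(
            text('SELECT cell_id FROM instance_mappings '
                 'WHERE instance_uuid = :uuid'),
            {'uuid': instance_uuid}).fetchone()  # placeholder uuid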
I can get feedback from my company's operators based on what their
deployment will look like for this, but that does not mean it will work
for others, so I need as much feedback from operators, especially those
running with multiple cells today, as possible. Thanks in advance.
[1] https://etherpad.openstack.org/p/nova-ptg-stein-cells
--
Thanks,
Matt