[Openstack-operators] resizing an instance

Belmiro Moreira moreira.belmiro.email.lists at gmail.com
Wed May 22 19:20:55 UTC 2013


Hi,
do not confuse migration, resize, live-migration, evacuate, … :)
Rafi is talking about resize/migrate over shared storage.

cheers,
Belmiro


On May 22, 2013, at 8:11 PM, Juan José Pavlik Salles <jjpavlik at gmail.com> wrote:

> Good to know! So, whether I use shared storage or not, I won't be able to use live migration (a safe migration) with stable/grizzly (unless I apply the patch Rafi mentions)?
> 
> 
> 2013/5/21 Joshua Harlow <harlowja at yahoo-inc.com>
> Thanks Rafi,
> 
> That is a pretty sad state of things. Hoping things get better (and simpler) in Havana. Resizing now scares me more than it did.
> 
> From: Rafi Khardalian <rafi at metacloud.com>
> Date: Tuesday, May 21, 2013 3:53 PM
> To: Joshua Harlow <harlowja at yahoo-inc.com>
> Cc: Ryan Lane <rlane at wikimedia.org>, Juan José Pavlik Salles <jjpavlik at gmail.com>, "openstack-operators at lists.openstack.org" <openstack-operators at lists.openstack.org>
> 
> Subject: Re: [Openstack-operators] resizing an instance
> 
> Unfortunately, resizes are much worse than what's being described here.
> 
> Aside from the SSH, which we all agree has a number of associated problems, there are three separate image conversions that occur (assuming you are using raw-backed qcow2, which is the default).  The first conversion flattens the image, eliminating the backing file.  This occurs on the origin hypervisor, with the result being SCP'd to the destination.  Once complete, the second conversion kicks off, converting from qcow2 to raw.  This is done in an attempt to resize the filesystem contained within the image.  This resize will fail unless you're using AMIs, as there is no logic in place to resize images containing partitions.  The third conversion occurs as the raw image is converted back into qcow2.
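> 
> To make that concrete, here is roughly what those three conversions look like as qemu-img invocations (an illustrative sketch; the paths, UUID and destination host are placeholders, not the literal Nova code path):
> 
>     # 1) flatten the qcow2, eliminating the backing file (on the origin host)
>     qemu-img convert -f qcow2 -O qcow2 /var/lib/nova/instances/UUID/disk disk.flat
>     scp disk.flat destination:/var/lib/nova/instances/UUID/disk
>     # 2) convert qcow2 to raw so the contained filesystem can be resized
>     qemu-img convert -f qcow2 -O raw disk disk.raw
>     # (the filesystem resize is attempted here; it fails on images with partitions)
>     # 3) convert the raw image back into qcow2
>     qemu-img convert -f raw -O qcow2 disk.raw disk
> 
> Every one of those is a full read and rewrite of the image, which is why resizes hurt so much on large disks.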
> 
> Until recently, there was absolutely nothing taking shared storage into account.  In fact, if you are using shared storage, be absolutely certain to apply the patch associated with this bug (https://bugs.launchpad.net/nova/+bug/1177247).  Otherwise, there are a number of cases where *you will lose data*.  If you want to go a step further, you're welcome to apply my patch to make the entire process a lot more efficient (https://gist.github.com/rmk40/ab2c6f518a7a40a261af).  The patch copies the disk file around untouched and relies on code introduced in Grizzly to repopulate the backing files on the destination via Glance.
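> 
> Conceptually, the patched flow looks something like this (an illustrative sketch, not the literal patch; paths are placeholders):
> 
>     # copy the qcow2 untouched, backing-file reference and all
>     scp /var/lib/nova/instances/UUID/disk destination:/var/lib/nova/instances/UUID/disk
>     # on the destination, the Grizzly code re-fetches the base image from
>     # Glance into _base/; re-pointing the overlay at it is conceptually just:
>     qemu-img rebase -u -b /var/lib/nova/instances/_base/IMAGE_HASH disk
> 
> The win is that the disk is never flattened or format-converted, so only the bytes actually present in the qcow2 overlay move across the wire.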
> 
> The good news is, we know how much work migrate/resize needs and are committed to fixing it in the Havana cycle.  I haven't submitted the aforementioned patch for this very reason: we're overhauling the code path entirely.  Nonetheless, Grizzly represents stable today, so if you're doing resizes on a regular basis, take a look at the patch.  It has zero chance of going into stable/grizzly because of the constraints around how the stable tree is managed.
> 
> - Rafi
> 
> 
> On Tue, May 21, 2013 at 3:24 PM, Joshua Harlow <harlowja at yahoo-inc.com> wrote:
> Hi Ryan,
> 
> Yes, it is a little bit weird. I think the reason it's doing this is that it's the most generic solution to the resizing problem. I believe there is a flag like 'resize_same_host' that might help, but I haven't used it.
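> 
> If memory serves, the actual option is allow_resize_to_same_host in nova.conf (a sketch from memory, so double-check it against your release):
> 
>     [DEFAULT]
>     allow_resize_to_same_host = True
> 
> Even with that set, the origin node only becomes a *candidate* again; the scheduler isn't forced to pick it.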
> 
> I think your concern is valid though and could/should(?) be fixed. 
> 
> You could imagine the scheduler 'preferring' the origin compute node, with that node then taking a 'fast path' resize that triggers the compute manager to just move some folders around (instead of invoking the SSH sequence you talked about). That would make sense to me, and it would avoid the 'slow path' of actually moving the instance (and its disks and so on) to a new node, which should only be needed when the origin node doesn't have enough space/CPU/memory… With beefy enough compute nodes, the 'fast path' would likely be the common case.
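> 
> Purely hypothetically, that 'fast path' on the origin node could be as cheap as this (hand-waving the flavor/DB update; I believe Nova already renames the instance directory with a _resize suffix during resizes):
> 
>     # keep the old directory around for a possible revert, then grow in place
>     mv /var/lib/nova/instances/UUID /var/lib/nova/instances/UUID_resize
>     qemu-img resize /var/lib/nova/instances/UUID_resize/disk +10G
> 
> No network copy and no format conversion; just a rename and a metadata-level grow.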
> 
> -Josh
> 
> From: Ryan Lane <rlane at wikimedia.org>
> Date: Tuesday, May 21, 2013 2:25 PM
> To: Juan José Pavlik Salles <jjpavlik at gmail.com>
> Cc: "openstack-operators at lists.openstack.org" <openstack-operators at lists.openstack.org>
> Subject: Re: [Openstack-operators] resizing an instance
> 
> On Tue, May 21, 2013 at 2:20 PM, Juan José Pavlik Salles <jjpavlik at gmail.com> wrote:
> I'm not sure about your deployment, but I noticed that when I try to resize a VM, nova also tries to move the VM to another compute node, so if you don't have shared storage for the VMs this is not possible (unless your compute node can SSH to the other compute node, of course). I found in my logs things like "ssh root at node2 mkdir -p /var/lib/nova/instances/instances_dir" when trying to resize a VM. Tomorrow I'll try resizing with shared storage and I'll let you know.
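> 
> If you want to check that the SSH path works before attempting a resize, something like this should do it (node2 is just my compute node's hostname; run it as the user nova-compute runs as):
> 
>     ssh root@node2 'mkdir -p /var/lib/nova/instances/ssh_test && rmdir /var/lib/nova/instances/ssh_test'
> 
> If that prompts for a password, the resize will fail in the same way.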
> 
> 
> Yes. This is the weirdest behavior. Why in the world is it necessary to move the instance to another compute node just to do a resize? It requires SSH between compute nodes, makes the process *much* slower, and also makes it way more error prone. I don't get it. This seems like a really convoluted way of handling resizes.
> 
> Is there some really great reasoning behind this?
> 
> - Ryan
> 
> 
> -- 
> Pavlik Salles Juan José



