[nova] The pros/cons for libvirt persistent assignment and DB persistent assignment.
Matt Riedemann
mriedemos at gmail.com
Fri Aug 23 14:36:43 UTC 2019
On 8/23/2019 3:43 AM, Alex Xu wrote:
>
> 2. Supporting same-host cold migration
>
> This is the point I made below.
> For a same-host resize, instance_uuid + flavor_id works very well,
> but it can't support a same-host cold migration. And yes, a migration
> allocation can fix that. But also, as you said, do we need to support
> same-host cold migration at all?
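To make the collision concrete: keying an assignment on (instance_uuid,
flavor_id) survives a same-host resize but not a same-host cold
migration, because a cold migration doesn't change the flavor. A minimal
sketch (hypothetical names, not nova's data model):

# Illustration only; hypothetical names, not nova's data model.
assignments = {}  # (instance_uuid, flavor_id) -> assigned devices

def assign(instance_uuid, flavor_id, devices):
    key = (instance_uuid, flavor_id)
    if key in assignments:
        raise KeyError('duplicate assignment for %r' % (key,))
    assignments[key] = devices

assign('inst-1', 'flavor-small', ['vpmem0'])
# Same-host resize: the flavor changes, so old and new keys coexist.
assign('inst-1', 'flavor-large', ['vpmem1'])
# Same-host cold migration: the flavor is unchanged, so the key collides.
try:
    assign('inst-1', 'flavor-large', ['vpmem2'])
except KeyError as exc:
    print('collision:', exc)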
I see no reason to bend over backwards to support same-host cold
migration since, as I said, the only virt driver that supports it
today (and has been the only one for a long time - maybe forever?) is
the vmware driver, which isn't supporting any of these more advanced
flows (VGPU, VPMEM, PCPU).
>
> If the answer is no, then we needn't bother with it; instance_uuid +
> flavor_id is much simpler. If the answer is yes, right, we can put it
> into the RT. But it will be complex: maybe we need a data model like
> the DB-way proposal to pass the virt driver/platform-specific info
> between the RT and the virt driver. Also think about this case: we
> need to check whether there is any incomplete live migration and do a
> cleanup of all free vpmems, since we lose the allocation info for a
> live migration. Then we need a virt driver interface to trigger that
> cleanup, and I'm pretty sure I don't want to call it
> driver.cleanup_vpmems(). We also need to change the existing
> driver.spawn method to pass the assigned resources into the virt
> driver. Also, thinking about the case of an interrupted migration, I
> guess there is no way to switch the
>
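On the driver.spawn point, a rough sketch of what passing the
assignment down could look like; the resources parameter and the
ResourceAssignment type here are invented for illustration, not nova's
actual interface:

# Hypothetical sketch only: `resources` and ResourceAssignment are
# invented for illustration and are not nova's real spawn() interface.
import dataclasses
from typing import List, Optional

@dataclasses.dataclass
class ResourceAssignment:
    resource_class: str      # e.g. 'CUSTOM_PMEM_NAMESPACE_4GB'
    identifiers: List[str]   # e.g. ['ns0', 'ns1']

class DriverSketch:
    def spawn(self, context, instance, image_meta,
              resources: Optional[List[ResourceAssignment]] = None,
              **kwargs):
        # The driver consumes exactly what the resource tracker
        # assigned instead of picking free devices itself.
        for assignment in resources or []:
            self._attach(instance, assignment.identifiers)

    def _attach(self, instance, identifiers):
        # Placeholder for the device-attachment logic.
        print('attaching %s to %s' % (identifiers, instance))

driver = DriverSketch()
driver.spawn(None, 'inst-1', None,
             resources=[ResourceAssignment('CUSTOM_PMEM_NAMESPACE_4GB',
                                           ['ns0'])])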
> I also remember Dan saying it isn't good to not support same-host
> cold migration.
Again, the libvirt driver, as far as I know, has never supported
same-host cold migration, nor is anyone working on that, so I don't see
where the need to make that support happen now is coming from. I think
it should be ignored for the sake of these conversations.
>
> I think you are right, we can use RT.tracked_instances and
> RT.tracked_migrations. Then it isn't on the fly anymore.
> There are two existing bugs that should be fixed.
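Right - rebuilding from the tracker rather than scanning the
hypervisor on the fly could look roughly like this (tracked_instances
and tracked_migrations mirror the real RT attribute names, but
get_assigned_devices() is a helper invented for illustration):

# Hedged sketch with hypothetical helpers; not nova's actual code.
def rebuild_assignments(rt, driver):
    assignments = {}
    for uuid in rt.tracked_instances:
        assignments[uuid] = driver.get_assigned_devices(uuid)
    # Treating tracked_migrations as a mapping of instance uuid to a
    # migration object. A migration holds its own claim, so an
    # in-progress (or interrupted) migration keeps its devices
    # reserved instead of leaking them back into the free pool.
    for uuid, migration in rt.tracked_migrations.items():
        key = (uuid, getattr(migration, 'id', None))
        assignments[key] = driver.get_assigned_devices(
            uuid, migration=migration)
    return assignments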
>
> 1. The orphaned instance isn't in RT.tracked_instances. Although the
> resources consumed by an orphaned instance are accounted for
> https://github.com/openstack/nova/blob/62f6a0a1bc6c4b24621e1c2e927177f99501bef3/nova/compute/resource_tracker.py#L771,
> the virt driver interface
> https://github.com/openstack/nova/blob/62f6a0a1bc6c4b24621e1c2e927177f99501bef3/nova/compute/resource_tracker.py#L1447
> isn't implemented by most virt drivers.
For the latter, get_per_instance_usage, that's only implemented by the
xenapi driver, which is on the path to being deprecated by the end of
Train at this point anyway:
https://review.opendev.org/#/c/662295/
so I wouldn't worry too much about that one.
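For context, that interface is a no-op in the base driver. Paraphrasing
from memory (not a verbatim copy of nova's code), it's roughly:

# Paraphrased from memory, not a verbatim copy of nova's code: the base
# ComputeDriver makes this a no-op, so any driver that doesn't override
# it (i.e. everything except xenapi) reports no per-instance usage here,
# which is why orphans stay invisible to this path.
class ComputeDriver(object):
    def get_per_instance_usage(self):
        """Return usage for instances the hypervisor knows about but
        the resource tracker doesn't, keyed by instance uuid.
        """
        return {}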
<snip>
In summary, I'm not going to block attempts at proposal #2. As you said,
there are existing bugs which should be handled, though some likely
won't ever be completely fixed (automatic cleanup and recovery from live
migration failures - the live migration methods are huge and have a lot
of points of failure, so properly rolling back from all of them would be
a big undertaking in test and review time, and I don't see that
happening at this stage).
I think one of the motivations to keep VPMEM resource tracking isolated
to the hypervisor was just to get something quick and dirty working with
a minimal amount of impact to other parts of nova, like the data model,
ResourceTracker, etc. If proposal #2 also solves issues for VGPUs and
PCPUs then there is more justification for doing it.
Either way I'm not opposed to the #2 proposal, so if that's what the
people who are working on this want, go ahead. I personally don't plan
on investing much review time in this series either way, though, so
that's kind of why I'm apathetic about it.
--
Thanks,
Matt