On 8/23/2019 3:43 AM, Alex Xu wrote:
2. Supporting same host cold migration
This is the point I made below. For a same-host resize, instance_uuid + flavor_id works very well, but it can't support a same-host cold migration. And yes, a migration allocation can fix that. But, as you also said, do we need to support same-host cold migration?
I see no reason to try and bend over backwards to support same-host cold migration since, as I said, the only virt driver that supports that today (and has been the only one for a long time - maybe forever?) is the vmware driver, which isn't supporting any of these more advanced flows (VGPU, VPMEM, PCPU).
If the answer is no, then we needn't bother with it; instance_uuid + flavor_id is much simpler. If the answer is yes, then right, we can put it into the RT, but it will be complex: we may need a data model like the one in the DB way proposal to pass the virt driver/platform-specific info between the RT and the virt driver. Also think about this case: we need to check whether there is any incomplete live migration and clean up all free vpmems, since we lost the allocation info for the live migration. That means we need a virt driver interface to trigger the cleanup, and I'm pretty sure I don't want to call it driver.cleanup_vpmems(). We also need to change the existing driver.spawn method to pass the assigned resources into the virt driver. Also, thinking about the case of an interrupted migration, I guess there is no way to switch the
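[Editor's note: to make the keying question above concrete, here is a minimal, self-contained sketch. It is not nova code; VpmemAssignment and the helper functions are hypothetical names. It only illustrates why an (instance_uuid, flavor_id) key is enough for a same-host resize but collides for a same-host cold migration, and how a migration-keyed record, like a placement migration allocation, avoids the collision.]

# Illustrative sketch only (not nova code). All names are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class VpmemAssignment:
    consumer_key: tuple   # what the tracker uses to look the assignment up
    device: str           # e.g. a vpmem namespace/device on the host


def assign_by_instance_and_flavor(tracker, instance_uuid, flavor_id, device):
    # Same-host resize: old and new flavors differ, so the two records get
    # distinct keys and can coexist until the resize is confirmed/reverted.
    key = (instance_uuid, flavor_id)
    if key in tracker:
        raise ValueError('key collision: %s' % (key,))
    tracker[key] = VpmemAssignment(key, device)


def assign_by_migration(tracker, migration_uuid, device):
    # Same-host cold migration: old and new flavors are identical, so the
    # (instance_uuid, flavor_id) key would collide. Keying the outgoing side
    # by the migration uuid keeps the two sides distinct.
    key = (migration_uuid,)
    tracker[key] = VpmemAssignment(key, device)


if __name__ == '__main__':
    tracker = {}
    inst = 'inst-1'

    # same-host resize: flavor changes, no collision
    assign_by_instance_and_flavor(tracker, inst, 'flavor-small', '/dev/dax0.0')
    assign_by_instance_and_flavor(tracker, inst, 'flavor-large', '/dev/dax0.1')

    # same-host cold migration: same flavor on both sides -> collision
    try:
        assign_by_instance_and_flavor(tracker, inst, 'flavor-large', '/dev/dax0.2')
    except ValueError as exc:
        print('cold migration breaks the simple key:', exc)

    # the migration-keyed variant avoids it
    assign_by_migration(tracker, 'migration-uuid-1', '/dev/dax0.2')
    print('tracked assignments:', sorted(tracker))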
I also remember Dan saying it isn't good to not support same-host cold migration.
Again, the libvirt driver, as far as I know, has never supported same host cold migration, nor is anyone working on that, so I don't see where the need to make that support happen now is coming from. I think it should be ignored for the sake of these conversations.
I think you are right: we can use RT.tracked_instances and RT.tracked_migrations, and then it isn't on the fly anymore. There are two existing bugs that should be fixed.
1. Orphaned instances aren't in RT.tracked_instances. Although resources are consumed for orphaned instances (https://github.com/openstack/nova/blob/62f6a0a1bc6c4b24621e1c2e927177f99501b...), the virt driver interface (https://github.com/openstack/nova/blob/62f6a0a1bc6c4b24621e1c2e927177f99501b...) isn't implemented by most virt drivers.
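[Editor's note: as a rough illustration of the tracked_instances/tracked_migrations idea and of why the orphaned-instance gap matters, here is a simplified, self-contained sketch. It is not the actual ResourceTracker code; the record shapes and the 'resources' field are assumptions for the example, and the driver_usage argument stands in for a get_per_instance_usage()-style result.]

# Simplified sketch (not nova's ResourceTracker) of deriving in-use vpmem
# devices from tracked instances and migrations. Field names are assumed.

def vpmems_in_use(tracked_instances, tracked_migrations, driver_usage=None):
    """Return the set of vpmem devices believed to be assigned.

    tracked_instances: iterable of dicts with 'uuid' and 'resources'
    tracked_migrations: iterable of dicts with 'uuid' and 'resources'
    driver_usage: optional dict from a get_per_instance_usage()-style call,
        covering orphaned instances the RT does not track.
    """
    in_use = set()
    for inst in tracked_instances:
        in_use.update(inst.get('resources', []))
    for mig in tracked_migrations:
        # migration records cover the "other half" of a resize/migration
        in_use.update(mig.get('resources', []))
    if driver_usage:
        # Without this, anything held by an orphaned instance looks free
        # and could be handed out twice.
        for usage in driver_usage.values():
            in_use.update(usage.get('resources', []))
    return in_use


if __name__ == '__main__':
    instances = [{'uuid': 'inst-1', 'resources': ['/dev/dax0.0']}]
    migrations = [{'uuid': 'mig-1', 'resources': ['/dev/dax0.1']}]
    orphans = {'orphan-1': {'resources': ['/dev/dax0.2']}}

    print(sorted(vpmems_in_use(instances, migrations)))           # misses dax0.2
    print(sorted(vpmems_in_use(instances, migrations, orphans)))  # includes it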
For the latter, get_per_instance_usage, that's only implemented by the xenapi driver, which is on the path to being deprecated by the end of Train at this point anyway: https://review.opendev.org/#/c/662295/ so I wouldn't worry too much about that one.

<snip>

In summary, I'm not going to block attempts at proposal #2. As you said, there are existing bugs which should be handled, though some likely won't ever be completely fixed (automatic cleanup and recovery from live migration failures - the live migration methods are huge and have a lot of points of failure, so properly rolling back from all of those is going to be a big undertaking in test and review time, and I don't see either happening at this stage).

I think one of the motivations to keep VPMEM resource tracking isolated to the hypervisor was just to get something quick and dirty working with a minimal amount of impact to other parts of nova, like the data model, ResourceTracker, etc. If proposal #2 also solves issues for VGPUs and PCPUs then there is more justification for doing it.

Either way I'm not opposed to the #2 proposal, so if that's what the people that are working on this want, go ahead. I personally don't plan on investing much review time in this series either way though, so that's kind of why I'm apathetic about this.

--

Thanks,

Matt