[nova] The pros/cons for libvirt persistent assignment and DB persistent assignment.
We have had a lot of discussion on how to do the claim for vpmem. There are a few points we are trying to meet: * Avoid race problems. (The current VGPU assignment has been found to have a race issue: https://launchpad.net/bugs/1836204) * Avoid making the device assignment management virt-driver- and platform-specific. * Keep it simple.

We have gone through two solutions so far. This email summarizes the pros/cons of these two solutions.

#1 No Nova DB persistence for the assignment info; depend on the hypervisor to persist it.

The idea is to add a VirtDriver.claim/unclaim_for_instance(instance_uuid, flavor_id) interface. The assignment info is populated from the hypervisor when nova-compute starts up and is kept in memory in the virt driver. The instance_uuid is used to distinguish claims from different instances. The flavor_id is used for same-host resize, to distinguish the claims for the source and the target. This virt driver method is invoked inside the ResourceTracker to avoid the race problem. There is no nova DB persistence for the assignment info at all (a rough sketch of this interface follows this email). https://review.opendev.org/#/q/status:open+project:openstack/nova+branch:mas...

pros: * Hides all the device detail and virt driver detail inside the virt driver. * Fewer upgrade issues in the future since it doesn't involve any nova DB model change. * Expected to be a simple implementation since everything lives inside the virt driver.

cons: * Two cases have been found where the domain XML is lost with the libvirt virt driver, and we don't know other hypervisors' behavior yet. * For same-host resize, the source and target instance share a single domain XML. After the libvirt virt driver updates the domain XML for the target instance, the source instance's assignment information is lost if a nova-compute restart happens. That means the resized instance can't be reverted; the only choice for the user is to confirm the resize. * For live migration, the target host's domain XML will be cleaned up by libvirt after a host restart. The assignment information is lost before nova-compute starts up and does its cleanup. * Cannot support same-host cold migration, since we need a way to distinguish the source and target instance's assignments in memory, but same-host cold migration means the same instance UUID and same flavor ID, so there is nothing else that can be used to distinguish the assignments. * With workarounds added for the above points, the code becomes fragile.

#2 Nova DB persistence, using a virt-driver-specific blob to store virt-driver-specific info.

The idea is to persist the assignment for the instance into the DB. The resource tracker gets the available resources from the virt driver, and calculates free resources on the fly from the available resources and the assigned resources recorded in the instance DB. The new field instance.resources is designed to support virt-driver-specific metadata, hiding the virt driver and platform detail from the RT. https://etherpad.openstack.org/p/vpmems-non-virt-driver-specific-new

pros: * Persists the assignment in the instance object, avoiding the corner cases where we lose the assignment. * The ResourceTracker is responsible for doing the claim. This is more reliable and has no race problem, since the ResourceTracker has worked well for a long time. * The virt-driver-specific json blob hides the virt driver/platform detail from the ResourceTracker. * The free resource is calculated on the fly, keeping the implementation simple.
(The RT just provides a point at which to do the claim; it doesn't need to involve the complexity of RT.update_available_resources.)

cons: * Unlike the PCIManager, it doesn't have both instance-side and host-side persistent info; the on-the-fly calculation has to take care of orphaned instances (instances deleted from the DB but still existing on the host), so it isn't an unresolvable issue, and it wouldn't be too hard to add host-side persistent info in the future if we want it. * It is a data model change from the original proposal, and review is needed to decide whether the data model is generic enough.

Currently, Sean, Eric and I prefer #2, since #1 has flaws for same-host resize and live migration that can't be avoided by design.

Looking for more feedback; it will be appreciated!

Thanks
Alex
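For illustration, here is a minimal, hypothetical sketch of what the proposal #1 interface could look like. Only claim/unclaim_for_instance(instance_uuid, flavor_id) comes from the proposal above; the class name, helpers and device names are assumptions, not the code under review.

    # Hypothetical sketch of proposal #1: claims live only in virt-driver
    # memory and are rebuilt from the hypervisor at nova-compute startup.
    class InMemoryVPMEMClaims(object):

        def __init__(self, all_devices):
            # e.g. {'ns_0', 'ns_1', 'ns_2'} as discovered by the virt driver.
            self._all_devices = set(all_devices)
            # (instance_uuid, flavor_id) -> set of assigned device names.
            self._claims = {}

        def populate_from_hypervisor(self, existing_guests):
            # Rebuild the in-memory state at startup, e.g. by parsing the
            # vpmem devices out of the persisted libvirt domain XMLs.
            for instance_uuid, flavor_id, devices in existing_guests:
                self._claims[(instance_uuid, flavor_id)] = set(devices)

        def claim_for_instance(self, instance_uuid, flavor_id, count):
            assigned = set()
            for devices in self._claims.values():
                assigned |= devices
            free = sorted(self._all_devices - assigned)
            if len(free) < count:
                raise RuntimeError('not enough free vpmem devices')
            # The (uuid, flavor_id) key distinguishes source and target for a
            # same-host resize, but collides for same-host cold migration:
            # same instance UUID *and* same flavor ID.
            self._claims[(instance_uuid, flavor_id)] = set(free[:count])
            return set(free[:count])

        def unclaim_for_instance(self, instance_uuid, flavor_id):
            self._claims.pop((instance_uuid, flavor_id), None)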
Alex- Thanks for writing this up.
#1 No Nova DB persistence for the assignment info; depend on the hypervisor to persist it.
I liked the "no persistence" option in theory, but it unfortunately turned out to be too brittle when it came to the corner cases.
#2 Nova DB persistence, using a virt-driver-specific blob to store virt-driver-specific info.
The idea is to persist the assignment for the instance into the DB. The resource tracker gets the available resources from the virt driver, and calculates free resources on the fly from the available resources and the assigned resources recorded in the instance DB. The new field instance.resources is designed to support virt-driver-specific metadata, hiding the virt driver and platform detail from the RT. https://etherpad.openstack.org/p/vpmems-non-virt-driver-specific-new
I just took a closer look at this, and I really like it.

Persisting local resource information with the Instance and MigrationContext objects ensures we don't lose it in weird corner cases, regardless of a specific hypervisor's "persistence model" (e.g. domain XML for libvirt).

MigrationContext is already being used for this old_* new_* concept - but the existing fields are hypervisor-specific (numa and pci). Storing this information in a generic, opaque-outside-of-virt way means we're not constantly bolting hypervisor-specific fields onto what *should* be non-hypervisor-specific objects.

As you've stated in the etherpad, this framework sets us up nicely to start transitioning existing PCI/NUMA-isms over to a Placement-driven model in the near future. Having the virt driver report provider tree (placement-specific) and "real" (hypervisor-specific) resource information at the same time makes all kinds of sense.

So, quite aside from solving the stated race condition and enabling vpmem, all of this is excellent movement toward the "generic device (resource) management" we've been talking about for years.

Let's make it so.

efried
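To make the "opaque outside of virt" idea concrete, here is a minimal sketch of a versioned object the ResourceTracker could carry without interpreting the driver-specific part, and which dumps cleanly to a JSON blob. This is not the final nova object model; the class, field and resource class names only follow the direction of the etherpad and are assumptions.

    import uuid

    from oslo_serialization import jsonutils
    from oslo_versionedobjects import base as ovo_base
    from oslo_versionedobjects import fields


    @ovo_base.VersionedObjectRegistry.register
    class VPMEMMetadata(ovo_base.VersionedObject):
        # Virt-driver/platform-specific part; only the libvirt driver needs
        # to understand these fields.
        VERSION = '1.0'
        fields = {
            'devpath': fields.StringField(),
            'size_mb': fields.IntegerField(),
        }


    @ovo_base.VersionedObjectRegistry.register
    class Resource(ovo_base.VersionedObject):
        # Generic part; the ResourceTracker only ever looks at these fields.
        VERSION = '1.0'
        fields = {
            'provider_uuid': fields.UUIDField(),
            'resource_class': fields.StringField(),
            'identifier': fields.StringField(),
            'metadata': fields.ObjectField('VPMEMMetadata', nullable=True),
        }


    res = Resource(provider_uuid=str(uuid.uuid4()),
                   resource_class='CUSTOM_PMEM_NAMESPACE_4GB',  # illustrative
                   identifier='ns_0',
                   metadata=VPMEMMetadata(devpath='/dev/dax0.0', size_mb=4096))
    # A versioned, driver-opaque primitive that can be persisted with the
    # Instance (or as old_*/new_* on the MigrationContext) as a JSON blob.
    print(jsonutils.dumps(res.obj_to_primitive()))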
On Wed, 2019-08-21 at 14:51 -0500, Eric Fried wrote:
Alex-
Thanks for writing this up.
#1 No Nova DB persistence for the assignment info; depend on the hypervisor to persist it.
I liked the "no persistence" option in theory, but it unfortunately turned out to be too brittle when it came to the corner cases.
#2 Nova DB persistence, using a virt-driver-specific blob to store virt-driver-specific info.
The idea is to persist the assignment for the instance into the DB. The resource tracker gets the available resources from the virt driver, and calculates free resources on the fly from the available resources and the assigned resources recorded in the instance DB. The new field instance.resources is designed to support virt-driver-specific metadata, hiding the virt driver and platform detail from the RT. https://etherpad.openstack.org/p/vpmems-non-virt-driver-specific-new
I just took a closer look at this, and I really like it.
Persisting local resource information with the Instance and MigrationContext objects ensures we don't lose it in weird corner cases, regardless of a specific hypervisor's "persistence model" (e.g. domain XML for libvirt).
MigrationContext is already being used for this old_* new_* concept - but the existing fields are hypervisor-specific (numa and pci).
Storing this information in a generic, opaque-outside-of-virt way means we're not constantly bolting hypervisor-specific fields onto what *should* be non-hypervisor-specific objects.
As you've stated in the etherpad, this framework sets us up nicely to start transitioning existing PCI/NUMA-isms over to a Placement-driven model in the near future.
Having the virt driver report provider tree (placement-specific) and "real" (hypervisor-specific) resource information at the same time makes all kinds of sense.
So, quite aside from solving the stated race condition and enabling vpmem, all of this is excellent movement toward the "generic device (resource) management" we've been talking about for years.
Let's make it so.

I agree with most of what Eric said above and with the content of the etherpad. I left a couple of comments inline, but I also don't want to pollute it too much with comments, so I will summarise some additional thoughts here.
tl;dr: I think this would allow us to converge the tracking and assignment of vPMEM, vGPUs, PCI devices, and pCPUs. Each of these resources requires nova to assign specific host devices, and with this proposal that can be done generically in some cases and delegated to the driver in others. The simple vPMEM use of this is relatively self-contained, but using it for the other resources will require thought and work to enable.

More detailed thoughts:

1.) Short term, I believe host-side tracking is not needed for vPMEM or vGPU.
2.) Medium term, having host-side tracking of resources might simplify vPMEM, vGPU, pCPU and PCI tracking.
3.) Long term, I think if we use placement correctly and have instance-level tracking we might not need host-side tracking at all.
3.a) Instance-level tracking will allow us to reliably compute the host-side view in the virt driver from config and device discovery.
3.b) With nested resource providers and the new ability to do nested queries we can move filtering mostly to placement.
4.) We use a host/instance NUMA topology blob for mempages (hugepages) today; if we model them in placement I don't think we will need host-side tracking for filtering. (See the note on weighing later.)
4.a) If we have pCPUs and mempages as children of cache regions or NUMA nodes, we can do NUMA/cache affinity of those resources and of PCI devices using same_subtree, or whatever it ended up being called in placement.
4.b) Hugepages are currently not assigned by nova; we just do a tally count of how many of a given size are free on each NUMA node and select a NUMA node, which I think can be done entirely via placement as of about 5-6 weeks ago. The assignment is done by the kernel, which is why we don't need to track individual hugepages at the host level (see the sketch after this message).
5.) If we don't have host-side tracking we cannot do classic weighing of local resources, as we do not have the data.
6.) If we pass allocation candidates to the filters instead of hosts, we can replace our existing filters with placement-aware filters that use the placement tree structure and traits to weigh the possible allocation candidates, which will in turn weigh the hosts.
7.) pCPUs, unlike hugepages, are assigned by nova and would need to be tracked in memory at the host level. This host view could be computed by the virt driver if we track the assignment in the instances and migrations, but host-side tracking would be simpler to port the existing code to. pCPUs would need to be assigned within the driver from the free resources returned by the resource tracker.
8.) This might move some of the logic from nova/virt/hardware.py to the libvirt driver, where it probably should always have been.
8.a) The validation of flavor extra specs in nova/virt/hardware.py that is used in the API would not be moved to the driver.

regards
sean
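As a toy illustration of point 4.b (not nova code; the data layout is assumed): for hugepages, nova only needs a per-NUMA-node tally of free pages of the requested size to pick a node, because the kernel assigns the individual pages.

    # Toy illustration of the hugepage "tally count" approach: pick a NUMA
    # node with enough free pages of the requested size; no per-page tracking.
    def pick_numa_node(host_pages, page_size_kb, pages_needed):
        """host_pages: {numa_node_id: {page_size_kb: (total, used)}}"""
        for node_id in sorted(host_pages):
            total, used = host_pages[node_id].get(page_size_kb, (0, 0))
            if total - used >= pages_needed:
                return node_id
        return None


    # Two NUMA nodes with 2 MiB pages; a 512 MiB guest needs 256 pages.
    host = {0: {2048: (512, 500)}, 1: {2048: (512, 100)}}
    print(pick_numa_node(host, 2048, 256))  # -> 1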
On 8/21/2019 1:59 AM, Alex Xu wrote:
We have had a lot of discussion on how to do the claim for vpmem. There are a few points we are trying to meet:
* Avoid race problems. (The current VGPU assignment has been found to have a race issue: https://launchpad.net/bugs/1836204) * Avoid making the device assignment management virt-driver- and platform-specific. * Keep it simple.
We have gone through two solutions so far. This email summarizes the pros/cons of these two solutions.
#1 No Nova DB persistence for the assignment info; depend on the hypervisor to persist it.
The idea is to add a VirtDriver.claim/unclaim_for_instance(instance_uuid, flavor_id) interface. The assignment info is populated from the hypervisor when nova-compute starts up and is kept in memory in the virt driver. The
Is there any reason the device assignment in-memory mapping has to be in the virt driver and not, for example, the ResourceTracker itself? This becomes important below.
instance_uuid is used to distinguish claims from different instances. The flavor_id is used for same-host resize, to distinguish the claims for the source and the target. This virt driver method is invoked inside the ResourceTracker to avoid the race problem. There is no nova DB persistence for the assignment info at all. https://review.opendev.org/#/q/status:open+project:openstack/nova+branch:mas...
pros: * Hides all the device detail and virt driver detail inside the virt driver. * Fewer upgrade issues in the future since it doesn't involve any nova DB model change. * Expected to be a simple implementation since everything lives inside the virt driver. cons: * Two cases have been found where the domain XML is lost with the libvirt virt driver, and we don't know other hypervisors' behavior yet.
How do we "lose" the domain xml? I guess your next points are examples?
* For same-host resize, the source and target instance share a single domain XML. After the libvirt virt driver updates the domain XML for the target instance, the source instance's assignment information is lost if a nova-compute restart happens. That means the resized instance can't be reverted; the only choice for the user is to confirm the resize.
As discussed with Dan and me in IRC a week or two ago, we suggested you could do the same migration-based allocation switch for move operations as we do for cold migrate, resize and live migration since Queens, where the source node allocations are consumed by the migration record and the target node allocations are consumed by the instance. The conductor swaps the source node allocations before calling the scheduler, which will create the target node allocations with the instance. On confirm/revert we either drop the source node allocations (held by the migration) or swap them back (and drop the target node allocations held by the instance).

In your device case, clearly conductor and placement aren't involved since we're not tracking those low-level details in placement. Placement just knows there is a certain amount of some resource class, but not which consumers are actually assigned which devices on the hypervisor (like pci device management). But as far as keeping track of the assignments in memory, we could still do the same swap, where the migration record is tracking the old flavor device assignments (in the virt driver or resource tracker) and the instance record is tracking the new flavor device assignments. That resolves the same-host resize case, correct? Doing it generically in the ResourceTracker is why I asked above about doing that in the RT rather than the driver.

What that doesn't solve is restarts of the compute service while there is a pending resize, which is why we need to persist some information somewhere. We could use the domain xml if it contained the flavor id, but it doesn't - and for same-host resize we only have one domain xml so that's not really an option (as you've noted).
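For reference, a plain-Python illustration (not nova or placement code) of the swap described above: allocations are keyed by consumer UUID, the migration record holds the source-node allocations during the move, and confirm/revert either drops them or swaps them back. The same pattern could track in-memory device assignments, with the migration holding the old flavor's devices and the instance the new flavor's.

    # Allocations keyed by consumer UUID (instance or migration), standing in
    # for placement allocations or in-memory device assignments.
    allocations = {}  # consumer_uuid -> {node: {resource_class: amount}}


    def start_move(instance_uuid, migration_uuid, target_node, new_resources):
        # Conductor: the migration record takes over the source-node
        # allocations; the scheduler then creates target-node allocations
        # held by the instance.
        allocations[migration_uuid] = allocations.pop(instance_uuid)
        allocations[instance_uuid] = {target_node: new_resources}


    def confirm(instance_uuid, migration_uuid):
        # Drop the source-node allocations held by the migration.
        allocations.pop(migration_uuid, None)


    def revert(instance_uuid, migration_uuid):
        # Swap the source-node allocations back to the instance, dropping the
        # target-node allocations it held.
        allocations[instance_uuid] = allocations.pop(migration_uuid)


    allocations['inst-1'] = {'src-node': {'VCPU': 2}}
    start_move('inst-1', 'mig-1', 'dst-node', {'VCPU': 4})
    revert('inst-1', 'mig-1')
    assert allocations == {'inst-1': {'src-node': {'VCPU': 2}}}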
* For live migration, the target host's domain XML will be cleaned up by libvirt after a host restart. The assignment information is lost before nova-compute starts up and does its cleanup.
I'm not really following you here. This is not an expected situation, correct? Meaning the target compute service is restarted while there is an in-progress live migration? I imagine if that happens we have lots of problems and most (manual) recovery procedures are going to involve the operator trying to destroy the guest and its related resources from the target host and hard rebooting to recover the guest on the source host.
* Cannot support same-host cold migration, since we need a way to distinguish the source and target instance's assignments in memory, but same-host cold migration means the same instance UUID and same flavor ID, so there is nothing else that can be used to distinguish the assignments.
The only in-tree virt driver that supports cold migrating on the same compute service host is the vmware driver, and that does not support things like VGPUs or VPMEMs, so I'm not sure why cold migration on the same host is a concern here - it's not supported and no one is working on adding that support.
* With workarounds added for the above points, the code becomes fragile.
To summarize, it sounds like the biggest problem is the lack of persistence during a same-host resize, because we'd lose the in-memory device assignment tracking even if we did the migration-based allocation swap magic as described above.

Could we have a compromise where for all times *except* during some migration, we get the assigned devices from the hypervisor, but during a migration we store the old/new assignments in the MigrationContext? That would give us the persistence we need and would only be something that we temporarily care about during a migration. The thing I'm not sure about is if we do that, does it make things more complicated in general for the non-migration cases, or if we do it should we just go the extra mile and always be tracking assigned devices in the database exactly like what we do for PCI devices today - meaning we wouldn't have a special edge case just for migrations with these types of resources.
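A minimal sketch of that compromise (attribute and function names are assumptions; today's MigrationContext only carries hypervisor-specific old_*/new_* fields for NUMA and PCI): read assignments from the hypervisor in the steady state, but while a migration is pending use the old/new assignments persisted with the migration context.

    def device_assignments(instance, hypervisor_view):
        """Return the vpmem devices to account for this instance.

        hypervisor_view: {instance_uuid: [device, ...]} as read back from the
        hypervisor (e.g. the libvirt domain XML) outside of migrations.
        """
        mig_ctx = getattr(instance, 'migration_context', None)
        if mig_ctx is not None:
            # Pending migration: both the old and the new assignment are
            # persisted, so a compute restart mid-resize loses nothing.
            return set(mig_ctx.old_resources) | set(mig_ctx.new_resources)
        return set(hypervisor_view.get(instance.uuid, []))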
#2 Nova DB persistence, using a virt-driver-specific blob to store virt-driver-specific info.
The idea is to persist the assignment for the instance into the DB. The resource tracker gets the available resources from the virt driver, and calculates free resources on the fly from the available resources and the assigned resources recorded in the instance DB. The new field instance.resources is designed to support virt-driver-specific metadata, hiding the virt driver and platform detail from the RT. https://etherpad.openstack.org/p/vpmems-non-virt-driver-specific-new
I left some comments in the etherpad about the proposed claims process but the "on the fly" part concerns me for performance, especially if we don't make that conditional based on the types of resources we're claiming. During a claim the ResourceTracker already has the list of tracked_instances and tracked_migrations it cares about, but it sounds like you're proposing that we would also now have to re-fetch all of that data from the database just to get the resources and migration context information for any instances tracked by that host to determine what their assignments are. That seems really heavy-weight to me and is my major concern with this approach, well, that and the fact it sounds like we're creating a new version of the PCIManager (though more generic, it could have a lot of the same split brain type issues we've had with tracking PCI device inventory and allocations over the years since it was introduced; by split brain I mean the hypervisor saying one thing but nova thinking another).
pros: * Persists the assignment in the instance object, avoiding the corner cases where we lose the assignment. * The ResourceTracker is responsible for doing the claim. This is more reliable and has no race problem, since the ResourceTracker has worked well for a long time.
Heh, I guess yeah. :) There are a lot of dragons in that code and we're still fixing bugs in it even though it should be mostly stable after all of these years. But resource tracking in general sucks regardless of where it happens (RT, placement or the virt driver) so we just have to be comfortable with knowing there are going to be dragons.
* The virt-driver-specific json blob hides the virt driver/platform detail from the ResourceTracker.
Random json blobs are nasty in general especially if we need to convert data at runtime later for some upgrade purpose. What is proposed in the etherpad seems OK(ish) though given the only very random thing is the 'metadata' field, but I could see that all getting confusing to maintain later when we have different schema/semantic rules about what's in the metadata depending on the resource class and virt driver. But we'll likely have that problem anyway if we go with the non-persistent option #1 above.
* The free resource is calculated on the fly, keeping the implementation simple. (The RT just provides a point at which to do the claim; it doesn't need to involve the complexity of RT.update_available_resources.) cons: * Unlike the PCIManager, it doesn't have both instance-side and host-side persistent info; the on-the-fly calculation has to take care of orphaned instances (instances deleted from the DB but still existing on the host), so it isn't an unresolvable issue, and it wouldn't be too hard to add host-side persistent info in the future if we want it. * It is a data model change from the original proposal, and review is needed to decide whether the data model is generic enough.
Currently, Sean, Eric and I prefer #2, since #1 has flaws for same-host resize and live migration that can't be avoided by design.
At this point I can't say I have a strong opinion. I think either approach is going to be complicated and buggy and hard to maintain, especially if we don't have CI for these more exotic scenarios (which we don't for VGPU or VPMEM even though you said someone is working on the latter). I've voiced my concerns here but I'm not going to "die on a hill" for this, so in the end I'll likely roll over for whatever those of you that really care about this want to do, and know that you're going to be maintainers of it. -- Thanks, Matt
Matt Riedemann <mriedemos@gmail.com> 于2019年8月23日周五 上午5:53写道:
On 8/21/2019 1:59 AM, Alex Xu wrote:
We have had a lot of discussion on how to do the claim for vpmem. There are a few points we are trying to meet:
* Avoid race problems. (The current VGPU assignment has been found to have a race issue: https://launchpad.net/bugs/1836204) * Avoid making the device assignment management virt-driver- and platform-specific. * Keep it simple.
We have gone through two solutions so far. This email summarizes the pros/cons of these two solutions.
#1 No Nova DB persistence for the assignment info; depend on the hypervisor to persist it.
The idea is to add a VirtDriver.claim/unclaim_for_instance(instance_uuid, flavor_id) interface. The assignment info is populated from the hypervisor when nova-compute starts up and is kept in memory in the virt driver. The
Is there any reason the device assignment in-memory mapping has to be in the virt driver and not, for example, the ResourceTracker itself? This becomes important below.
We will answer this below. It is about whether using the migration allocation makes sense or not.
instance_uuid is used to distinguish claims from different instances. The flavor_id is used for same-host resize, to distinguish the claims for the source and the target. This virt driver method is invoked inside the ResourceTracker to avoid the race problem. There is no nova DB persistence for the assignment info at all.
https://review.opendev.org/#/q/status:open+project:openstack/nova+branch:mas...
pros: * Hides all the device detail and virt driver detail inside the virt driver. * Fewer upgrade issues in the future since it doesn't involve any nova DB model change. * Expected to be a simple implementation since everything lives inside the virt driver. cons: * Two cases have been found where the domain XML is lost with the libvirt virt driver, and we don't know other hypervisors' behavior yet.
How do we "lose" the domain xml? I guess your next points are examples?
* For same-host resize, the source and target instance share a single domain XML. After the libvirt virt driver updates the domain XML for the target instance, the source instance's assignment information is lost if a nova-compute restart happens. That means the resized instance can't be reverted; the only choice for the user is to confirm the resize.
As discussed with Dan and me in IRC a week or two ago, we suggested you could do the same migration-based allocation switch for move operations as we do for cold migrate, resize and live migration since Queens, where the source node allocations are consumed by the migration record and the target node allocations are consumed by the instance. The conductor swaps the source node allocations before calling the scheduler which will create the target node allocations with the instance. On confirm/revert we either drop the source node allocations (held by the migration) or swap them back (and drop the target node allocations held by the instance).
In your device case, clearly conductor and placement aren't involved since we're not tracking those low-level details in placement. Placement just knows there is a certain amount of some resource class, but not which consumers are actually assigned which devices on the hypervisor (like pci device management). But as far as keeping track of the assignments in memory, we could still do the same swap, where the migration record is tracking the old flavor device assignments (in the virt driver or resource tracker) and the instance record is tracking the new flavor device assignments. That resolves the same-host resize case, correct? Doing it generically in the ResourceTracker is why I asked above about doing that in the RT rather than the driver.
What that doesn't solve is restarts of the compute service while there is a pending resize, which is why we need to persist some information somewhere. We could use the domain xml if it contained the flavor id, but it doesn't - and for same-host resize we only have one domain xml so that's not really an option (as you've noted).
Actually, there are two problems here; let's talk about them separately:

1. Losing the allocation info after a compute service restart during a same-host resize

This is the point above. It has nothing to do with using the migration allocation versus instance_uuid + flavor_id. It can only be fixed by DB persistence, or, as you say later, by persisting in the MigrationContext. I will explain that later.

2. Supporting same-host cold migration

This is the point I raise below. For same-host resize, instance_uuid + flavor_id works very well, but it can't support same-host cold migration. And yes, the migration allocation can fix that. But, as you also said, do we need to support same-host cold migration?

If the answer is no, then we needn't bother with it; instance_uuid + flavor_id is much simpler. If the answer is yes, right, we can put it into the RT. But it will be complex; maybe we need a data model like the DB-way proposal to pass the virt-driver/platform-specific info between the RT and the virt driver. Also think about the case where we need to check whether there is any incomplete live migration: we need to do a cleanup of all free vpmems, since we lost the allocation info for the live migration. Then we need a virt driver interface to trigger that cleanup, and I'm pretty sure I don't want to call it driver.cleanup_vpmems(). We also need to change the existing driver.spawn method to pass the assigned resources into the virt driver. Also, thinking about the case of an interrupted migration, I guess there is no way to switch the

I also remember Dan said it isn't good to not support same-host cold migration.
* For live migration, the target host's domain XML will be cleaned up by libvirt after a host restart. The assignment information is lost before nova-compute starts up and does its cleanup.
I'm not really following you here. This is not an expected situation, correct? Meaning the target compute service is restarted while there is an in-progress live migration? I imagine if that happens we have lots of problems and most (manual) recovery procedures are going to involve the operator trying to destroy the guest and its related resources from the target host and hard rebooting to recover the guest on the source host.
It is worse than that: the restart of nova-compute will just set the instance back to active status https://github.com/openstack/nova/blob/62f6a0a1bc6c4b24621e1c2e927177f99501b... and leave the target host without any cleanup. Also, in the LM rollback method we set the instance back to active at the very beginning, so if the compute restarts before the actual cleanup, the target won't be cleaned up either. https://github.com/openstack/nova/blob/62f6a0a1bc6c4b24621e1c2e927177f99501b... We shouldn't set the instance back to active while there is a migration that hasn't been cleaned up. Those are existing bugs, and we should fix them. Whichever solution we choose, they won't be fixed automatically by the new solution.
* Cannot support same-host cold migration, since we need a way to distinguish the source and target instance's assignments in memory, but same-host cold migration means the same instance UUID and same flavor ID, so there is nothing else that can be used to distinguish the assignments.
The only in-tree virt driver that supports cold migrating on the same compute service host is the vmware driver, and that does not support things like VGPUs or VPMEMs, so I'm not sure why cold migration on the same host is a concern here - it's not supported and no one is working on adding that support.
* With workarounds added for the above points, the code becomes fragile.
To summarize, it sounds like the biggest problem is the lack of persistence during a same-host resize, because we'd lose the in-memory device assignment tracking even if we did the migration-based allocation swap magic as described above.
Exactly
Could we have a compromise where for all times *except* during some migration, we get the assigned devices from the hypervisor, but otherwise during a migration we store the old/new assignments in the MigrationContext? That would give us the persistence we need and would only be something that we temporarily care about during a migration. The thing I'm not sure about is if we do that, does it make things more complicated in general for the non-migration cases, or if we do it should we just go the extra mile and always be tracking assigned devices in the database exactly like what we do for PCI devices today - meaning we wouldn't have a special edge case just for migrations with these types of resources.
Then the only difference from the DB persistence way is also storing the allocation on "Instance.resources". If we take that one more step, then we needn't change our virt driver interface, or think about how to switch the consumer from the migration back to the instance, which is the complexity I described above.
#2 Nova DB persistence, using a virt-driver-specific blob to store virt-driver-specific info.
The idea is to persist the assignment for the instance into the DB. The resource tracker gets the available resources from the virt driver, and calculates free resources on the fly from the available resources and the assigned resources recorded in the instance DB. The new field instance.resources is designed to support virt-driver-specific metadata, hiding the virt driver and platform detail from the RT. https://etherpad.openstack.org/p/vpmems-non-virt-driver-specific-new
I left some comments in the etherpad about the proposed claims process but the "on the fly" part concerns me for performance, especially if we don't make that conditional based on the types of resources we're claiming. During a claim the ResourceTracker already has the list of tracked_instances and tracked_migrations it cares about, but it sounds like you're proposing that we would also now have to re-fetch all of that data from the database just to get the resources and migration context information for any instances tracked by that host to determine what their assignments are. That seems really heavy-weight to me and is my major concern with this approach, well, that and the fact it sounds like we're creating a new version of the PCIManager (though more generic, it could have a lot of the same split brain type issues we've had with tracking PCI device inventory and allocations over the years since it was introduced; by split brain I mean the hypervisor saying one thing but nova thinking another).
I think you are right, we can use RT.tracked_instances and RT.tracked_migrations; then it isn't on the fly anymore. There are two existing bugs that should be fixed:

1. Orphaned instances aren't in RT.tracked_instances. Although the resource consumption of orphaned instances is accounted for https://github.com/openstack/nova/blob/62f6a0a1bc6c4b24621e1c2e927177f99501b..., the virt driver interface https://github.com/openstack/nova/blob/62f6a0a1bc6c4b24621e1c2e927177f99501b... isn't implemented by most virt drivers.

2. Error-status migrations aren't in RT.tracked_migrations. A resize may be interrupted in the middle, and then we set the migration to an error status. Although we have a _clean_incomplete_migration periodic task to clean up those error migrations, there is a window before the cleanup in which the RT doesn't count the resource consumption.

Those are existing bugs and are easy to fix. That is why I used the on-the-fly calculation in the beginning, but I agree those bugs are easy to fix, and the code will be tidier.

For the split-brain problem, to be honest, the domain XML way shows us it can't fix that either: it loses the allocation for same-host resize and live migration.
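A rough sketch of that claim-time calculation (simplified attribute names, not the actual RT code): the free set is the driver-reported inventory minus everything recorded on the instances and migrations the RT already tracks.

    def free_vpmem_devices(driver_devices, tracked_instances, tracked_migrations):
        """driver_devices: device names reported by the virt driver.

        tracked_instances / tracked_migrations: records whose .resources holds
        the assigned device identifiers persisted in the DB.
        """
        assigned = set()
        for inst in tracked_instances:
            assigned.update(inst.resources or [])
        for mig in tracked_migrations:
            # Covers a pending same-host resize: the migration context holds
            # the old assignment while the instance holds the new one.
            assigned.update(mig.resources or [])
        return sorted(set(driver_devices) - assigned)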
pros: * Persists the assignment in the instance object, avoiding the corner cases where we lose the assignment. * The ResourceTracker is responsible for doing the claim. This is more reliable and has no race problem, since the ResourceTracker has worked well for a long time.
Heh, I guess yeah. :) There are a lot of dragons in that code and we're still fixing bugs in it even though it should be mostly stable after all of these years. But resource tracking in general sucks regardless of where it happens (RT, placement or the virt driver) so we just have to be comfortable with knowing there are going to be dragons.
I already listed the bugs above. I think the problem is that we are missing some tracking and don't have a closed loop for the instance and migration statuses. I added my analysis at the bottom of the etherpad. https://etherpad.openstack.org/p/vpmems-non-virt-driver-specific-new
* The virt-driver-specific json blob hides the virt driver/platform detail from the ResourceTracker.
Random json blobs are nasty in general especially if we need to convert data at runtime later for some upgrade purpose. What is proposed in the etherpad seems OK(ish) though given the only very random thing is the 'metadata' field, but I could see that all getting confusing to maintain later when we have different schema/semantic rules about what's in the metadata depending on the resource class and virt driver. But we'll likely have that problem anyway if we go with the non-persistent option #1 above.
It is a JSON blob dumped from a versioned object, so it should be OK?
* The free resource is calculated on the fly, keeping the implementation simple. (The RT just provides a point at which to do the claim; it doesn't need to involve the complexity of RT.update_available_resources.) cons: * Unlike the PCIManager, it doesn't have both instance-side and host-side persistent info; the on-the-fly calculation has to take care of orphaned instances (instances deleted from the DB but still existing on the host), so it isn't an unresolvable issue, and it wouldn't be too hard to add host-side persistent info in the future if we want it. * It is a data model change from the original proposal, and review is needed to decide whether the data model is generic enough.
Currently, Sean, Eric and I prefer #2, since #1 has flaws for same-host resize and live migration that can't be avoided by design.
At this point I can't say I have a strong opinion. I think either approach is going to be complicated and buggy and hard to maintain, especially if we don't have CI for these more exotic scenarios (which we don't for VGPU or VPMEM even though you said someone is working on the latter). I've voiced my concerns here but I'm not going to "die on a hill" for this, so in the end I'll likely roll over for whatever those of you that really care about this want to do, and know that you're going to be maintainers of it.
If you are worried about VPMEM itself, Rui is working on CI; he said he needs two weeks before the work is done. We can ask him to give an update here if you want. If you are worried about the RT part, I think we can have functional tests to cover that? I wouldn't say the DB way is complicated: most of the code in the RT is about getting the assigned resources from tracked_instances and tracked_migrations and comparing them to the available resources. The bugginess is in existing nova bugs; it isn't the fault of the proposal. I don't know what the maintenance problem points to; it would be great to have a specific case to discuss.
--
Thanks,
Matt
On 8/23/2019 3:43 AM, Alex Xu wrote:
2. Supporting same-host cold migration
This is the point I raise below. For same-host resize, instance_uuid + flavor_id works very well, but it can't support same-host cold migration. And yes, the migration allocation can fix that. But, as you also said, do we need to support same-host cold migration?
I see no reason to try and bend over backward to support same host cold migration since, as I said, the only virt driver that supports that today (and has been the only one for a long time - maybe forever?) is the vmware driver which isn't supporting any of these more advanced flows (VGPU, VPMEM, PCPU).
If the answer is no, then we needn't bother with it; instance_uuid + flavor_id is much simpler. If the answer is yes, right, we can put it into the RT. But it will be complex; maybe we need a data model like the DB-way proposal to pass the virt-driver/platform-specific info between the RT and the virt driver. Also think about the case where we need to check whether there is any incomplete live migration: we need to do a cleanup of all free vpmems, since we lost the allocation info for the live migration. Then we need a virt driver interface to trigger that cleanup, and I'm pretty sure I don't want to call it driver.cleanup_vpmems(). We also need to change the existing driver.spawn method to pass the assigned resources into the virt driver. Also, thinking about the case of an interrupted migration, I guess there is no way to switch the
I also remember Dan said it isn't good to not support same-host cold migration.
Again, the libvirt driver, as far as I know, has never supported same host cold migration, nor is anyone working on that, so I don't see where the need to make that support happen now is coming from. I think it should be ignored for the sake of these conversations.
I think you are right, we can use RT.tracked_instances and RT.tracked_migrations; then it isn't on the fly anymore. There are two existing bugs that should be fixed.
1. Orphaned instances aren't in RT.tracked_instances. Although the resource consumption of orphaned instances is accounted for https://github.com/openstack/nova/blob/62f6a0a1bc6c4b24621e1c2e927177f99501b..., the virt driver interface https://github.com/openstack/nova/blob/62f6a0a1bc6c4b24621e1c2e927177f99501b... isn't implemented by most virt drivers.
For the latter, get_per_instance_usage, that's only implemented by the xenapi driver, which is on the path to being deprecated by the end of Train at this point anyway: https://review.opendev.org/#/c/662295/ so I wouldn't worry too much about that one.

<snip>

In summary, I'm not going to block attempts at proposal #2. As you said, there are existing bugs which should be handled, though some likely won't ever be completely fixed (automatic cleanup and recovery from live migration failures - the live migration methods are huge and have a lot of points of failure, so properly rolling back from all of those is going to be a big undertaking in test and review time, and I don't see either happening at this stage).

I think one of the motivations to keep VPMEM resource tracking isolated to the hypervisor was just to get something quick and dirty working with a minimal amount of impact to other parts of nova, like the data model, ResourceTracker, etc. If proposal #2 also solves issues for VGPUs and PCPUs then there is more justification for doing it.

Either way I'm not opposed to the #2 proposal, so if that's what the people that are working on this want, go ahead. I personally don't plan on investing much review time in this series either way though, so that's kind of why I'm apathetic about this.

--
Thanks,

Matt
participants (4): Alex Xu, Eric Fried, Matt Riedemann, Sean Mooney