[openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers

Balázs Gibizer balazs.gibizer at ericsson.com
Wed May 30 11:06:02 UTC 2018

On Tue, May 29, 2018 at 3:12 PM, Sylvain Bauza <sbauza at redhat.com> wrote:
> On Tue, May 29, 2018 at 2:21 PM, Balázs Gibizer 
> <balazs.gibizer at ericsson.com> wrote:
>> On Tue, May 29, 2018 at 1:47 PM, Sylvain Bauza <sbauza at redhat.com> 
>> wrote:
>>> On Tue, May 29, 2018 at 11:02 AM, Balázs Gibizer 
>>> <balazs.gibizer at ericsson.com> wrote:
>>>> On Tue, May 29, 2018 at 9:38 AM, Sylvain Bauza <sbauza at redhat.com>
>>>> wrote:
>>>> >
>>>> >
>>>> > On Tue, May 29, 2018 at 3:08 AM, TETSURO NAKAMURA
>>>> > <nakamura.tetsuro at lab.ntt.co.jp> wrote:
>>>> >
>>>> >> > In that situation, say for example with VGPU inventories, that would
>>>> >> > mean that the compute node would stop reporting inventories for its
>>>> >> > root RP, but would rather report inventories for at least one single
>>>> >> > child RP.
>>>> >> > In that model, do we reconcile the allocations that were already made
>>>> >> > against the "root RP" inventory?
>>>> >>
>>>> >> It would be nice to see Eric and Jay comment on this,
>>>> >> but if I'm not mistaken, when the virt driver stops reporting
>>>> >> inventories for its root RP, placement would try to delete that
>>>> >> inventory inside and raise InventoryInUse exception if any
>>>> >> allocations still exist on that resource.
>>>> >>
>>>> >> ```
>>>> >> update_from_provider_tree() (nova/compute/resource_tracker.py)
>>>> >>   + _set_inventory_for_provider() (nova/scheduler/client/report.py)
>>>> >>       + put() - PUT /resource_providers/<rp_uuid>/inventories with new inventories (scheduler/client/report.py)
>>>> >>           + set_inventories() (placement/handler/inventory.py)
>>>> >>               + _set_inventory() (placement/objects/resource_provider.py)
>>>> >>                   + _delete_inventory_from_provider() (placement/objects/resource_provider.py)
>>>> >>                       -> raise exception.InventoryInUse
>>>> >> ```
>>>> >>
>>>> >> So do we need some trick, something like deleting the VGPU allocations
>>>> >> before upgrading and setting the allocations again on the newly
>>>> >> created child RP after upgrading?
>>>> >>
>>>> >
>>>> > I wonder if we should keep the existing inventory in the root RP, and
>>>> > somehow just reserve the remaining resources (so Placement wouldn't
>>>> > return that root RP for queries, but would still have allocations). But
>>>> > then, where and how to do this? By the resource tracker?
>>>> >
>>>> AFAIK it is the virt driver that decides to model the VGPU resource at a
>>>> different place in the RP tree, so I think it is the responsibility of
>>>> the same virt driver to move any existing allocation from the old place
>>>> to the new place during this change.
>>>> Cheers,
>>>> gibi
>>> Rather than moving the allocation, why not have the virt driver update
>>> the root RP by setting its reserved value to the total size?
>>> That way, the virt driver wouldn't need to ask for an allocation
>>> but would rather continue to provide inventories...
>>> Thoughts?
>> Keeping the old allocation at the old RP and adding a similarly sized
>> reservation in the new RP feels hackish, as those are not really
>> reserved GPUs but used GPUs, just from the old RP. If somebody sums
>> up the total reported GPUs in this setup via the placement API, she
>> will get more GPUs in total than what is physically visible to the
>> hypervisor, as the GPUs that are part of the old allocation are
>> reported twice, in two different total values. Could we just report
>> fewer GPU inventories to the new RP as long as the old RP has GPU
>> allocations?
> We could keep the old inventory in the root RP for the previous vGPU 
> type already supported in Queens and just add other inventories for 
> other vGPU types now supported. That is possibly the simplest 
> option, as the virt driver knows that.

That works for me. Can we somehow deprecate the previous, already 
supported vGPU types to eventually get rid of the split inventory?
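
To make the split concrete, here is a minimal sketch of that layout as 
plain Python data. The provider names, vGPU type names and totals are 
made up for illustration, and this is not nova's provider tree API, 
just a model of where the inventories would live.

```python
# Hypothetical model: provider name -> {resource class -> inventory dict}.
# Names, vGPU types and totals are illustrative assumptions, not real data.
def split_vgpu_layout(root_name, legacy_total, new_types):
    """Keep the Queens-era VGPU inventory on the root RP and put every
    newly supported vGPU type on its own child RP."""
    tree = {root_name: {"VGPU": {"total": legacy_total}}}
    for type_name, total in new_types.items():
        child_name = "%s_%s" % (root_name, type_name)
        tree[child_name] = {"VGPU": {"total": total}}
    return tree


def providers_with_vgpu(tree):
    """Return the names of the providers that carry a VGPU inventory."""
    return sorted(name for name, inv in tree.items() if "VGPU" in inv)


layout = split_vgpu_layout("compute1", 4, {"typeB": 2, "typeC": 8})
print(providers_with_vgpu(layout))
```

Existing allocations keep pointing at the root RP in this model, so 
nothing has to move at upgrade time. The cost is that VGPU stays split 
between the root and its children until the legacy type goes away.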

>> Some alternatives from my jetlagged brain:
>> a) Implement a move inventory/allocation API in placement. Given a 
>> resource class, a source RP uuid and a destination RP uuid, 
>> placement moves the inventory and allocations of that resource class 
>> from the source RP to the destination RP. Then the virt driver can 
>> call this API to move the allocation. This has an impact on the fast 
>> forward upgrade as it needs a running virt driver to do the allocation 
>> move.
> Instead of having the virt driver do that (TBH, I don't like that 
> given both Xen and libvirt drivers have the same problem), we could 
> write a nova-manage upgrade call for that, which would call the 
> Placement API, sure.

Using nova-manage is another possible way, similar to my idea c), but 
there I imagined the logic in placement-manage instead of nova-manage.

>> b) For this I assume that live migrating an instance having a GPU 
>> allocation on the old RP will allocate a GPU for that instance from 
>> the new RP. In the virt driver, do not report GPUs to the new RP 
>> while there are allocations for such GPUs in the old RP. Let the 
>> deployer live migrate the instances away. When the virt driver 
>> detects that there are no more GPU allocations on the old RP, it can 
>> delete the inventory from the old RP and report it to the new RP.
> For the moment, vGPUs don't support live migration, even within QEMU. 
> I haven't checked that, but IIUC when you live-migrate an instance 
> that has vGPUs, it will just migrate it without recreating the vGPUs.

If there is no live migration support for vGPUs then this option can be 

> Now, the problem is with the VGPU allocation, we should delete it 
> then. Maybe a new bug report ?

Sounds like a bug report to me :)

>> c) For this I assume that there is no support for live migration of 
>> an instance having a GPU. If there is a GPU allocation in the old RP, 
>> then the virt driver does not report GPU inventory to the new RP; it 
>> just creates the new nested RPs. Provide a placement-manage command to 
>> do the inventory + allocation copy from the old RP to the new RP.
> What's the difference from the first alternative?

I think after you mentioned nova-manage for the first alternative, the 
difference became only whether we do it from nova-manage or from 
placement-manage. The placement-manage solution has the benefit of 
being a pure DB operation, moving inventory and allocation between two 
RPs, while nova-manage would need to call a new placement API.
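
As a rough sketch of what such a move would have to do, here is the 
operation on in-memory stand-ins for the inventory and allocation 
records. The function name, the dict layout and the RP names are 
hypothetical; this is neither placement's schema nor an existing 
placement-manage command.

```python
# Hypothetical in-memory stand-ins for placement's inventory and
# allocation records; this is not placement's schema or API.
def move_resource_class(inventories, allocations, rc, src_rp, dst_rp):
    """Move the inventory and all allocations of resource class `rc`
    from provider `src_rp` to provider `dst_rp`."""
    if (dst_rp, rc) in inventories:
        raise ValueError("destination already has %s inventory" % rc)
    # Move the inventory record first, so the destination has capacity.
    inventories[(dst_rp, rc)] = inventories.pop((src_rp, rc))
    # Re-point every allocation of that class; consumers stay untouched.
    for alloc in allocations:
        if alloc["rp"] == src_rp and alloc["rc"] == rc:
            alloc["rp"] = dst_rp


inventories = {
    ("root", "VGPU"): {"total": 4},
    ("root", "VCPU"): {"total": 16},
}
allocations = [
    {"consumer": "instance-1", "rp": "root", "rc": "VGPU", "used": 1},
    {"consumer": "instance-1", "rp": "root", "rc": "VCPU", "used": 2},
]
move_resource_class(inventories, allocations, "VGPU", "root", "child")
```

In a real database both steps would have to happen in one transaction, 
so that the allocations never point at an RP without matching 
inventory.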

> Anyway, looks like it's pretty simple to just keep the inventory for 
> the already existing vGPU type in the root RP, and just add nested 
> RPs for other vGPU types.
> Oh, and btw. we could possibly have the same problem when we 
> implement the NUMA spec that I need to rework 
> https://review.openstack.org/#/c/552924/

If we want to move the VCPU resources from the root to the nested NUMA 
RP then yes, that feels like the same problem.


> -Sylvain
>> Cheers,
>> gibi
>>>> > -Sylvain
>>>> >
>>>> __________________________________________________________________________
>>>> OpenStack Development Mailing List (not for usage questions)
>>>> Unsubscribe: 
>>>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
