[openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers
Balázs Gibizer
balazs.gibizer at ericsson.com
Wed May 30 11:06:02 UTC 2018
On Tue, May 29, 2018 at 3:12 PM, Sylvain Bauza <sbauza at redhat.com>
wrote:
>
>
> On Tue, May 29, 2018 at 2:21 PM, Balázs Gibizer
> <balazs.gibizer at ericsson.com> wrote:
>>
>>
>> On Tue, May 29, 2018 at 1:47 PM, Sylvain Bauza <sbauza at redhat.com>
>> wrote:
>>>
>>>
>>> On Tue, May 29, 2018 at 11:02 AM, Balázs Gibizer
>>> <balazs.gibizer at ericsson.com> wrote:
>>>>
>>>>
>>>> On Tue, May 29, 2018 at 9:38 AM, Sylvain Bauza <sbauza at redhat.com>
>>>> wrote:
>>>> >
>>>> >
>>>> > On Tue, May 29, 2018 at 3:08 AM, TETSURO NAKAMURA
>>>> > <nakamura.tetsuro at lab.ntt.co.jp> wrote:
>>>> >
>>>> >> > In that situation, say for example with VGPU inventories, that
>>>> >> > would mean that the compute node would stop reporting
>>>> >> > inventories for its root RP, but would rather report
>>>> >> > inventories for at least one single child RP. In that model,
>>>> >> > do we reconcile the allocations that were already made against
>>>> >> > the "root RP" inventory?
>>>> >>
>>>> >> It would be nice to see Eric and Jay comment on this, but if
>>>> >> I'm not mistaken, when the virt driver stops reporting
>>>> >> inventories for its root RP, placement would try to delete that
>>>> >> inventory and raise an InventoryInUse exception if any
>>>> >> allocations still exist against that resource provider.
>>>> >>
>>>> >> ```
>>>> >> update_from_provider_tree()  (nova/compute/resource_tracker.py)
>>>> >>  + _set_inventory_for_provider()  (nova/scheduler/client/report.py)
>>>> >>   + put() - PUT /resource_providers/<rp_uuid>/inventories with new inventories  (scheduler/client/report.py)
>>>> >>    + set_inventories()  (placement/handler/inventory.py)
>>>> >>     + _set_inventory()  (placement/objects/resource_provider.py)
>>>> >>      + _delete_inventory_from_provider()  (placement/objects/resource_provider.py)
>>>> >>       -> raise exception.InventoryInUse
>>>> >> ```
>>>> >>
>>>> >> So we need some trick, something like deleting the VGPU
>>>> >> allocations before upgrading and setting them again on the newly
>>>> >> created child RP after upgrading?
>>>> >>
>>>> >
>>>> > I wonder if we should keep the existing inventory in the root RP
>>>> > and somehow just reserve the remaining resources (so Placement
>>>> > wouldn't return that root RP for queries, but would still keep
>>>> > the allocations). But then, where and how would we do this? In
>>>> > the resource tracker?
>>>> >
>>>>
>>>> AFAIK it is the virt driver that decides to model the VGPU
>>>> resource at a different place in the RP tree, so I think it is the
>>>> responsibility of the same virt driver to move any existing
>>>> allocation from the old place to the new place during this change.
>>>>
>>>> Cheers,
>>>> gibi
>>>
>>> Why not, instead of moving the allocation, have the virt driver
>>> update the root RP by setting the reserved value to the total size?
>>>
>>> That way, the virt driver wouldn't need to ask for an allocation
>>> but could rather continue providing inventories...
>>>
>>> Thoughts?
>>
>> Keeping the old allocation in the old RP and adding a similar sized
>> reservation in the new RP feels hackish, as those are not really
>> reserved GPUs but used GPUs, just accounted in the old RP. If
>> somebody sums up the total reported GPUs in this setup via the
>> placement API then she will get more GPUs in total than what is
>> physically visible to the hypervisor, as the GPUs that are part of
>> the old allocation are reported twice, in two different totals
>> (e.g. with 4 physical GPUs, the two RPs would report a summed total
>> of 8). Could we just report fewer GPU inventories in the new RP
>> while the old RP still has GPU allocations?
>>
>
>
> We could keep the old inventory in the root RP for the previous vGPU
> type already supported in Queens and just add inventories for the
> other vGPU types that are now supported. That looks possibly the
> simplest option, as the virt driver knows all of that.
That works for me. Can we somehow deprecate the previous, already
supported vGPU types to eventually get rid of the split inventory?
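For illustration, here is a minimal sketch of what that could look
like in a virt driver's update_provider_tree(); _get_new_vgpu_types()
and all the counts are made-up assumptions, not the actual XenAPI or
libvirt code (the root RP's CPU/RAM/disk inventories are elided):

```python
# A minimal sketch, assuming a hypothetical _get_new_vgpu_types()
# helper; totals are illustrative only.
def update_provider_tree(self, provider_tree, nodename, allocations=None):
    # Leave the pre-existing (Queens-era) vGPU type on the root RP so
    # that the allocations made against it stay valid.
    provider_tree.update_inventory(nodename, {
        'VGPU': {'total': 4, 'reserved': 0, 'min_unit': 1, 'max_unit': 4,
                 'step_size': 1, 'allocation_ratio': 1.0},
    })
    # Model every *newly* supported vGPU type as a nested child RP.
    for vgpu_type, count in self._get_new_vgpu_types():  # hypothetical
        child = '%s_%s' % (nodename, vgpu_type)
        if not provider_tree.exists(child):
            provider_tree.new_child(child, nodename)
        provider_tree.update_inventory(child, {
            'VGPU': {'total': count, 'reserved': 0, 'min_unit': 1,
                     'max_unit': count, 'step_size': 1,
                     'allocation_ratio': 1.0},
        })
```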
>
>
>> Some alternatives from my jetlagged brain:
>>
>> a) Implement a move inventory/allocation API in placement. Given a
>> resource class, a source RP uuid and a destination RP uuid,
>> placement moves the inventory and allocations of that resource class
>> from the source RP to the destination RP. Then the virt driver can
>> call this API to move the allocation. This has an impact on the fast
>> forward upgrade, as it needs a running virt driver to do the
>> allocation move.
>>
>
> Instead of having the virt driver do that (TBH, I don't like that,
> given both the Xen and libvirt drivers have the same problem), we
> could write a nova-manage upgrade call that would call the Placement
> API, sure.
nova-manage is another possible way, similar to my idea #c), but there
I imagined the logic in placement-manage instead of nova-manage.
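If we go the nova-manage route, a rough sketch of what such a command
could do with the already existing placement REST API follows; the
endpoint, token and project/user ids are assumptions for illustration,
and error handling and generation conflicts are ignored:

```python
# A rough sketch, not a real nova-manage command: move the VGPU
# allocation of every consumer from the old (root) RP to the new
# child RP. PLACEMENT and HEADERS are illustrative assumptions.
import requests

PLACEMENT = 'http://placement.example.com'  # assumed endpoint
HEADERS = {'x-auth-token': 'ADMIN_TOKEN',   # assumed credentials
           'openstack-api-version': 'placement 1.17'}

def move_vgpu_allocations(old_rp, new_rp, project_id, user_id):
    # Every allocation currently written against the old RP, keyed by
    # consumer (instance) uuid.
    url = '%s/resource_providers/%s/allocations' % (PLACEMENT, old_rp)
    allocations = requests.get(url, headers=HEADERS).json()['allocations']
    for consumer, alloc in allocations.items():
        vgpu = alloc['resources'].pop('VGPU', None)
        if vgpu is None:
            continue
        # Rewrite this consumer's allocations: VGPU moves to the new
        # child RP, any other resource classes stay on the old root RP.
        body = {'allocations': {new_rp: {'resources': {'VGPU': vgpu}}},
                'project_id': project_id, 'user_id': user_id}
        if alloc['resources']:
            body['allocations'][old_rp] = {'resources': alloc['resources']}
        requests.put('%s/allocations/%s' % (PLACEMENT, consumer),
                     headers=HEADERS, json=body)
```

The ordering is the tricky part here: the child RP must already expose
enough VGPU inventory before the PUT, otherwise placement rejects the
rewritten allocation.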
>
>> b) For this I assume that live migrating an instance that has a GPU
>> allocation in the old RP will allocate a GPU for that instance from
>> the new RP. The virt driver would not report GPUs in the new RP
>> while there are allocations for such GPUs in the old RP. Let the
>> deployer live migrate away the instances. When the virt driver
>> detects that there are no more GPU allocations in the old RP, it can
>> delete the inventory from the old RP and report it in the new RP.
>>
>
> For the moment, vGPUs don't support live migration, even within QEMU.
> I haven't checked, but IIUC when you live-migrate an instance that
> has vGPUs, it will just migrate without recreating the vGPUs.
If there is no live migration support for vGPUs then this option can be
ignored.
> Now the problem is with the vGPU allocation: we should delete it
> then. Maybe a new bug report?
Sounds like a bug report to me :)
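Until such a fix lands, the stale allocation could in principle be
cleaned up by hand through the existing API; a minimal sketch, where
the endpoint, token and instance uuid are all illustrative:

```python
import requests

PLACEMENT = 'http://placement.example.com'  # assumed endpoint
HEADERS = {'x-auth-token': 'ADMIN_TOKEN'}   # assumed credentials
STALE_INSTANCE = 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee'  # illustrative

# DELETE /allocations/{consumer_uuid} drops every allocation held by
# that consumer; a post-live-migration cleanup would do exactly this
# for instances whose vGPU was lost during the migration.
requests.delete('%s/allocations/%s' % (PLACEMENT, STALE_INSTANCE),
                headers=HEADERS)
```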
>
>> c) For this I assume that there is no support for live migration of
>> an instance having a GPU. If there is a GPU allocation in the old RP
>> then the virt driver does not report GPU inventory in the new RP, it
>> just creates the new nested RPs. Provide a placement-manage command
>> to do the inventory + allocation copy from the old RP to the new RP.
>>
>
> What's the difference from the first alternative?
I think after you mentioned nova-manage for the first alternative, the
only remaining difference is whether it is done from nova-manage or
from placement-manage. The placement-manage solution has the benefit
of being a pure DB operation, moving inventory and allocations between
two RPs, while nova-manage would need to call a new placement API.
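For illustration only, the pure-DB variant could look roughly like
this; the table and column names follow the placement schema in the
nova_api database, while the connection URL and the resource class /
provider ids are made-up assumptions:

```python
# A back-of-the-envelope sketch of what a placement-manage command
# could do directly in the DB; generation handling is omitted.
from sqlalchemy import create_engine, text

engine = create_engine('mysql+pymysql://nova:secret@db/nova_api')  # assumed
VGPU_RC_ID = 7          # illustrative id of the VGPU resource class
OLD_RP, NEW_RP = 1, 42  # illustrative resource_providers.id values

with engine.begin() as conn:  # one transaction for both moves
    # Re-point the VGPU inventory rows from the old RP to the new one.
    conn.execute(
        text("UPDATE inventories SET resource_provider_id = :new_rp "
             "WHERE resource_provider_id = :old_rp "
             "AND resource_class_id = :rc"),
        new_rp=NEW_RP, old_rp=OLD_RP, rc=VGPU_RC_ID)
    # Re-point the matching allocation rows the same way.
    conn.execute(
        text("UPDATE allocations SET resource_provider_id = :new_rp "
             "WHERE resource_provider_id = :old_rp "
             "AND resource_class_id = :rc"),
        new_rp=NEW_RP, old_rp=OLD_RP, rc=VGPU_RC_ID)
```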
>
> Anyway, it looks like it's pretty simple to just keep the inventory
> for the already existing vGPU type in the root RP and just add nested
> RPs for the other vGPU types.
> Oh, and by the way, we could possibly have the same problem when we
> implement the NUMA spec that I need to rework:
> https://review.openstack.org/#/c/552924/
If we want to move the VCPU resources from the root to the nested NUMA
RP then yes, that feels like the same problem.
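The NUMA case would look structurally identical from the virt driver's
perspective; a minimal sketch, again with purely illustrative numbers:

```python
# A minimal sketch of the analogous NUMA modelling; cell count and
# inventory numbers are illustrative only.
def update_provider_tree(self, provider_tree, nodename, allocations=None):
    for cell in (0, 1):  # one child RP per NUMA cell
        child = '%s_NUMA%d' % (nodename, cell)
        if not provider_tree.exists(child):
            provider_tree.new_child(child, nodename)
        provider_tree.update_inventory(child, {
            'VCPU': {'total': 8, 'reserved': 0, 'min_unit': 1,
                     'max_unit': 8, 'step_size': 1,
                     'allocation_ratio': 16.0},
        })
    # The upgrade problem is the same: pre-existing VCPU allocations
    # still live on the root RP and would have to be moved down here.
```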
gibi
>
> -Sylvain
>> Cheers,
>> gibi
>>
>>
>>>
>>>>
>>>> > -Sylvain