[openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers

Eric Fried openstack at fried.cc
Fri Jun 1 15:11:43 UTC 2018


Sylvain-

On 05/31/2018 02:41 PM, Sylvain Bauza wrote:
> 
> 
> On Thu, May 31, 2018 at 8:26 PM, Eric Fried <openstack at fried.cc> wrote:
> 
>     > 1. Make everything perform the pivot on compute node start (which can be
>     >    re-used by a CLI tool for the offline case)
>     > 2. Make everything default to non-nested inventory at first, and provide
>     >    a way to migrate a compute node and its instances one at a time (in
>     >    place) to roll through.
> 
>     I agree that it sure would be nice to do ^ rather than requiring the
>     "slide puzzle" thing.
> 
>     But how would this be accomplished, in light of the current "separation
>     of responsibilities" drawn at the virt driver interface, whereby the
>     virt driver isn't supposed to talk to placement directly, or know
>     anything about allocations?  Here's a first pass:
> 
> 
> 
> What we usually do is implement, either at the compute service level or
> at the virt driver level, some init_host() method that will reconcile
> what you want.
> For example, we could just imagine a non-virt-specific method (and I
> like that because it's non-virt-specific) - i.e. called by compute's
> init_host() - that would look up the compute root RP inventories and see
> whether one or more inventories tied to specific resource classes have
> to be moved from the root RP and attached to a child RP.
> The only subtlety that would require a virt-specific update is the name
> of the child RP (as both Xen and libvirt plan to use the child RP name
> as the vGPU type identifier), but that's an implementation detail that a
> possible virt driver update via the resource tracker could reconcile.

The question was rhetorical; my suggestion (below) was an attempt at
designing exactly what you've described.  Let me know if I can
explain/clarify it further.  I'm looking for feedback as to whether it's
a viable approach.
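
To make the virt-driver side of that suggestion concrete, here's a rough
sketch of how a driver might build the "move" list from within
update_provider_tree, using the VGPU case as the example.  Treat all of it
as illustrative: the _enumerate_pgpus() helper is made up, the ProviderTree
calls are only roughly right, and having update_provider_tree return
anything at all is exactly the new behavior being proposed, not something
that exists today.

    def update_provider_tree(self, provider_tree, nodename):
        cn_uuid = provider_tree.data(nodename).uuid
        moves = []
        for pgpu in self._enumerate_pgpus():  # hypothetical driver helper
            # The child RP name is the virt-specific bit (vGPU type).
            child = '%s_%s' % (nodename, pgpu.type)
            if not provider_tree.exists(child):
                provider_tree.new_child(child, nodename)
            provider_tree.update_inventory(
                child, {'VGPU': {'total': pgpu.total}})
            moves.append({
                'from_resource_provider': cn_uuid,
                'moved_resources': {'VGPU': pgpu.total},
                'to_resource_provider': provider_tree.data(child).uuid,
            })
        # Drop the glommed-together VGPU inventory from the root RP so the
        # resource tracker flushes the new layout to placement.
        root_inv = dict(provider_tree.data(nodename).inventory)
        root_inv.pop('VGPU', None)
        provider_tree.update_inventory(nodename, root_inv)
        return moves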

>     The virt driver, via the return value from update_provider_tree, tells
>     the resource tracker that "inventory of resource class A on provider B
>     has moved to provider C" for all applicable AxBxC.  E.g.
> 
>     [ { 'from_resource_provider': <cn_rp_uuid>,
>         'moved_resources': {VGPU: 4},
>         'to_resource_provider': <gpu_rp1_uuid>
>       },
>       { 'from_resource_provider': <cn_rp_uuid>,
>         'moved_resources': {VGPU: 4},
>         'to_resource_provider': <gpu_rp2_uuid>
>       },
>       { 'from_resource_provider': <cn_rp_uuid>,
>         'moved_resources': {
>             SRIOV_NET_VF: 2,
>             NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND: 1000,
>             NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND: 1000,
>         },
>         'to_resource_provider': <nic_rp_uuid>
>       }
>     ]
> 
>     As it does today, the resource tracker takes the updated provider tree and
>     invokes [1] the report client method update_from_provider_tree [2] to
>     flush the changes to placement.  But now update_from_provider_tree also
>     accepts the return value from update_provider_tree and, for each "move":
> 
>     - Creates provider C (as described in the provider_tree) if it doesn't
>     already exist.
>     - Creates/updates provider C's inventory as described in the
>     provider_tree (without yet updating provider B's inventory).  This ought
>     to create the inventory of resource class A on provider C.
>     - Discovers allocations of rc A on rp B and POSTs to move them to rp C*.
>     - Updates provider B's inventory.
> 
>     (*There's a hole here: if we're splitting a glommed-together inventory
>     across multiple new child providers, as the VGPUs in the example, we
>     don't know which allocations to put where.  The virt driver should know
>     which instances own which specific inventory units, and would be able to
>     report that info within the data structure.  That's getting kinda close
>     to the virt driver mucking with allocations, but maybe it fits well
>     enough into this model to be acceptable?)
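
To be concrete about what that extra info might look like (field names
totally up for debate; this is just an illustration), the driver could
report the per-consumer breakdown right in the move, e.g.:

    { 'from_resource_provider': <cn_rp_uuid>,
      'to_resource_provider': <gpu_rp1_uuid>,
      'moved_allocations': {
          <instance1_uuid>: {VGPU: 1},
          <instance2_uuid>: {VGPU: 1},
      },
    }

...so update_from_provider_tree wouldn't have to guess which consumer's
VGPU allocation lands on which child provider.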
> 
>     Note that the return value from update_provider_tree is optional, and
>     only used when the virt driver is indicating a "move" of this ilk.  If
>     it's None/[] then the RT/update_from_provider_tree flow is the same as
>     it is today.
> 
>     If we can do it this way, we don't need a migration tool.  In fact, we
>     don't even need to restrict provider tree "reshaping" to release
>     boundaries.  As long as the virt driver understands its own data model
>     migrations and reports them properly via update_provider_tree, it can
>     shuffle its tree around whenever it wants.
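
Spelling out the report-client side a little more, the per-"move" handling
I have in mind looks roughly like this.  It's only a sketch: every
_underscore helper below is a placeholder standing in for existing report
client plumbing, not a real method name.

    def _apply_move(self, context, move, provider_tree):
        src = move['from_resource_provider']
        dest = move['to_resource_provider']
        # 1. Create provider C if it doesn't already exist (parentage comes
        #    from the provider_tree the virt driver handed us).
        self._ensure_provider(context, dest, provider_tree)
        # 2. Set provider C's inventory as described in provider_tree,
        #    *before* touching provider B's inventory.
        self._set_inventory(context, dest, provider_tree.data(dest).inventory)
        # 3. Find allocations of the moved resource classes against B and
        #    rewrite them against C (this is where the "which consumer goes
        #    where" hole shows up for split inventories).
        for consumer, allocs in self._get_allocations(context, src).items():
            moved = {rc: amt for rc, amt in allocs.items()
                     if rc in move['moved_resources']}
            if moved:
                self._move_allocations(context, consumer, src, dest, moved)
        # 4. Only now shrink/remove the corresponding inventory on B.
        self._set_inventory(context, src, provider_tree.data(src).inventory)

The ordering is deliberate: C's inventory has to exist before allocations
can land on it, and B's inventory can only shrink after the allocations
have moved off it, otherwise placement would reject the changes.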
> 
>     Thoughts?
> 
>     -efried
> 
>     [1]
>     https://github.com/openstack/nova/blob/8753c9a38667f984d385b4783c3c2fc34d7e8e1b/nova/compute/resource_tracker.py#L890
>     [2]
>     https://github.com/openstack/nova/blob/8753c9a38667f984d385b4783c3c2fc34d7e8e1b/nova/scheduler/client/report.py#L1341


