[openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers
Eric Fried
openstack at fried.cc
Mon Jun 4 23:00:48 UTC 2018
There has been much discussion. We've gotten to the point of an initial
proposal and are ready for more (hopefully smaller, hopefully
conclusive) discussion.
To that end, there will be a HANGOUT tomorrow (TUESDAY, JUNE 5TH) at
1500 UTC. Be in #openstack-placement to get the link to join.
The strawpeople outlined below and discussed in the referenced etherpad
have been consolidated/distilled into a new etherpad [1] around which
the hangout discussion will be centered.
[1] https://etherpad.openstack.org/p/placement-making-the-(up)grade
Thanks,
efried
On 06/01/2018 01:12 PM, Jay Pipes wrote:
> On 05/31/2018 02:26 PM, Eric Fried wrote:
>>> 1. Make everything perform the pivot on compute node start (which can be
>>> re-used by a CLI tool for the offline case)
>>> 2. Make everything default to non-nested inventory at first, and provide
>>> a way to migrate a compute node and its instances one at a time (in
>>> place) to roll through.
>>
>> I agree that it sure would be nice to do ^ rather than requiring the
>> "slide puzzle" thing.
>>
>> But how would this be accomplished, in light of the current "separation
>> of responsibilities" drawn at the virt driver interface, whereby the
>> virt driver isn't supposed to talk to placement directly, or know
>> anything about allocations?
> FWIW, I don't have a problem with the virt driver "knowing about
> allocations". What I have a problem with is the virt driver *claiming
> resources for an instance*.
>
> That's what the whole placement-claims-resources thing was all about,
> and I'm not interested in stepping back to the days of long racy claim
> operations by having the compute nodes be responsible for claiming
> resources.
>
> That said, once the consumer generation microversion lands [1], it
> should be possible to *safely* modify an allocation set for a consumer
> (instance) and move allocation records for an instance from one provider
> to another.
>
> [1] https://review.openstack.org/#/c/565604/
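(For illustration: if that proposal lands roughly as it is in review,
re-pointing an instance's VGPU allocation from the compute node provider to a
child provider could be a single generation-guarded PUT to
/allocations/{instance_uuid}. The key names below are assumed from the review;
resource classes and amounts are made up.)

    new_allocations = {
        'allocations': {
            # Resources that stay on the compute node provider.
            '<cn_rp_uuid>': {'resources': {'VCPU': 2, 'MEMORY_MB': 2048}},
            # The VGPU allocation, re-pointed at the child provider.
            '<gpu_rp1_uuid>': {'resources': {'VGPU': 1}},
        },
        # Must match placement's stored generation for this consumer;
        # otherwise the PUT fails and the caller retries. That conflict
        # check is what makes the move safe against concurrent writers.
        'consumer_generation': 5,
        'project_id': '<project_uuid>',
        'user_id': '<user_uuid>',
    }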
>
>> Here's a first pass:
>>
>> The virt driver, via the return value from update_provider_tree, tells
>> the resource tracker that "inventory of resource class A on provider B
>> has moved to provider C" for all applicable AxBxC. E.g.
>>
>>   [ { 'from_resource_provider': <cn_rp_uuid>,
>>       'moved_resources': {'VGPU': 4},
>>       'to_resource_provider': <gpu_rp1_uuid>
>>     },
>>     { 'from_resource_provider': <cn_rp_uuid>,
>>       'moved_resources': {'VGPU': 4},
>>       'to_resource_provider': <gpu_rp2_uuid>
>>     },
>>     { 'from_resource_provider': <cn_rp_uuid>,
>>       'moved_resources': {
>>           'SRIOV_NET_VF': 2,
>>           'NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND': 1000,
>>           'NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND': 1000,
>>       },
>>       'to_resource_provider': <nic_rp_uuid>
>>     }
>>   ]
>>
>> As today, the resource tracker takes the updated provider tree and
>> invokes [1] the report client method update_from_provider_tree [2] to
>> flush the changes to placement. But now update_from_provider_tree also
>> accepts the return value from update_provider_tree and, for each "move":
>>
>> - Creates provider C (as described in the provider_tree) if it doesn't
>> already exist.
>> - Creates/updates provider C's inventory as described in the
>> provider_tree (without yet updating provider B's inventory). This ought
>> to create the inventory of resource class A on provider C.
>
> Unfortunately, right here you'll introduce a race condition. As soon as
> this operation completes, the scheduler will have the ability to throw
> new instances on provider C and consume the inventory from it that you
> intend to give to the existing instance that is consuming from provider B.
>
>> - Discovers allocations of rc A on rp B and POSTs to move them to rp C*.
>
> For each consumer of resources on rp B, right?
>
>> - Updates provider B's inventory.
>
> Again, this is problematic because the scheduler will have already begun
> to place new instances on B's inventory, which could very well result in
> incorrect resource accounting on the node.
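Writing the per-move flow above out as a rough sketch makes that window
obvious. None of these method names are the real report client API; they are
hypothetical and only mirror the steps listed above.

    def apply_moves(reportclient, provider_tree, moves):
        # Hypothetical helper: applies each "move" returned by
        # update_provider_tree as a sequence of separate placement calls.
        for move in moves:
            src = move['from_resource_provider']
            dst = move['to_resource_provider']

            # Ensure provider C exists and carries the new inventory.
            reportclient.ensure_provider(dst, provider_tree)
            reportclient.set_inventory(dst, provider_tree.data(dst).inventory)

            # From here until the allocations are moved, the scheduler can
            # hand dst's fresh inventory to brand-new instances.

            # Move existing allocations of the affected classes from B to C.
            for consumer_uuid in reportclient.get_allocations_for_provider(src):
                reportclient.move_allocations(
                    consumer_uuid, src, dst, move['moved_resources'])

            # Only now shrink provider B's inventory.
            reportclient.set_inventory(src, provider_tree.data(src).inventory)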
>
> We basically need to have one giant new REST API call that accepts the
> list of "move instructions" and performs all of the instructions in a
> single transaction. :(
>
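To make that concrete, here is one purely illustrative shape for such a
request body, expressed as a Python dict. None of these key names come from an
actual proposal; the point is just "replacement inventories plus re-seated
allocations, applied in one transaction".

    reshape_request = {
        # Replacement inventories, keyed by provider UUID.  An empty dict
        # means the provider no longer has inventory of the moved classes.
        'inventories': {
            '<cn_rp_uuid>': {},
            '<gpu_rp1_uuid>': {'VGPU': {'total': 4}},
            '<gpu_rp2_uuid>': {'VGPU': {'total': 4}},
        },
        # Re-seated allocations, keyed by consumer (instance) UUID.
        'allocations': {
            '<instance_uuid>': {
                'allocations': {
                    '<gpu_rp1_uuid>': {'resources': {'VGPU': 1}},
                },
                'consumer_generation': 1,
            },
        },
    }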
>> (*There's a hole here: if we're splitting a glommed-together inventory
>> across multiple new child providers, as with the VGPUs in the example, we
>> don't know which allocations to put where. The virt driver should know
>> which instances own which specific inventory units, and would be able to
>> report that info within the data structure. That's getting kinda close
>> to the virt driver mucking with allocations, but maybe it fits well
>> enough into this model to be acceptable?)
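One illustrative way to plug that hole (the 'moved_allocations' key is
invented here, not part of any proposal) would be to let the virt driver
attach a per-instance breakdown to each move:

    move_with_owners = {
        'from_resource_provider': '<cn_rp_uuid>',
        'to_resource_provider': '<gpu_rp1_uuid>',
        'moved_resources': {'VGPU': 4},
        # Hypothetical: which consumer owns how many of the moved units, so
        # the resource tracker knows which allocations land on which child.
        'moved_allocations': {
            '<instance_a_uuid>': {'VGPU': 1},
            '<instance_b_uuid>': {'VGPU': 3},
        },
    }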
>
> Well, it's not really the virt driver *itself* mucking with the
> allocations. It's more that the virt driver is telling something *else*
> the move instructions that it feels are needed...
>
>> Note that the return value from update_provider_tree is optional, and
>> only used when the virt driver is indicating a "move" of this ilk. If
>> it's None/[] then the RT/update_from_provider_tree flow is the same as
>> it is today.
>>
>> If we can do it this way, we don't need a migration tool. In fact, we
>> don't even need to restrict provider tree "reshaping" to release
>> boundaries. As long as the virt driver understands its own data model
>> migrations and reports them properly via update_provider_tree, it can
>> shuffle its tree around whenever it wants.
>
> Due to the many race conditions we would have in trying to fudge
> inventory amounts (the reserved/total thing) and allocation movement for
> >1 consumer at a time, I'm pretty sure the only safe thing to do is have
> a single new HTTP endpoint that would take this list of move operations
> and perform them atomically (on the placement server side of course).
>
> Here's a strawman for what that HTTP endpoint might look like:
>
> https://etherpad.openstack.org/p/placement-migrate-operations
>
> feel free to markup and destroy.
>
> Best,
> -jay
>
>> Thoughts?
>>
>> -efried
>>
>> [1]
>> https://github.com/openstack/nova/blob/8753c9a38667f984d385b4783c3c2fc34d7e8e1b/nova/compute/resource_tracker.py#L890
>>
>> [2]
>> https://github.com/openstack/nova/blob/8753c9a38667f984d385b4783c3c2fc34d7e8e1b/nova/scheduler/client/report.py#L1341
>>
>>