[openstack-dev] [nova] About doing the migration claim with Placement API
Alex Xu
soulxu at gmail.com
Fri Nov 11 08:50:02 UTC 2016
2016-11-03 4:52 GMT+08:00 Jay Pipes <jaypipes at gmail.com>:
> On 11/01/2016 10:14 AM, Alex Xu wrote:
>
>> Currently we only update resource usage through the Placement API in
>> the instance claim and in the available-resource-update periodic task;
>> there is no claim for migrations through the placement API yet. This
>> work is tracked by https://bugs.launchpad.net/nova/+bug/1621709. In
>> Newton we only fixed one piece, making the resource-update periodic
>> task work correctly so that it auto-heals everything over time; the
>> migration claim itself was not a goal for the Newton release.
>>
>> So the first question is: do we want to fix this in this release? If
>> the answer is yes, there is a concern we need to discuss.
>>
>
> Yes, I believe we should fix the underlying problem in Ocata. The
> underlying problem is what Sylvain brought up: live migrations do not
> currently use any sort of claim operation. The periodic resource audit is
> relied upon to essentially clean up the state of claimed resources over
> time, and as Chris points out in review comments on
> https://review.openstack.org/#/c/244489/, this leads to the scheduler
> operating on stale data and can lead to an increase in retry operations.
>
> This needs to be fixed before even attempting to address the issue you
> bring up with the placement API calls from the resource tracker.
OK, let me see if I can help with that.
>
>
>> In order to implement dropping the migration claim, the RT needs to
>> remove the allocation records on a specific RP (the source or
>> destination compute node), but there is no API that can do that. The
>> API for removing allocation records is 'DELETE
>> /allocations/{consumer_uuid}', but it deletes all of the allocation
>> records for the consumer. So the initial fix
>> (https://review.openstack.org/#/c/369172/) adds a new API, 'DELETE
>> /resource_providers/{rp_uuid}/allocations/{consumer_id}'. But Chris
>> Dent pointed out that this goes against the original design: all the
>> allocations for a specific consumer can only be dropped together.
>>
>
> Yes, and this is by design. Consumption of resources -- or the freeing
> thereof -- must be an atomic, transactional operation.
>
>> There is also a suggestion from Andrew: we can update all the
>> allocation records for the consumer each time. That means the RT would
>> build the original allocation records and the new allocation records
>> for the claim together and submit them in one API call, which should
>> be 'PUT /allocations/{consumer_uuid}'. Unfortunately that API doesn't
>> replace all the allocation records for the consumer; it always amends
>> the new allocation records onto the existing ones.
>>
>
> I see no reason why we can't change the behaviour of the `PUT
> /allocations/{consumer_uuid}` call to allow changing either the amounts of
> the allocated resources (a resize operation) or the set of resource
> provider UUIDs referenced in the allocations list (a move operation).
>
> For instance, let's say we have an allocation for an instance "i1" that is
> consuming 2 VCPU and 2048 MEMORY_MB on compute node "rpA", 50 DISK_GB on a
> shared storage pool "rpC".
>
> The allocations table would have the following records in it:
>
> resource_provider resource_class consumer used
> ----------------- -------------- -------- ----
> rpA VCPU i1 2
> rpA MEMORY_MB i1 2048
> rpC DISK_GB i1 50
>
> Now, we need to migrate instance "i1" to compute node "rpB". The instance
> disk uses shared storage so the only allocation records we actually need to
> modify are the VCPU and MEMORY_MB records.
>
Yes, thinking about the shared-storage case, this makes a lot of sense.
Thanks for the detailed explanation here!
>
> We would create the following REST API call from the resource tracker on
> the destination node:
>
> PUT /allocations/i1
> {
> "allocations": [
> {
> "resource_provider": {
> "uuid": "rpB",
> },
> "resources": {
> "VCPU": 2,
> "MEMORY_MB": 2048
> }
> },
> {
> "resource_provider": {
> "uuid": "rpC",
> },
> "resources": {
> "DISK_GB": 50
> }
> }
> ]
> }
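For what it's worth, here is a minimal Python sketch of how the
destination RT could assemble that payload (the `build_move_allocations`
helper is hypothetical, purely for illustration; it just mirrors the
request body in Jay's example above):

```python
# Hypothetical helper: build the full replacement allocation list for a
# move operation, matching the PUT /allocations/{consumer_uuid} body.

def build_move_allocations(dest_rp_uuid, dest_resources, unchanged):
    """Return the complete set of allocations the consumer should own.

    dest_rp_uuid:   UUID of the destination compute node provider.
    dest_resources: dict of resource class -> amount to claim there.
    unchanged:      list of (rp_uuid, resources) pairs that stay as-is,
                    e.g. the shared-storage DISK_GB allocation on rpC.
    """
    allocations = [{
        "resource_provider": {"uuid": dest_rp_uuid},
        "resources": dict(dest_resources),
    }]
    for rp_uuid, resources in unchanged:
        allocations.append({
            "resource_provider": {"uuid": rp_uuid},
            "resources": dict(resources),
        })
    return {"allocations": allocations}


# Mirrors the example: move i1's VCPU/MEMORY_MB claim to rpB, keep the
# DISK_GB allocation on the shared storage pool rpC untouched.
payload = build_move_allocations(
    "rpB", {"VCPU": 2, "MEMORY_MB": 2048},
    [("rpC", {"DISK_GB": 50})])
# The RT would then PUT this payload to /allocations/i1.
```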
>
> The placement service would receive that request payload and immediately
> grab any existing allocation records referencing consumer_uuid of "i1". It
> would notice that records referencing "rpA" (the source compute node) are
> no longer needed. It would notice that the DISK_GB allocation hasn't
> changed. And finally it would notice that there are new VCPU and MEMORY_MB
> records referring to a new resource provider "rpB" (the destination compute
> node).
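To make that server-side diff step concrete, here is a rough Python
sketch of the comparison (the `diff_allocations` function is
illustrative only, not actual placement code):

```python
# Illustrative sketch of the diff the placement service would compute:
# compare the consumer's existing allocations against the requested set,
# yielding records to delete, records to insert, and unchanged records.

def diff_allocations(existing, requested):
    """Both args map (rp_uuid, resource_class) -> amount used."""
    to_delete = {k: v for k, v in existing.items() if k not in requested}
    to_insert = {k: v for k, v in requested.items()
                 if existing.get(k) != v}
    unchanged = {k: v for k, v in requested.items()
                 if existing.get(k) == v}
    return to_delete, to_insert, unchanged


# The scenario above: i1 moves from rpA to rpB, shared disk stays on rpC.
existing = {("rpA", "VCPU"): 2, ("rpA", "MEMORY_MB"): 2048,
            ("rpC", "DISK_GB"): 50}
requested = {("rpB", "VCPU"): 2, ("rpB", "MEMORY_MB"): 2048,
             ("rpC", "DISK_GB"): 50}
to_delete, to_insert, unchanged = diff_allocations(existing, requested)
# to_delete: the rpA records; to_insert: the rpB records;
# unchanged: the rpC DISK_GB record.
```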
>
> A single SQL transaction would be built that executes the following:
>
> BEGIN;
>
> # Grab the source and destination compute node provider generations
> # to protect against concurrent writes...
> $RPA_GENERATION := SELECT generation FROM resource_providers
> WHERE uuid = 'rpA';
> $RPB_GENERATION := SELECT generation FROM resource_providers
> WHERE uuid = 'rpB';
>
> # Delete the allocation records referring to the source for the VCPU
> # and MEMORY_MB resources
> DELETE FROM allocations
> WHERE consumer = 'i1'
> AND resource_provider = 'rpA'
> AND resource_class IN ('VCPU', 'MEMORY_MB');
>
> # Add allocation records referring to the destination for VCPU and
> # MEMORY_MB
> INSERT INTO allocations
> (resource_provider, resource_class, consumer, used)
> VALUES
> ('rpB', 'VCPU', 'i1', 2),
> ('rpB', 'MEMORY_MB', 'i1', 2048);
>
> # Update the resource provider generations and rollback the
> # transaction if any other writer modified the resource providers
> # in between the initial read time and here.
> UPDATE resource_providers
> SET generation = $RPA_GENERATION + 1
> WHERE uuid = 'rpA'
> AND generation = $RPA_GENERATION;
>
> IF ROWS_AFFECTED() == 0:
> ROLLBACK
>
> UPDATE resource_providers
> SET generation = $RPB_GENERATION + 1
> WHERE uuid = 'rpB'
> AND generation = $RPB_GENERATION;
>
> IF ROWS_AFFECTED() == 0:
> ROLLBACK
>
> COMMIT;
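The generation check at the end of that transaction is a classic
compare-and-swap. A rough illustration, using an in-memory SQLite table
just to show the mechanism (the real service of course runs this inside
one transaction against the placement database):

```python
# Sketch of the generation-based concurrency check: bump the provider's
# generation only if nobody else changed it since we read it.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resource_providers (uuid TEXT, generation INT)")
conn.execute("INSERT INTO resource_providers VALUES ('rpA', 5)")

# Read the generation up front, as the SELECTs above do.
gen = conn.execute(
    "SELECT generation FROM resource_providers WHERE uuid = 'rpA'"
).fetchone()[0]

# ... the allocation DELETE/INSERT statements would go here ...

# Compare-and-swap: the WHERE clause only matches if the generation is
# still what we read, so a concurrent writer makes rowcount == 0.
cur = conn.execute(
    "UPDATE resource_providers SET generation = ? "
    "WHERE uuid = 'rpA' AND generation = ?", (gen + 1, gen))
if cur.rowcount == 0:
    conn.rollback()  # another writer won the race; retry the claim
else:
    conn.commit()

# A second writer still holding the stale generation now loses the race:
stale = conn.execute(
    "UPDATE resource_providers SET generation = ? "
    "WHERE uuid = 'rpA' AND generation = ?", (gen + 1, gen))
```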
>
> In this way, we keep the API as is but simply handle move operations
> transparently to the caller. The caller simply expresses what they wish the
> allocation to look like with regards to which resource providers are having
> which resources consumed from, and the placement service ensures that these
> allocation records are written in an atomic fashion.
>
> Best,
> -jay
>
>
>> So which direction should we go here?
>>
>