[openstack-dev] [nova] About doing the migration claim with Placement API

Jay Pipes jaypipes at gmail.com
Wed Nov 2 20:52:00 UTC 2016

On 11/01/2016 10:14 AM, Alex Xu wrote:
> Currently we only update the resource usage with Placement API in the
> instance claim and the available resource update periodic task. But
> there is no claim for migration with the placement API yet. This work
> is tracked by https://bugs.launchpad.net/nova/+bug/1621709. In Newton,
> we only fixed one bit, which makes the resource update periodic task
> work correctly so that it will auto-heal everything. The migration
> claim part wasn't a goal for the Newton release.
> So the first question is: do we want to fix it in this release? If the
> answer is yes, there is a concern we need to discuss.

Yes, I believe we should fix the underlying problem in Ocata. The 
underlying problem is what Sylvain brought up: live migrations do not 
currently use any sort of claim operation. The periodic resource audit 
is relied upon to essentially clean up the state of claimed resources 
over time, and as Chris points out in review comments on 
https://review.openstack.org/#/c/244489/, this leads to the scheduler 
operating on stale data and can lead to an increase in retry operations.

This needs to be fixed before even attempting to address the issue you 
bring up with the placement API calls from the resource tracker.

> In order to implement the drop of the migration claim, the RT needs to
> remove allocation records on a specific RP (the source/destination
> compute node). But there isn't any API that can do that. The API for
> removing allocation records is 'DELETE /allocations/{consumer_uuid}',
> but it deletes all the allocation records for the consumer. So the
> initial fix (https://review.openstack.org/#/c/369172/) adds a new API,
> 'DELETE /resource_providers/{rp_uuid}/allocations/{consumer_id}'. But
> Chris Dent pointed out this is against the original design: all the
> allocations for a specific consumer can only be dropped together.

Yes, and this is by design. Consumption of resources -- or the freeing 
thereof -- must be an atomic, transactional operation.

> There is also a suggestion from Andrew: we can update all the
> allocation records for the consumer each time. That means the RT would
> build the original allocation records and the new allocation records
> for the claim together, and put them into one API call. That API would
> be 'PUT /allocations/{consumer_uuid}'. Unfortunately that API doesn't
> replace all the allocation records for the consumer; it only appends
> new allocation records for the consumer.

I see no reason why we can't change the behaviour of the `PUT 
/allocations/{consumer_uuid}` call to allow changing either the amounts 
of the allocated resources (a resize operation) or the set of resource 
provider UUIDs referenced in the allocations list (a move operation).

For instance, let's say we have an allocation for an instance "i1" that 
is consuming 2 VCPU and 2048 MEMORY_MB on compute node "rpA", and 50 
DISK_GB on a shared storage pool "rpC".

The allocations table would have the following records in it:

resource_provider resource_class consumer used
----------------- -------------- -------- ----
rpA               VCPU           i1          2
rpA               MEMORY_MB      i1       2048
rpC               DISK_GB        i1         50

Now, we need to migrate instance "i1" to compute node "rpB". The 
instance disk uses shared storage so the only allocation records we 
actually need to modify are the VCPU and MEMORY_MB records.

We would create the following REST API call from the resource tracker on 
the destination node:

PUT /allocations/i1
{
  "allocations": [
    {
      "resource_provider": {
        "uuid": "rpB"
      },
      "resources": {
        "VCPU": 2,
        "MEMORY_MB": 2048
      }
    },
    {
      "resource_provider": {
        "uuid": "rpC"
      },
      "resources": {
        "DISK_GB": 50
      }
    }
  ]
}

The placement service would receive that request payload and immediately 
grab any existing allocation records referencing consumer_uuid of "i1". 
It would notice that records referencing "rpA" (the source compute node) 
are no longer needed. It would notice that the DISK_GB allocation hasn't 
changed. And finally it would notice that there are new VCPU and 
MEMORY_MB records referring to a new resource provider "rpB" (the 
destination compute node).
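That diff logic can be sketched in Python. This is a toy sketch of the idea, not the actual placement code; the keying of allocations by (resource_provider, resource_class) is my own assumption for illustration:

```python
# Toy sketch of the server-side allocation diff (not the real
# placement implementation). Allocations are modeled here as a dict
# mapping (resource_provider, resource_class) -> used amount.

def diff_allocations(existing, requested):
    """Return (to_delete, to_write) allocation sets.

    to_delete: records whose provider/class pair no longer appears in
               the requested allocation (e.g. the source node rpA).
    to_write:  records that are new or whose amount changed; unchanged
               records (e.g. the shared DISK_GB on rpC) are untouched.
    """
    to_delete = {k: v for k, v in existing.items() if k not in requested}
    to_write = {k: v for k, v in requested.items()
                if existing.get(k) != v}
    return to_delete, to_write

# The "i1" example above: moving VCPU/MEMORY_MB from rpA to rpB,
# leaving the shared-storage DISK_GB allocation on rpC alone.
existing = {
    ('rpA', 'VCPU'): 2,
    ('rpA', 'MEMORY_MB'): 2048,
    ('rpC', 'DISK_GB'): 50,
}
requested = {
    ('rpB', 'VCPU'): 2,
    ('rpB', 'MEMORY_MB'): 2048,
    ('rpC', 'DISK_GB'): 50,
}

to_delete, to_write = diff_allocations(existing, requested)
```

Only the rpA records land in to_delete and only the rpB records in to_write; the unchanged rpC DISK_GB record appears in neither set.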

A single SQL transaction would be built that executes the following:


   # Grab the source and destination compute node provider generations
   # to protect against concurrent writes...
   $RPA_GEN := SELECT generation FROM resource_providers
               WHERE uuid = 'rpA';
   $RPB_GEN := SELECT generation FROM resource_providers
               WHERE uuid = 'rpB';

   # Delete the allocation records referring to the source for the VCPU
   # and MEMORY_MB resources
   DELETE FROM allocations
   WHERE consumer = 'i1'
   AND resource_provider = 'rpA'
   AND resource_class IN ('VCPU', 'MEMORY_MB');

   # Add allocation records referring to the destination for the VCPU
   # and MEMORY_MB resources
   INSERT INTO allocations
   (resource_provider, resource_class, consumer, used)
   VALUES
   ('rpB', 'VCPU', 'i1', 2),
   ('rpB', 'MEMORY_MB', 'i1', 2048);

   # Update the resource provider generations and roll back the
   # transaction if any other writer modified the resource providers
   # between the initial read and here.
   UPDATE resource_providers
   SET generation = $RPA_GEN + 1
   WHERE uuid = 'rpA'
   AND generation = $RPA_GEN;

   UPDATE resource_providers
   SET generation = $RPB_GEN + 1
   WHERE uuid = 'rpB'
   AND generation = $RPB_GEN;
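The generation check is a compare-and-swap: the UPDATE only matches a row whose generation is still what we read at the start, so a zero rowcount means a concurrent writer got there first and the whole transaction must be rolled back and retried. Here is a minimal runnable sketch of that check using an in-memory SQLite database; the table is a simplified stand-in for the real schema, not the actual Nova code:

```python
# Sketch of the generation compare-and-swap using in-memory SQLite.
# The resource_providers table here is a simplified assumption that
# only mirrors the uuid/generation columns used above.
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute(
    "CREATE TABLE resource_providers (uuid TEXT PRIMARY KEY, generation INT)")
conn.execute(
    "INSERT INTO resource_providers VALUES ('rpA', 5), ('rpB', 9)")

def bump_generation(conn, uuid, expected_gen):
    """Increment a provider's generation only if no one else has.

    Returns True on success. False means our view of the provider was
    stale (a concurrent writer bumped the generation first), so the
    surrounding allocation transaction should roll back and retry.
    """
    cur = conn.execute(
        "UPDATE resource_providers SET generation = generation + 1 "
        "WHERE uuid = ? AND generation = ?", (uuid, expected_gen))
    return cur.rowcount == 1

ok = bump_generation(conn, 'rpA', 5)     # succeeds; generation is now 6
stale = bump_generation(conn, 'rpA', 5)  # fails; our read of 5 is stale
```

The same pattern guards both the source and destination providers, which is why the transaction above bumps both generations.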



In this way, we keep the API as-is but handle move operations 
transparently to the caller. The caller simply expresses what they wish 
the allocation to look like -- which resources are consumed from which 
resource providers -- and the placement service ensures that these 
allocation records are written in an atomic fashion.


> So which direction should we go in here?
