[openstack-dev] [nova] About doing the migration claim with Placement API
Jay Pipes
jaypipes at gmail.com
Wed Nov 2 20:52:00 UTC 2016
On 11/01/2016 10:14 AM, Alex Xu wrote:
> Currently we only update the resource usage with Placement API in the
> instance claim and the available resource update periodic task. But
> there is no claim for migration with placement API yet. This works is
> tracked by https://bugs.launchpad.net/nova/+bug/1621709. In newton, we
> only fix one bit which make the resource update periodic task works
> correctly, then it will auto-heal everything. For the migration claim
> part, that isn't the goal for newton release.
>
> So the first question is: do we want to fix it in this release? If the
> answer is yes, there is a concern we need to discuss.
Yes, I believe we should fix the underlying problem in Ocata. The
underlying problem is what Sylvain brought up: live migrations do not
currently use any sort of claim operation. The periodic resource audit
is relied upon to essentially clean up the state of claimed resources
over time, and as Chris points out in review comments on
https://review.openstack.org/#/c/244489/, this leads to the scheduler
operating on stale data and can lead to an increase in retry operations.
This needs to be fixed before even attempting to address the issue you
bring up with the placement API calls from the resource tracker.
> In order to implement the drop of the migration claim, the RT needs to
> remove allocation records on a specific RP (the source or destination
> compute node). But there isn't any API that can do that. The API for
> removing allocation records is 'DELETE /allocations/{consumer_uuid}',
> but it deletes all the allocation records for the consumer. So the
> initial fix (https://review.openstack.org/#/c/369172/) adds a new API
> 'DELETE /resource_providers/{rp_uuid}/allocations/{consumer_id}'. But
> Chris Dent pointed out that this goes against the original design: all
> the allocations for a specific consumer can only be dropped together.
Yes, and this is by design. Consumption of resources -- or the freeing
thereof -- must be an atomic, transactional operation.
> There is also a suggestion from Andrew: we can update all the allocation
> records for the consumer each time. That means the RT would build the
> original allocation records and the new allocation records for the claim
> together, and put them into one API call. That API should be 'PUT
> /allocations/{consumer_uuid}'. Unfortunately that API doesn't replace
> all the allocation records for the consumer; it always appends new
> allocation records for the consumer.
I see no reason why we can't change the behaviour of the `PUT
/allocations/{consumer_uuid}` call to allow changing either the amounts
of the allocated resources (a resize operation) or the set of resource
provider UUIDs referenced in the allocations list (a move operation).
For instance, let's say we have an allocation for an instance "i1" that
is consuming 2 VCPU and 2048 MEMORY_MB on compute node "rpA", and 50
DISK_GB on a shared storage pool "rpC".
The allocations table would have the following records in it:
resource_provider resource_class consumer used
----------------- -------------- -------- ----
rpA VCPU i1 2
rpA MEMORY_MB i1 2048
rpC DISK_GB i1 50
Now, we need to migrate instance "i1" to compute node "rpB". The
instance disk uses shared storage so the only allocation records we
actually need to modify are the VCPU and MEMORY_MB records.
We would create the following REST API call from the resource tracker on
the destination node:
PUT /allocations/i1
{
    "allocations": [
        {
            "resource_provider": {
                "uuid": "rpB"
            },
            "resources": {
                "VCPU": 2,
                "MEMORY_MB": 2048
            }
        },
        {
            "resource_provider": {
                "uuid": "rpC"
            },
            "resources": {
                "DISK_GB": 50
            }
        }
    ]
}
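For illustration, a request body like the one above could be built
programmatically on the resource tracker side. This is only a sketch:
the helper name and the input dict shape are mine, not nova's.

```python
import json


def build_allocations_payload(allocations_by_rp):
    """Build a PUT /allocations/{consumer_uuid} body from a mapping of
    resource provider UUID -> {resource_class: amount}."""
    return {
        "allocations": [
            {
                "resource_provider": {"uuid": rp_uuid},
                "resources": dict(resources),
            }
            for rp_uuid, resources in sorted(allocations_by_rp.items())
        ]
    }


# The desired post-migration state: VCPU and MEMORY_MB on the
# destination node rpB, DISK_GB still on the shared storage pool rpC.
payload = build_allocations_payload({
    "rpB": {"VCPU": 2, "MEMORY_MB": 2048},
    "rpC": {"DISK_GB": 50},
})
body = json.dumps(payload, indent=2)
```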
The placement service would receive that request payload and immediately
grab any existing allocation records referencing consumer_uuid of "i1".
It would notice that records referencing "rpA" (the source compute node)
are no longer needed. It would notice that the DISK_GB allocation hasn't
changed. And finally it would notice that there are new VCPU and
MEMORY_MB records referring to a new resource provider "rpB" (the
destination compute node).
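The diffing step described above can be sketched as follows. This is a
minimal illustration of the comparison the placement service would do,
with allocations keyed by (resource_provider, resource_class); the
function name and data shapes are assumptions, not the actual
implementation.

```python
def diff_allocations(existing, requested):
    """Compare existing vs requested allocations for one consumer.

    Both arguments map (resource_provider, resource_class) -> used.
    Returns (to_delete, to_insert); entries present in both with the
    same amount (e.g. the unchanged DISK_GB) appear in neither.
    """
    to_delete = {k: v for k, v in existing.items()
                 if requested.get(k) != v}
    to_insert = {k: v for k, v in requested.items()
                 if existing.get(k) != v}
    return to_delete, to_insert


# Existing allocations for consumer "i1" (on the source node rpA).
existing = {
    ('rpA', 'VCPU'): 2,
    ('rpA', 'MEMORY_MB'): 2048,
    ('rpC', 'DISK_GB'): 50,
}
# Requested allocations from the PUT body (destination node rpB).
requested = {
    ('rpB', 'VCPU'): 2,
    ('rpB', 'MEMORY_MB'): 2048,
    ('rpC', 'DISK_GB'): 50,
}
to_delete, to_insert = diff_allocations(existing, requested)
# to_delete holds the rpA records; to_insert holds the rpB records;
# the unchanged rpC DISK_GB record is left alone.
```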
A single SQL transaction would be built that executes the following:
BEGIN;
# Grab the source and destination compute node provider generations
# to protect against concurrent writes...
$RPA_GENERATION := SELECT generation FROM resource_providers
WHERE uuid = 'rpA';
$RPB_GENERATION := SELECT generation FROM resource_providers
WHERE uuid = 'rpB';
# Delete the allocation records referring to the source for the VCPU
# and MEMORY_MB resources
DELETE FROM allocations
WHERE consumer = 'i1'
AND resource_provider = 'rpA'
AND resource_class IN ('VCPU', 'MEMORY_MB');
# Add allocation records referring to the destination for VCPU and
# MEMORY_MB
INSERT INTO allocations
(resource_provider, resource_class, consumer, used)
VALUES
('rpB', 'VCPU', 'i1', 2),
('rpB', 'MEMORY_MB', 'i1', 2048);
# Update the resource provider generations and rollback the
# transaction if any other writer modified the resource providers
# in between the initial read time and here.
UPDATE resource_providers
SET generation = $RPA_GENERATION + 1
WHERE uuid = 'rpA'
AND generation = $RPA_GENERATION;
IF ROWS_AFFECTED() == 0:
ROLLBACK
UPDATE resource_providers
SET generation = $RPB_GENERATION + 1
WHERE uuid = 'rpB'
AND generation = $RPB_GENERATION;
IF ROWS_AFFECTED() == 0:
ROLLBACK
COMMIT;
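The generation-based optimistic locking in that transaction can be
exercised with a small sketch against an in-memory SQLite database.
Table and column names mirror the pseudo-SQL above, but the function and
its arguments are illustrative, not the placement service's code.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE resource_providers "
             "(uuid TEXT PRIMARY KEY, generation INT)")
conn.execute("CREATE TABLE allocations (resource_provider TEXT, "
             "resource_class TEXT, consumer TEXT, used INT)")
conn.executemany("INSERT INTO resource_providers VALUES (?, ?)",
                 [('rpA', 5), ('rpB', 3)])
conn.executemany("INSERT INTO allocations VALUES (?, ?, ?, ?)",
                 [('rpA', 'VCPU', 'i1', 2),
                  ('rpA', 'MEMORY_MB', 'i1', 2048),
                  ('rpC', 'DISK_GB', 'i1', 50)])
conn.commit()


def move_allocations(conn, consumer, src, dst, resources):
    """Move the given {resource_class: used} allocations from src to
    dst, guarded by each provider's generation. Returns False (after a
    rollback) if a concurrent writer bumped a generation first."""
    cur = conn.cursor()
    gens = {}
    for rp in (src, dst):
        cur.execute("SELECT generation FROM resource_providers "
                    "WHERE uuid = ?", (rp,))
        gens[rp] = cur.fetchone()[0]
    placeholders = ','.join('?' * len(resources))
    cur.execute("DELETE FROM allocations WHERE consumer = ? "
                "AND resource_provider = ? "
                "AND resource_class IN (%s)" % placeholders,
                (consumer, src, *resources))
    cur.executemany("INSERT INTO allocations VALUES (?, ?, ?, ?)",
                    [(dst, rc, consumer, used)
                     for rc, used in resources.items()])
    for rp in (src, dst):
        cur.execute("UPDATE resource_providers "
                    "SET generation = generation + 1 "
                    "WHERE uuid = ? AND generation = ?", (rp, gens[rp]))
        if cur.rowcount == 0:
            conn.rollback()  # concurrent writer won; caller retries
            return False
    conn.commit()
    return True


ok = move_allocations(conn, 'i1', 'rpA', 'rpB',
                      {'VCPU': 2, 'MEMORY_MB': 2048})
```

If another writer updates either provider between the generation read
and the UPDATE, the `WHERE ... AND generation = ?` clause matches zero
rows, the whole move rolls back, and the caller can retry.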
In this way, we keep the API as is but simply handle move operations
transparently to the caller. The caller simply expresses what they wish
the allocation to look like with regards to which resource providers are
having which resources consumed from, and the placement service ensures
that these allocation records are written in an atomic fashion.
Best,
-jay
> So which direction should we go here?