[openstack-dev] [nova] how nova should behave when placement returns consumer generation conflict
Balázs Gibizer
balazs.gibizer at ericsson.com
Thu Aug 16 11:43:23 UTC 2018
Reformatted for readability, sorry:
Hi,
tl;dr: To properly use consumer generation (placement microversion 1.28)
in Nova, we need to decide how Nova should handle a consumer generation
conflict:
a) Nova reads the current consumer_generation before the allocation
update operation and uses that generation in the allocation update
operation. If the allocation is changed between the read and the
update, then nova fails the server lifecycle operation and lets the
end user retry it.
b) Like a), but in case of a conflict nova blindly retries the
read-and-update operation pair a couple of times and only fails
the lifecycle operation if it runs out of retries.
c) Nova stores its own view of the allocation. When a consumer's
allocation needs to be modified, nova reads the current state
of the consumer from placement. Then nova combines the two
allocations to generate the new expected consumer state. In case
of a generation conflict nova retries the read-combine-update
operation triplet.
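Options a) and b) differ only in how many times the read-and-update pair is retried. A minimal sketch, assuming a hypothetical in-memory FakePlacement that models only the generation check (a stale generation raises ConsumerGenerationConflict, standing in for placement's HTTP 409). read_and_update() and RacyPlacement are illustrative names, not nova code. For c), compute_new would combine nova's own view with the allocation that was read instead of ignoring it.

```python
from typing import Callable, Dict, Optional

Allocations = Dict[str, dict]


class ConsumerGenerationConflict(Exception):
    """Stands in for placement's 409 response on a stale consumer generation."""


class FakePlacement:
    """Hypothetical in-memory model of per-consumer allocations.

    It only mimics the consumer generation check; it is not the placement API.
    """

    def __init__(self) -> None:
        self.consumers: Dict[str, dict] = {}

    def get_allocations(self, consumer: str) -> dict:
        # Unknown consumers report no allocations and a None generation.
        entry = self.consumers.get(consumer)
        if entry is None:
            return {"allocations": {}, "consumer_generation": None}
        return {"allocations": entry["allocations"],
                "consumer_generation": entry["generation"]}

    def put_allocations(self, consumer: str, allocations: Allocations,
                        generation: Optional[int]) -> None:
        current = self.get_allocations(consumer)["consumer_generation"]
        if generation != current:
            raise ConsumerGenerationConflict(consumer)
        new_generation = 0 if current is None else current + 1
        self.consumers[consumer] = {"allocations": allocations,
                                    "generation": new_generation}


class RacyPlacement(FakePlacement):
    """Lets a concurrent writer bump the generation before our next update."""

    def __init__(self) -> None:
        super().__init__()
        self.races = 0

    def put_allocations(self, consumer, allocations, generation):
        if self.races > 0:
            self.races -= 1
            # Another writer rewrites the same allocation, bumping the generation.
            other = self.get_allocations(consumer)
            super().put_allocations(consumer, other["allocations"],
                                    other["consumer_generation"])
        super().put_allocations(consumer, allocations, generation)


def read_and_update(placement: FakePlacement, consumer: str,
                    compute_new: Callable[[Allocations], Allocations],
                    max_retries: int) -> int:
    """Option a) when max_retries == 0, option b) when max_retries > 0.

    Returns the number of retries used; re-raises the conflict when the
    retries are exhausted so the lifecycle operation fails.
    """
    retries = 0
    while True:
        current = placement.get_allocations(consumer)
        desired = compute_new(current["allocations"])
        try:
            placement.put_allocations(consumer, desired,
                                      current["consumer_generation"])
            return retries
        except ConsumerGenerationConflict:
            if retries >= max_retries:
                raise
            retries += 1


placement = RacyPlacement()
placement.put_allocations("instance", {"rp": {"resources": {"VCPU": 2}}}, None)
placement.races = 1
# Option b) on delete: the first attempt conflicts, the retry empties the consumer.
retries = read_and_update(placement, "instance", lambda current: {}, max_retries=3)
```

With max_retries=0 the first conflict propagates and the lifecycle operation fails, which is option a).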
Which way should we go now?
What should be our long-term goal?
Details:
There are plenty of affected lifecycle operations. See the patch series
starting at [1].
For example:
The current patch [1], which handles the delete server case, implements
option b). It simply reads the current consumer generation from
placement and uses it to send a PUT /allocations/{instance_uuid} with
"allocations": {} in its body.
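For reference, the request that option b) sends on delete could be built as below. This is a minimal sketch, assuming that project_id and user_id must accompany the PUT body (as they have since placement microversion 1.8) and that consumer_generation is the value nova has just read from placement. build_delete_allocations_request() is a hypothetical helper, not the code in [1].

```python
import json
from typing import Optional, Tuple


def build_delete_allocations_request(
        instance_uuid: str, consumer_generation: Optional[int],
        project_id: str, user_id: str) -> Tuple[str, str, dict, str]:
    """Return (method, path, headers, body) for emptying a consumer's allocations.

    consumer_generation is the value just read from placement (option b)).
    """
    headers = {
        # Consumer generations are only part of the API from microversion 1.28.
        "OpenStack-API-Version": "placement 1.28",
        "Content-Type": "application/json",
    }
    body = {
        "allocations": {},  # empty: remove everything held by this consumer
        "consumer_generation": consumer_generation,
        # Assumed required in the PUT body (placement has required them since 1.8).
        "project_id": project_id,
        "user_id": user_id,
    }
    return "PUT", "/allocations/" + instance_uuid, headers, json.dumps(body)


method, path, headers, body = build_delete_allocations_request(
    "instance-uuid", 4, "project-id", "user-id")
```

If another writer changed the consumer after the read, placement rejects this request with a generation conflict and b) repeats the read before sending it again.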
Implementing option c) here would mean that during server delete nova
needs to:
1) compile its own view of the resource needs of the server
(currently based on the flavor, but in the future also based on the
attached ports' resource requests)
2) read the current allocation of the server from placement
3) subtract the server's resource needs from the current allocation
and send the resulting allocation back to placement in the update
In the simple case this subtraction results in an empty allocation
being sent to placement, so c) has the same effect as b) as currently
implemented in [1].
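However, if somebody outside of nova modifies the allocation of this
consumer in a way that nova does not know about, then b) and c) will
result in different placement states after server delete: b) removes
the whole allocation, including the part nova does not know about,
while c) only removes what nova thinks the server uses and leaves the
rest in place.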
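A minimal sketch of step 3) of option c), assuming placement's dict allocation format of {resource provider uuid: {"resources": {resource class: amount}}}. The provider names and the CUSTOM_ACCEL resource class are hypothetical. Option b) corresponds to always sending {}, whatever placement currently holds.

```python
from typing import Dict

# provider uuid -> {"resources": {resource class -> amount}}
Allocations = Dict[str, Dict[str, Dict[str, int]]]


def subtract_allocations(current: Allocations, nova_view: Allocations) -> Allocations:
    """Option c): remove nova's own view of the server's resources from the
    allocation read from placement and keep whatever is left."""
    result: Allocations = {}
    for rp_uuid, allocation in current.items():
        mine = nova_view.get(rp_uuid, {}).get("resources", {})
        remaining = {}
        for resource_class, amount in allocation["resources"].items():
            left = amount - mine.get(resource_class, 0)
            if left > 0:  # drop classes nova fully accounts for
                remaining[resource_class] = left
        if remaining:  # drop providers with nothing left
            result[rp_uuid] = {"resources": remaining}
    return result


flavor_view = {"compute-rp": {"resources": {"VCPU": 2, "MEMORY_MB": 2048}}}

# Simple case: placement holds exactly what nova expects, so c) sends {} like b).
assert subtract_allocations(flavor_view, flavor_view) == {}

# Someone outside nova added an allocation nova does not know about:
changed = {**flavor_view, "accel-rp": {"resources": {"CUSTOM_ACCEL": 1}}}
# b) would send {} and drop it; c) keeps it.
assert subtract_allocations(changed, flavor_view) == {
    "accel-rp": {"resources": {"CUSTOM_ACCEL": 1}}}
```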
I only know of one example: the change of a neutron port's resource
request while the port is attached. (Note that this is out of scope in
the first step of the bandwidth implementation.) In this specific
example option c) can work if nova re-reads the port's resource request
during delete when it recalculates its own view of the server's resource
needs. But I don't know whether every other resource (e.g. accelerators)
used by a server can be, or will be, handled this way.
Other examples of affected lifecycle operations:
During a server migration, moving the source host allocation from the
instance_uuid to the migration_uuid fails with a consumer generation
conflict because of the instance_uuid consumer generation. [2]
Confirming a migration fails because the deletion of the source host
allocation fails due to a consumer generation conflict on the
migration_uuid consumer that is being emptied. [3]
During scheduling of a new server, putting the allocation to the
instance_uuid fails because the scheduler assumes that it is a new
consumer and therefore uses consumer_generation: None for the
allocation, but placement reports a generation conflict. [4]
During a non-forced evacuation, the scheduler tries to claim the
resources on the destination host with the instance_uuid, but that
consumer already holds the source allocation, therefore the scheduler
cannot assume that the instance_uuid is a new consumer. [4]
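The conflicts in the examples above follow from one rule. A minimal model of it, assuming the 1.28 semantics where a consumer_generation of None means "this must be a new consumer" and any other value must equal the consumer's current generation. move_to_migration_body() is a hypothetical helper showing a POST /allocations body that moves the source allocation to the migration_uuid consumer; it is not nova's code from [2], and the consumer names and generations are made up.

```python
from typing import Dict, Optional


def generation_accepted(generations: Dict[str, int], consumer: str,
                        sent: Optional[int]) -> bool:
    """Model of the consumer generation check (not placement's code).

    generations maps each existing consumer to its current generation.
    """
    current = generations.get(consumer)
    if current is None:
        # Unknown consumer: only a "new consumer" write (None) is accepted.
        return sent is None
    # Existing consumer: the caller must send back the current generation.
    return sent == current


def move_to_migration_body(instance_uuid: str, migration_uuid: str,
                           source_allocations: dict,
                           instance_generation: int,
                           project_id: str, user_id: str) -> dict:
    """Hypothetical POST /allocations body moving the source host allocation
    from the instance consumer to a brand new migration consumer."""
    owner = {"project_id": project_id, "user_id": user_id}
    return {
        instance_uuid: {"allocations": {},
                        "consumer_generation": instance_generation, **owner},
        migration_uuid: {"allocations": source_allocations,
                         "consumer_generation": None, **owner},
    }


# The instance already holds an allocation (e.g. on the evacuation source).
generations = {"instance": 3}
# Assuming a new consumer, as the scheduler does in [4], is rejected:
assert not generation_accepted(generations, "instance", None)
# Reading the generation first and sending it back is accepted:
assert generation_accepted(generations, "instance", 3)
# The move body passes only if the instance generation is current:
body = move_to_migration_body("instance", "migration",
                              {"source-rp": {"resources": {"VCPU": 2}}},
                              3, "project", "user")
assert all(generation_accepted(generations, consumer, entry["consumer_generation"])
           for consumer, entry in body.items())
```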
[1] https://review.openstack.org/#/c/591597
[2] https://review.openstack.org/#/c/591810
[3] https://review.openstack.org/#/c/591811
[4] https://review.openstack.org/#/c/583667