[openstack-dev] [nova] how nova should behave when placement returns consumer generation conflict
Balázs Gibizer
balazs.gibizer at ericsson.com
Thu Aug 16 11:31:49 UTC 2018
Hi,
tl;dr: To properly use consumer generation (placement 1.28) in Nova we need to decide how Nova should handle a consumer generation conflict:
a) Nova reads the current consumer_generation before the allocation update operation and uses that generation in the allocation update operation. If the allocation is changed between the read and the update, then nova fails the server lifecycle operation and lets the end user retry it.
b) Like a), but in case of a conflict nova blindly retries the read-and-update operation pair a couple of times and only fails the lifecycle operation if it runs out of retries.
c) Nova stores its own view of the allocation. When a consumer's allocation needs to be modified, nova reads the current state of the consumer from placement. Then nova combines the two allocations to generate the new expected consumer state. In case of a generation conflict nova retries the read-combine-update operation triplet.
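Options a) and b) can be sketched against a toy, in-memory stand-in for placement. Everything in this sketch is hypothetical: `FakePlacement`, `RacyPlacement`, `ConflictError`, `update_fail_fast` and `update_with_retries` are illustrative names. The generation rules are a simplified model, not the real placement API or nova's report client: a write must carry the generation last read (`None` meaning "new consumer"), and every successful write bumps the generation.

```python
import copy


class ConflictError(Exception):
    """Stands in for placement's HTTP 409 consumer generation conflict."""


class FakePlacement:
    """Toy model (assumption): one allocation dict and one generation per consumer."""

    def __init__(self):
        self._consumers = {}  # consumer_uuid -> (generation, allocations)

    def get_allocations(self, consumer):
        # Unknown consumers report generation None, i.e. "new consumer".
        gen, allocs = self._consumers.get(consumer, (None, {}))
        return gen, copy.deepcopy(allocs)

    def put_allocations(self, consumer, allocations, consumer_generation):
        current_gen, _ = self._consumers.get(consumer, (None, {}))
        if consumer_generation != current_gen:
            raise ConflictError(consumer)
        new_gen = 0 if current_gen is None else current_gen + 1
        self._consumers[consumer] = (new_gen, copy.deepcopy(allocations))


class RacyPlacement(FakePlacement):
    """Simulates a concurrent writer that rewrites the consumer right after
    each of the first `races` reads, so the generation just read is stale."""

    def __init__(self, races):
        super().__init__()
        self.races = races

    def get_allocations(self, consumer):
        gen, allocs = super().get_allocations(consumer)
        if self.races > 0:
            self.races -= 1
            # The other writer succeeds and bumps the generation.
            self.put_allocations(consumer, allocs, gen)
        return gen, allocs


def update_fail_fast(placement, consumer, new_allocations):
    """Option a): one read-and-update; a conflict propagates to the caller."""
    gen, _ = placement.get_allocations(consumer)
    placement.put_allocations(consumer, new_allocations, gen)


def update_with_retries(placement, consumer, new_allocations, max_attempts=3):
    """Option b): blindly repeat the read-and-update pair on conflict."""
    for attempt in range(max_attempts):
        try:
            update_fail_fast(placement, consumer, new_allocations)
            return
        except ConflictError:
            if attempt == max_attempts - 1:
                raise  # out of retries: fail the lifecycle operation
```

The only difference between a) and b) is the loop. With b), a persistent conflict still reaches the user once the retries are exhausted.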
Which way should we go now?
What should be our long-term goal?
Details:
There are plenty of affected lifecycle operations. See the patch series starting at [1].
For example:
The current patch [1] that handles the delete server case implements option b). It simply reads the current consumer generation from placement and uses it to send a PUT /allocations/{instance_uuid} with "allocations": {} in its body.
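Here is a sketch of the request that such a delete sends, assuming placement microversion 1.28. `build_delete_allocations_request` is a hypothetical helper that only builds the request and performs no HTTP call. The `project_id` and `user_id` fields reflect an assumption about the 1.28 request schema rather than anything stated above.

```python
import json


def build_delete_allocations_request(instance_uuid, consumer_generation,
                                     project_id, user_id):
    """Build the PUT that empties an instance's allocations (option b).

    consumer_generation is the value just read from placement; if another
    writer changed the consumer in between, placement answers with a conflict.
    """
    path = "/allocations/%s" % instance_uuid
    headers = {
        "Content-Type": "application/json",
        # Consumer generations need placement microversion 1.28 or later.
        "OpenStack-API-Version": "placement 1.28",
    }
    body = {
        "allocations": {},
        "consumer_generation": consumer_generation,
        # Assumption: the PUT body also names the consumer's project and user.
        "project_id": project_id,
        "user_id": user_id,
    }
    return "PUT", path, headers, json.dumps(body)
```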
Implementing option c) here would mean that during server delete nova needs:
1) to compile its own view of the resource needs of the server (currently based on the flavor, but in the future also based on the resource requests of the attached ports)
2) then to read the current allocation of the server from placement
3) then to subtract the server's resource needs from the current allocation and send the resulting allocation back to placement in the update
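Step 3 can be sketched as a plain dict operation. `subtract_allocations` is a hypothetical helper. The allocation shape mirrors placement's `{rp_uuid: {"resources": {...}}}` format, and the retry-on-conflict loop around the read and the write is left out.

```python
def subtract_allocations(current, own_view):
    """Remove nova's own view of the server's needs from the current
    allocation read from placement (step 3 of option c).

    Both arguments use the {rp_uuid: {"resources": {rc: amount}}} shape.
    Amounts that drop to zero or below are omitted, and so are providers
    left with no resources.
    """
    result = {}
    for rp_uuid, alloc in current.items():
        needed = own_view.get(rp_uuid, {}).get("resources", {})
        remaining = {}
        for rc, amount in alloc["resources"].items():
            left = amount - needed.get(rc, 0)
            if left > 0:
                remaining[rc] = left
        if remaining:
            result[rp_uuid] = {"resources": remaining}
    return result
```

When placement holds exactly what nova expects, the result is `{}`. If something outside nova had added, for example, `NET_BW_EGR_KILOBIT_PER_SEC` on another provider, that part survives the subtraction, whereas option b) would send an empty allocation regardless.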
In the simple case this subtraction would result in an empty allocation being sent to placement. Also, in this simple case c) has the same effect as b), which is currently implemented in [1].
However, if somebody outside of nova modifies the allocation of this consumer in a way that nova does not know about, i.e. nova is not aware of the changed resource need, then b) and c) will result in a different placement state after server delete.
I only know of one example: the change of a neutron port's resource request while the port is attached. (Note: this is out of scope in the first step of the bandwidth implementation.) In this specific example option c) can work if nova re-reads the port's resource request during delete when it recalculates its own view of the server resource needs. But I don't know if every other resource (e.g. accelerators) used by a server can be, or will be, handled this way.
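If the port's request is re-read at delete time, step 1 of option c) becomes a merge of the flavor with the current port requests. In this sketch `server_resource_view` is hypothetical. It also assumes that nova knows which resource provider each port's resources were allocated from, which the text above does not say.

```python
from collections import Counter


def server_resource_view(flavor_resources, compute_rp, port_requests):
    """Build nova's own view of a server's resource needs (option c, step 1).

    flavor_resources: {rc: amount} charged to the compute node provider.
    port_requests: iterable of (rp_uuid, {rc: amount}) pairs, one per
    attached port, as re-read from neutron at delete time (assumption:
    the provider of each port's allocation is known).
    """
    view = {}

    def add(rp_uuid, resources):
        # Counter.update adds amounts for resource classes already present.
        view.setdefault(rp_uuid, Counter()).update(resources)

    add(compute_rp, flavor_resources)
    for rp_uuid, resources in port_requests:
        add(rp_uuid, resources)
    return {rp: {"resources": dict(res)} for rp, res in view.items()}
```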
Other examples of affected lifecycle operations:
During a server migration, moving the source host allocation from the instance_uuid to the migration_uuid fails with a consumer generation conflict because of the instance_uuid consumer generation. [2]
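The move touches two consumers, so the generations of both consumers travel in one request. This sketch assumes placement's multi-consumer POST /allocations body, with a per-consumer `consumer_generation` at microversion 1.28. `build_move_allocations_body` is a hypothetical helper, not nova's report client code. If the instance consumer changed after nova read its generation, the whole request is rejected, which is the failure described above.

```python
def build_move_allocations_body(instance_uuid, instance_generation,
                                source_allocations, migration_uuid,
                                migration_generation, project_id, user_id):
    """Sketch of one POST /allocations body that hands the source host
    allocation from the instance consumer to the migration consumer.

    The generations are the values nova last read from placement; None
    means placement does not know that consumer yet.
    """
    common = {"project_id": project_id, "user_id": user_id}
    return {
        # The migration consumer takes over the source host allocation ...
        migration_uuid: dict(common, allocations=source_allocations,
                             consumer_generation=migration_generation),
        # ... and the instance consumer is emptied. A stale
        # instance_generation here makes placement reject the request.
        instance_uuid: dict(common, allocations={},
                            consumer_generation=instance_generation),
    }
```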
Confirming a migration fails because the deletion of the source host allocation fails due to a consumer generation conflict on the migration_uuid consumer that is being emptied. [3]
During the scheduling of a new server, putting the allocation to the instance_uuid fails: the scheduler assumes that it is a new consumer and therefore uses consumer_generation: None for the allocation, but placement reports a generation conflict. [4]
During a non-forced evacuation the scheduler tries to claim the resources on the destination host with the instance_uuid, but that consumer already holds the source allocation. Therefore the scheduler cannot assume that the instance_uuid is a new consumer. [4]
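Both scheduler cases come down to not assuming a new consumer. The scheduler can read the instance consumer first and send `None` only if placement does not know it. This sketch uses a hypothetical `build_claim` helper. For the evacuation case, keeping the source allocation and adding the destination on top is an assumption about the desired end state.

```python
def build_claim(existing, dest_allocations):
    """Decide what the scheduler should write for the instance consumer.

    existing is (generation, allocations) as read from placement for the
    instance_uuid; generation is None if placement has never seen it.
    Returns (allocations_to_write, consumer_generation_to_send).
    """
    generation, current = existing
    if generation is None:
        # Genuinely new consumer: claim only the destination.
        return dest_allocations, None
    # Existing consumer (e.g. non-forced evacuation): copy what it already
    # holds, add the destination, and guard the write with its generation.
    merged = {rp: {"resources": dict(a["resources"])}
              for rp, a in current.items()}
    for rp, alloc in dest_allocations.items():
        res = merged.setdefault(rp, {"resources": {}})["resources"]
        for rc, amount in alloc["resources"].items():
            res[rc] = res.get(rc, 0) + amount
    return merged, generation
```

The read-then-write still races with other writers, so the choice between a), b) and c) applies to this write as well.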
Cheers,
gibi
[1] https://review.openstack.org/#/c/591597
[2] https://review.openstack.org/#/c/591810
[3] https://review.openstack.org/#/c/591811
[4] https://review.openstack.org/#/c/583667