[openstack-dev] [nova] how nova should behave when placement returns consumer generation conflict

Balázs Gibizer balazs.gibizer at ericsson.com
Thu Aug 16 11:31:49 UTC 2018


Hi,

tl;dr: To properly use consumer generations (placement 1.28) in Nova we
need to decide how to handle a consumer generation conflict from Nova's
perspective:
a) Nova reads the current consumer_generation before the allocation update
   operation and uses that generation in the allocation update operation.
   If the allocation changes between the read and the update then nova
   fails the server lifecycle operation and lets the end user retry it.
b) Like a), but in case of a conflict nova blindly retries the
   read-and-update operation pair a couple of times and only fails the
   lifecycle operation if it runs out of retries. (A rough sketch of this
   retry loop follows below.)
c) Nova stores its own view of the allocation. When a consumer's allocation
   needs to be modified, nova reads the current state of the consumer from
   placement. Then nova combines the two allocations to generate the new
   expected consumer state. In case of a generation conflict nova retries
   the read-combine-update operation triplet.
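
To make option b) concrete, here is a minimal sketch of the blind retry
loop. The client helpers get_allocations() / put_allocations() and the
exception are made up for illustration, not the actual report client API:

    class AllocationUpdateConflict(Exception):
        """Raised when the update keeps conflicting after retries."""

    def update_allocations_with_retry(client, consumer_uuid,
                                      new_allocations, max_retries=3):
        # Option b) sketch: read the generation, try the update, and blindly
        # retry the pair on a consumer generation conflict (HTTP 409).
        for _ in range(max_retries):
            current = client.get_allocations(consumer_uuid)
            resp = client.put_allocations(
                consumer_uuid,
                allocations=new_allocations,
                consumer_generation=current.get('consumer_generation'))
            if resp.status_code != 409:
                # success, or a non-conflict error the caller handles
                return resp
        # out of retries: fail the lifecycle operation, the user may retry
        raise AllocationUpdateConflict(consumer_uuid)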

Which way should we go now?

What should be our long term goal?


Details:

There are plenty of affected lifecycle operations. See the patch series
starting at [1].

For example:

The current patch [1] that handles the delete server case implements
option b). It simply reads the current consumer generation from placement
and uses that to send a PUT /allocations/{instance_uuid} with
"allocations": {} in its body.
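
For reference, a hedged sketch of how such a body can be built (placement
1.28 also requires project_id and user_id alongside consumer_generation;
the helper name is made up):

    def build_empty_allocation_body(current_generation, project_id, user_id):
        # Sketch of the PUT /allocations/{instance_uuid} body that empties
        # the allocation in option b). current_generation is the value just
        # read from placement; if another writer changed the allocation in
        # between, placement answers the PUT with 409 Conflict.
        return {
            'allocations': {},
            'consumer_generation': current_generation,
            'project_id': project_id,
            'user_id': user_id,
        }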

Here implementing option c) would mean that during server delete nova
needs:
1) to compile its own view of the resource needs of the server (currently
   based on the flavor but in the future based on the attached ports'
   resource requests as well)
2) then read the current allocation of the server from placement
3) then subtract the server's resource needs from the current allocation
   and send the resulting allocation back to placement in the update

In the simple case this subtraction results in an empty allocation sent to
placement, so in this simple case c) has the same effect as b) as currently
implemented in [1].
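
A rough sketch of the combine step, under assumed shapes for the
GET /allocations response and for nova's own view of the resource needs
(both parameter layouts are assumptions for illustration only):

    def build_allocations_for_delete(current, needs):
        # Option c) sketch for delete: subtract nova's own view of the
        # server's resource needs ('needs', e.g. derived from the flavor,
        # keyed by resource provider uuid and resource class) from the
        # allocation read from placement ('current', the body of
        # GET /allocations/{instance_uuid}).
        new_allocations = {}
        for rp_uuid, alloc in current['allocations'].items():
            remaining = {}
            for rc, amount in alloc['resources'].items():
                left = amount - needs.get(rp_uuid, {}).get(rc, 0)
                if left > 0:
                    remaining[rc] = left
            if remaining:
                new_allocations[rp_uuid] = {'resources': remaining}
        # in the simple case nothing remains, i.e. the same empty allocation
        # as option b); the result is PUT back together with
        # current['consumer_generation']
        return new_allocations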

However, if somebody outside of nova modifies the allocation of this
consumer in a way that nova does not know about (i.e. a changed resource
need), then b) and c) will result in different placement states after the
server delete.

I only know of one example: the change of a neutron port's resource request
while the port is attached. (Note, this is out of scope for the first step
of the bandwidth implementation.) In this specific example option c) can
work if nova re-reads the port's resource request during delete when it
recalculates its own view of the server's resource needs. But I don't know
if every other resource (e.g. accelerators) used by a server can be / will
be handled this way.


Other examples of affected lifecycle operations:

During a server migration, moving the source host allocation from the
instance_uuid to the migration_uuid fails with a consumer generation
conflict because of the instance_uuid consumer's generation. [2]
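
For illustration, a hedged sketch of the kind of generation-aware
POST /allocations payload such a move needs in 1.28 (this is not the exact
payload nova builds):

    def build_move_payload(instance_uuid, migration_uuid, source_allocations,
                           instance_generation, project_id, user_id):
        # Sketch: move the source host allocation from instance_uuid to the
        # new migration_uuid consumer in one POST /allocations call. Both
        # consumers carry the generation read just before the call, so any
        # concurrent change to either consumer makes placement answer 409.
        return {
            str(migration_uuid): {
                'allocations': source_allocations,
                'consumer_generation': None,  # a new consumer
                'project_id': project_id,
                'user_id': user_id,
            },
            str(instance_uuid): {
                'allocations': {},  # the instance no longer holds the source
                'consumer_generation': instance_generation,
                'project_id': project_id,
                'user_id': user_id,
            },
        }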

Confirming a migration fails because the deletion of the source host
allocation fails due to a consumer generation conflict on the
migration_uuid consumer that is being emptied. [3]

During scheduling of a new server, putting the allocation to instance_uuid
fails as the scheduler assumes that it is a new consumer and therefore uses
consumer_generation: None for the allocation, but placement reports a
generation conflict. [4]

During a non-forced evacuation the scheduler tries to claim the resources
on the destination host with the instance_uuid, but that consumer already
holds the source allocation, therefore the scheduler cannot assume that the
instance_uuid is a new consumer. [4]
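
As a hedged illustration of how both scheduler cases could avoid the blind
new-consumer assumption, a sketch with made-up client helpers (not the
actual scheduler report client code):

    def claim_resources(client, consumer_uuid, allocations, project_id,
                        user_id):
        # Instead of hard-coding consumer_generation=None, first ask
        # placement whether the consumer already exists and reuse the
        # reported generation (assumed None/absent for a truly new consumer).
        # For evacuate the 'allocations' passed in would also have to
        # include the already held source allocation, otherwise this PUT
        # would drop it.
        current = client.get_allocations(consumer_uuid)
        generation = current.get('consumer_generation')
        return client.put_allocations(
            consumer_uuid,
            allocations=allocations,
            consumer_generation=generation,
            project_id=project_id,
            user_id=user_id)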


Cheers,
gibi

[1] https://review.openstack.org/#/c/591597
[2] https://review.openstack.org/#/c/591810
[3] https://review.openstack.org/#/c/591811
[4] https://review.openstack.org/#/c/583667





