Open Stack

Mon Aug 27 14:18:29 UTC 2018

Sorry for the delay in responding to this, Gibi and Eric. Comments inline.

tl;dr: go with option a)

On 08/16/2018 11:34 AM, Eric Fried wrote:
> Thanks for this, gibi.
> 
> TL;DR: a).
> 
> I didn't look, but I'm pretty sure we're not caching allocations in the
> report client. Today, nobody outside of nova (specifically the resource
> tracker via the report client) is supposed to be mucking with instance
> allocations, right? And given the global lock in the resource tracker,
> it should be pretty difficult to race e.g. a resize and a delete in any
> meaningful way.

It's not a global (i.e. multi-node) lock. It's a semaphore for just that
compute node. Migrations (mostly) involve more than one compute node, so
the compute node semaphore is useless in that regard. Hence the need to
go with option a) and bail out if the generation of any of the consumers
involved in the migration operation changes.
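
To make "bail out, no retries" concrete, here is a minimal sketch. It is
not nova or report client code: FakePlacement, write_all, read_generations
and move_allocations are hypothetical names, and the in-memory store only
imitates the idea of a per-consumer generation that every write must match
(it starts generations at 0 for simplicity).

```python
class ConsumerGenerationConflict(Exception):
    """A consumer's generation changed since we last read it."""


class FakePlacement:
    """In-memory stand-in for placement (hypothetical, for illustration)."""

    def __init__(self):
        self._consumers = {}  # consumer -> (generation, allocations)

    def get(self, consumer):
        generation, allocations = self._consumers.get(consumer, (0, {}))
        return generation, dict(allocations)

    def write_all(self, updates):
        """Apply updates {consumer: (expected_generation, allocations)}.

        Every generation is checked before anything is written, so a
        conflict on any one consumer leaves all of them untouched.
        """
        for consumer, (expected, _) in updates.items():
            current, _ = self.get(consumer)
            if current != expected:
                raise ConsumerGenerationConflict(consumer)
        for consumer, (expected, allocations) in updates.items():
            self._consumers[consumer] = (expected + 1, dict(allocations))


def read_generations(placement, consumers):
    """Record the generation of every consumer the operation involves."""
    return {c: placement.get(c)[0] for c in consumers}


def move_allocations(placement, seen, instance, migration, dest_allocations):
    """Hold the source allocations under the migration consumer and point
    the instance at the destination.

    No GET+merge+retry: if either consumer moved on since `seen` was
    taken, ConsumerGenerationConflict propagates and the operation fails.
    """
    _, source_allocations = placement.get(instance)
    placement.write_all({
        migration: (seen[migration], source_allocations),
        instance: (seen[instance], dest_allocations),
    })
```

The caller takes `seen = read_generations(placement, [instance, migration])`
at the start of the operation and treats a ConsumerGenerationConflict from
move_allocations as a plain failure of the migration.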

> So short term, IMO it is reasonable to treat any generation conflict
> as an error. No retries. Possible wrinkle on delete, where it should
> be a failure unless forced.

Agreed for all migration and deletion operations.

> Long term, I also can't come up with any scenario where it would be
> appropriate to do a narrowly-focused GET+merge/replace+retry. But
> implementing the above short-term plan shouldn't prevent us from adding
> retries for individual scenarios later if we do uncover places where it
> makes sense.

Neither can I. Safety first, IMHO.

Best,
-jay

[openstack-dev] [nova] how nova should behave when placement returns consumer generation conflict
