[openstack-dev] [nova] how nova should behave when placement returns consumer generation conflict

Eric Fried openstack at fried.cc
Wed Aug 22 13:52:35 UTC 2018


b) sounds the most sane in both cases. I don't like the idea of "your
move operation failed and you have no recourse but to delete your
instance". And automatic retry sounds lovely, but potentially hairy to
implement (and we would need to account for the retries-failed scenario
anyway) so at least initially we should leave that out.
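
For illustration only, here is a rough sketch of what b) could look like.
Every name in it (AllocationConflict, Instance, finish_move_step) is
hypothetical and stands in for whatever the real code path uses; it is not
nova's actual implementation, just the shape of "catch the conflict, put
the instance back to VERIFY_RESIZE, let the user retry":

```python
# Hypothetical sketch of option b); none of these names are nova's real API.
VERIFY_RESIZE = "VERIFY_RESIZE"
ACTIVE = "ACTIVE"


class AllocationConflict(Exception):
    """Stand-in for a consumer generation conflict reported by placement."""


class Instance:
    """Minimal stand-in for an instance waiting on confirm or revert."""

    def __init__(self, status=VERIFY_RESIZE):
        self.status = status


def finish_move_step(instance, move_allocations, success_status=ACTIVE):
    """Run a confirm or revert step without stranding the instance.

    move_allocations is any callable that performs the allocation
    manipulation and raises AllocationConflict on a generation conflict.
    Returns True on success, False if the user needs to retry.
    """
    try:
        move_allocations()
    except AllocationConflict:
        # Option b): no ERROR state and no automatic retry. The instance
        # stays confirmable/revertable so the user can retry via the API.
        instance.status = VERIFY_RESIZE
        return False
    instance.status = success_status
    return True
```

With something like this, a second confirm or revert request from the user
simply goes through the same step again; no retry loop is needed on our
side.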

On 08/22/2018 07:55 AM, Balázs Gibizer wrote:
> 
> 
> On Fri, Aug 17, 2018 at 5:40 PM, Eric Fried <openstack at fried.cc> wrote:
>> gibi-
>>
>>>>  - On migration, when we transfer the allocations in either direction, a
>>>>  conflict means someone managed to resize (or otherwise change
>>>>  allocations?) since the last time we pulled data. Given the global lock
>>>>  in the report client, this should have been tough to do. If it does
>>>>  happen, I would think any retry would need to be done all the way back
>>>>  at the claim, which I imagine is higher up than we should go. So again,
>>>>  I think we should fail the migration and make the user retry.
>>>
>>>  Do we want to fail the whole migration or just the migration step (e.g.
>>>  confirm, revert)?
>>>  The latter means that a failure during confirm or revert would put the
>>>  instance back to VERIFY_RESIZE. The former would mean that in case of a
>>>  conflict at confirm we try an automatic revert, but for a conflict at
>>>  revert we can only put the instance into ERROR state.
>>
>> This again should be "impossible" to come across. What would the
>> behavior be if we hit, say, ValueError in this spot?
> 
> I might not totally follow you. I see two options to choose from for the
> revert case:
> 
> a) An allocation manipulation error during revert of a migration puts
> the instance into ERROR. -> The end user cannot retry the revert; the
> instance needs to be deleted.
> 
> b) An allocation manipulation error during revert of a migration puts
> the instance back into VERIFY_RESIZE state. -> The end user can retry
> the revert via the API.
> 
> I see three options to choose from for the confirm case:
> 
> a) An allocation manipulation error during confirm of a migration puts
> the instance into ERROR. -> The end user cannot retry the confirm; the
> instance needs to be deleted.
> 
> b) An allocation manipulation error during confirm of a migration puts
> the instance back into VERIFY_RESIZE state. -> The end user can retry
> the confirm via the API.
> 
> c) An allocation manipulation error during confirm of a migration makes
> nova automatically try to revert the migration. (For a failure during
> this revert, the same options are available as for the generic revert
> case, see above.)
> 
> We also need to consider live migration. It is similar in the sense that
> it also uses move_allocations, but it is different in that the end user
> doesn't explicitly confirm or revert a live migration.
> 
> I'm looking for opinions about which option we should take in each case.
> 
> gibi
> 
>>
>> -efried
>>
>> __________________________________________________________________________
>>
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


