[nova] track error migrations and orphans in Resource Tracker
melanie witt
melwittt at gmail.com
Wed Nov 13 19:43:56 UTC 2019
On 11/12/19 05:18, Sean Mooney wrote:
> On Tue, 2019-11-12 at 05:46 +0000, Zhong, Luyao wrote:
>> Hi Nova experts,
>>
>> "Not tracking error migrations and orphans in RT." is probably a bug. This may trigger some problems in
>> update_available_resources in RT at the moment. That is, some orphans or error migrations are using cpus/memory/disk
>> etc., but we don't take this usage into consideration. Also, instance.resources, introduced in Train, contains
>> specific resources, and we track assigned specific resources in RT based on tracked migrations and instances. So this
>> bug will also affect specific resource tracking.
>>
>> I drafted a doc to clarify this bug and possible solutions:
>> https://etherpad.openstack.org/p/track-err-migr-and-orphans-in-RT
>> Looking forward to suggestions from you. Thanks in advance.
>>
> there are patches up to allow cleaning up orphan instances
> https://review.opendev.org/#/c/627765/
> https://review.opendev.org/#/c/648912/
> if we can get those merged that would address at least some of the problem
I just wanted to mention:
I have reviewed the cleanup patches ^ multiple times and I'm having a
hard time getting past the fact that any way you slice it (AFAICT), the
cleanup code will have a window where a valid guest could be destroyed
erroneously (not an orphan). This is because the "get instance list by
host" can miss instances that are mid-migration, because of how/where we
update the instance.host field.
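
To make the window described above concrete, here is a minimal sketch in
plain Python. It is not Nova code: InstanceRecord, list_instances_by_host
and find_apparent_orphans are hypothetical names invented for this
illustration, and it assumes (for the sake of the example) that the guest
is already present on the destination hypervisor while the host field in
the database still names the source.

```python
from dataclasses import dataclass


@dataclass
class InstanceRecord:
    """Toy stand-in for a database instance record (not Nova's Instance object)."""
    uuid: str
    host: str  # host the database currently associates with the instance


def list_instances_by_host(db, host):
    """Hypothetical 'get instance list by host' query over the toy database."""
    return [rec for rec in db if rec.host == host]


def find_apparent_orphans(db, host, guests_on_hypervisor):
    """Return guest UUIDs seen on `host` that the host-filtered query misses."""
    known = {rec.uuid for rec in list_instances_by_host(db, host)}
    return sorted(set(guests_on_hypervisor) - known)


db = [
    InstanceRecord(uuid="inst-a", host="dest"),
    # Assumption: inst-b is mid-migration from "source" to "dest". Its guest
    # already exists on "dest", but the host field has not been updated yet.
    InstanceRecord(uuid="inst-b", host="source"),
]

# Guests the hypervisor on "dest" reports; "leftover-guest" has no record at all.
guests_on_dest = ["inst-a", "inst-b", "leftover-guest"]

apparent_orphans = find_apparent_orphans(db, "dest", guests_on_dest)
print(apparent_orphans)  # ['inst-b', 'leftover-guest']
```

In this toy model the valid, migrating guest inst-b is indistinguishable
from the real leftover, so a cleanup step acting on this list would
destroy both.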
Maybe this window could be acceptable (?) if we put a big fat warning on
the config option help for 'reap_unknown'. But I am unsure what recovery
looks like when a guest is erroneously destroyed for an instance that is
in the middle of migrating. In the case of resize or cold migrate, a
hard reboot would fix it AFAIK. What about a live migration? If recovery
is possible in every case, the recovery steps would also need to be
documented in the config option help for 'reap_unknown'.
The patch has lots of complexities to think about and I'm left wondering
if the pitfalls are better or worse than the current state. It would
help if others joined in the review with their thoughts about it.
-melanie