[nova] track error migrations and orphans in Resource Tracker
Luyao Zhong
luyao.zhong at intel.com
Thu Nov 14 02:33:18 UTC 2019
On 2019/11/14 3:43 AM, melanie witt wrote:
> On 11/12/19 05:18, Sean Mooney wrote:
>> On Tue, 2019-11-12 at 05:46 +0000, Zhong, Luyao wrote:
>>> Hi Nova experts,
>>>
>>> "Not tracking error migrations and orphans in RT" is probably a bug.
>>> It may cause problems in update_available_resources in RT at the
>>> moment. That is, some orphans or error migrations are using
>>> CPUs/memory/disk etc., but we don't take this usage into
>>> consideration. Also, instance.resources was introduced in Train to
>>> hold specific resources, and we track assigned specific resources in
>>> RT based on tracked migrations and instances. So this bug will also
>>> affect specific resource tracking.
>>>
>>> I drafted a doc to clarify this bug and possible solutions:
>>> https://etherpad.openstack.org/p/track-err-migr-and-orphans-in-RT
>>> Looking forward to suggestions from you. Thanks in advance.
>>>
>> there are patches up to allow cleaning up orphan instances
>> https://review.opendev.org/#/c/627765/
>> https://review.opendev.org/#/c/648912/
>> if we can get those merged that would address at least some of the
>> problem
>
> I just wanted to mention:
>
> I have reviewed the cleanup patches ^ multiple times and I'm having a
> hard time getting past the fact that any way you slice it (AFAICT), the
> cleanup code will have a window where a valid guest could be destroyed
> erroneously (not an orphan). This is because the "get instance list by
> host" can miss instances that are mid-migration, because of how/where we
> update the instance.host field.
>
> Maybe this ^ could be acceptable (?) if we put a big fat warning on the
> config option help for 'reap_unknown'. But I was unsure of the answers
> about what recovery looks like in case a guest is erroneously destroyed
> for an instance that is in the middle of migrating. In the case of
> resize or cold migrate, a hard reboot would fix it AFAIK. What about for
> a live migration? If recovery is possible in every case, those would
> also need to be documented in the config option help for 'reap_unknown'.
>
> The patch has lots of complexities to think about and I'm left wondering
> if the pitfalls are better or worse than the current state. It would
> help if others joined in the review with their thoughts about it.
>
> -melanie
Hi Sean Mooney and melanie, thanks for mentioning this.
This ^ is about cleaning up orphans. For incomplete migrations, you would
prefer not to destroy them, right? I'm not sure about that either. But I
gave a possible solution on the etherpad (set instance.host, apply/revert
the migration context, and then invoke cleanup_running_deleted_instances
to clean up the instance).
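
To make the window melanie describes concrete, here is a minimal sketch. It is a toy model, not Nova code: Instance, Migration, TERMINAL, naive_orphans and guarded_orphans are all hypothetical names, and the status strings are illustrative only. It shows how a "list instances by host" check can flag a mid-migration guest as an orphan, and how also consulting unfinished migration records would avoid that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    uuid: str
    host: str  # value of instance.host as recorded in the database


@dataclass(frozen=True)
class Migration:
    instance_uuid: str
    source: str
    dest: str
    status: str


# Hypothetical set of "finished" states, chosen for this toy model only.
TERMINAL = {"confirmed", "reverted", "completed"}


def naive_orphans(host, running_guests, instances):
    """Guests running on the hypervisor that the DB does not place on this host."""
    known = {i.uuid for i in instances if i.host == host}
    return set(running_guests) - known


def guarded_orphans(host, running_guests, instances, migrations):
    """Same check, but skip guests with an unfinished migration touching this host."""
    in_flight = {
        m.instance_uuid
        for m in migrations
        if host in (m.source, m.dest) and m.status not in TERMINAL
    }
    return naive_orphans(host, running_guests, instances) - in_flight


# "b" is mid-migration: its guest already runs on "dest",
# but instance.host still says "src".
instances = [Instance("a", "dest"), Instance("b", "src")]
migrations = [Migration("b", "src", "dest", "migrating")]
running_on_dest = ["a", "b", "c"]

naive = naive_orphans("dest", running_on_dest, instances)
guarded = guarded_orphans("dest", running_on_dest, instances, migrations)
```

In this model the naive check reports both "b" and "c", while the guarded check reports only "c". Because "error" is not in TERMINAL here, a guest with an error migration would also be left alone, which is one possible reading of "not destroying them" and not a settled decision.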
And until the cleanup is done, we need to track these instances/migrations
in RT. We need more people to join the discussion. You are welcome to put
your suggestions on the etherpad:
https://etherpad.openstack.org/p/track-err-migr-and-orphans-in-RT
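
As a rough illustration of why tracking matters before cleanup, here is a minimal sketch of the accounting gap. Again this is a toy model and not the Resource Tracker itself: Usage and sum_usage are hypothetical names and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    vcpus: int
    memory_mb: int
    disk_gb: int


def sum_usage(consumers):
    """Add up the resources held by a list of consumers."""
    return Usage(
        vcpus=sum(c.vcpus for c in consumers),
        memory_mb=sum(c.memory_mb for c in consumers),
        disk_gb=sum(c.disk_gb for c in consumers),
    )


# Made-up consumers on one compute host.
tracked = [Usage(4, 8192, 40)]           # instances/migrations counted today
error_migrations = [Usage(2, 4096, 20)]  # still holding resources, not counted
orphans = [Usage(1, 2048, 10)]           # still holding resources, not counted

reported = sum_usage(tracked)
actual = sum_usage(tracked + error_migrations + orphans)
```

Here the host would report 4 vCPUs in use while 7 are really consumed, so the untracked usage looks like free capacity.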
Thanks in advance.
BR,
Luyao