On 2019/11/14 3:43 AM, melanie witt wrote:
On 11/12/19 05:18, Sean Mooney wrote:
On Tue, 2019-11-12 at 05:46 +0000, Zhong, Luyao wrote:
Hi Nova experts,
"Not tracking error migrations and orphans in RT" is probably a bug. It can currently cause problems in update_available_resources in the RT: some orphans or error migrations are using CPUs, memory, disk, etc., but we don't take that usage into account. In addition, instance.resources, introduced in Train, holds specific resources, and we track assigned specific resources in the RT based on tracked migrations and instances, so this bug also affects specific resource tracking.
I drafted a doc to clarify this bug and possible solutions: https://etherpad.openstack.org/p/track-err-migr-and-orphans-in-RT Looking forward to your suggestions. Thanks in advance.
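The under-counting described in this message can be illustrated with a toy model. This is only a sketch in plain Python, not Nova code: the Usage class, the status strings, and the TRACKED set are all hypothetical names chosen for illustration, and the assumption is simply that usage held by error migrations and orphans is skipped when the tracker sums up consumption.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    owner: str   # instance UUID (hypothetical)
    vcpus: int
    status: str  # "active", "migrating", "error", or "orphan" (hypothetical labels)


# Assumption for this toy model: only these states are counted by the tracker.
TRACKED = {"active", "migrating"}


def tracked_vcpus(usages):
    """vCPUs the tracker believes are in use."""
    return sum(u.vcpus for u in usages if u.status in TRACKED)


def actual_vcpus(usages):
    """vCPUs really consumed on the host, whatever the record's state."""
    return sum(u.vcpus for u in usages)


usages = [
    Usage("inst-1", 4, "active"),
    Usage("inst-2", 2, "migrating"),
    Usage("inst-3", 2, "error"),   # error migration still holding resources
    Usage("inst-4", 1, "orphan"),  # guest on the hypervisor with no matching record
]

# Resources in use on the host that the tracker would report as free.
untracked = actual_vcpus(usages) - tracked_vcpus(usages)
```

In this model the tracker reports fewer vCPUs in use than are really consumed, so the host looks like it has more free capacity than it does.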
There are patches up to allow cleaning up orphan instances: https://review.opendev.org/#/c/627765/ https://review.opendev.org/#/c/648912/ If we can get those merged, that would address at least some of the problem.
I just wanted to mention:
I have reviewed the cleanup patches ^ multiple times and I'm having a hard time getting past the fact that any way you slice it (AFAICT), the cleanup code will have a window where a valid guest could be destroyed erroneously (not an orphan). This is because the "get instance list by host" can miss instances that are mid-migration, because of how/where we update the instance.host field.
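To make the window concrete, here is a toy sketch in plain Python. It is not the code from the patches under review: the Instance class and find_unknown_guests are hypothetical, and the assumption is that "unknown" guests are found by comparing what runs on the hypervisor against the instances whose recorded host is this host.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    uuid: str
    host: str  # host currently recorded for the instance


def find_unknown_guests(this_host, guests_on_hypervisor, db_instances):
    """Return local guest UUIDs that no instance record maps to this host."""
    known = {i.uuid for i in db_instances if i.host == this_host}
    return sorted(set(guests_on_hypervisor) - known)


# Instance "b" is migrating from "src" to "dest": the guest already exists
# on dest, but its recorded host has not been updated yet.
db = [Instance("a", "dest"), Instance("b", "src")]
guests_on_dest = ["a", "b"]

unknown = find_unknown_guests("dest", guests_on_dest, db)
```

Here "b" is a valid guest, yet it ends up in the unknown list on the destination host, which is exactly the case where a reaper would destroy something that is not an orphan.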
Maybe this ^ could be acceptable (?) if we put a big fat warning on the config option help for 'reap_unknown'. But I was unsure of the answers about what recovery looks like in case a guest is erroneously destroyed for an instance that is in the middle of migrating. In the case of resize or cold migrate, a hard reboot would fix it AFAIK. What about for a live migration? If recovery is possible in every case, those would also need to be documented in the config option help for 'reap_unknown'.
The patch has lots of complexities to think about and I'm left wondering if the pitfalls are better or worse than the current state. It would help if others joined in the review with their thoughts about it.
-melanie
Hi Sean Mooney and melanie, thanks for mentioning this. This ^ is for cleaning up orphans. For incomplete migrations, you prefer not destroying them, right? I'm not sure about it either. But I gave a possible solution on the etherpad (set instance.host, apply/revert the migration context, and then invoke cleanup_running_deleted_instances to clean up the instance). Until the cleanup is done, we need to track these instances/migrations in the RT. We need more people to join the discussion, so you are welcome to put your suggestions on the etherpad: https://etherpad.openstack.org/p/track-err-migr-and-orphans-in-RT Thanks in advance. BR, Luyao