[nova][ops][stable] Any interest in backporting --dry-run and/or --instance options for heal_allocations?
Matt Riedemann
mriedemos at gmail.com
Tue Nov 5 16:45:52 UTC 2019
I was helping someone recover from a stuck live migration today where
the migration record was stuck in pre-migrating status and somehow the
request never hit the compute or was lost. The guest was stopped and
basically the live migration either never started or never
completed properly (maybe rabbit dropped the request or the compute
service was restarted, I don't know).
I instructed them to update the database to set the migration record
status to 'error' and hard reboot the instance to get it running again.
Then they pointed out they were seeing this in the compute logs:
"There are allocations remaining against the source host that might need
to be removed"
That's because the source node allocations are still tracked in
placement by the migration record and the dest node allocations are
tracked by the instance. Cleaning that up is non-trivial. I have a
troubleshooting doc started for manually cleaning up that kind of stuff
here [1] but ultimately just told them to delete the allocations in
placement for both the migration and the instance and then run the
heal_allocations command to recreate the allocations for the instance.
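As a rough sketch of that sequence (not a vetted procedure), the snippet below only builds the commands in order and prints them for review; it does not run anything. It assumes the osc-placement plugin is installed so that `openstack resource provider allocation delete` is available. The function name `cleanup_commands` and the UUIDs are placeholders for illustration.

```python
import uuid


def cleanup_commands(migration_uuid, instance_uuid):
    """Return the cleanup commands, in order, without executing them."""
    # Fail early on a malformed UUID so a typo never reaches a
    # destructive command.
    for consumer in (migration_uuid, instance_uuid):
        uuid.UUID(consumer)
    delete = ["openstack", "resource", "provider", "allocation", "delete"]
    return [
        # Source node allocations are tracked by the migration record.
        delete + [migration_uuid],
        # Dest node allocations are tracked by the instance.
        delete + [instance_uuid],
        # Recreate the allocations for the instance. On Stein this
        # cannot be limited to a single instance.
        ["nova-manage", "placement", "heal_allocations"],
    ]


if __name__ == "__main__":
    # Placeholder UUIDs, not real consumers.
    for cmd in cleanup_commands(
        "11111111-1111-1111-1111-111111111111",
        "22222222-2222-2222-2222-222222222222",
    ):
        print(" ".join(cmd))
```

Printing rather than executing is deliberate here: deleting allocations is destructive, so the operator should review each command before running it by hand.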
Since this person's nova deployment was running Stein, they don't have
the --dry-run [2] or --instance [3] options for the heal_allocations
command. This isn't a huge problem but it does mean they could be
healing allocations for instances they didn't expect.
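For comparison, here is a minimal sketch of how the invocation could be narrowed on a release that has both options. The helper `heal_command` is hypothetical and only assembles the argument list; the flags themselves are the ones from [2] and [3].

```python
def heal_command(instance_uuid=None, dry_run=True):
    """Build a heal_allocations command line without executing it."""
    cmd = ["nova-manage", "placement", "heal_allocations"]
    if instance_uuid is not None:
        # Limit healing to a single instance (option from [3]).
        cmd += ["--instance", instance_uuid]
    if dry_run:
        # Report what would be healed without committing changes
        # (option from [2]).
        cmd.append("--dry-run")
    return cmd


if __name__ == "__main__":
    # Placeholder UUID, not a real instance.
    print(" ".join(heal_command("22222222-2222-2222-2222-222222222222")))
```

Defaulting `dry_run` to True in this sketch means the safe variant is what you get unless you explicitly ask otherwise.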
They could work around this by installing nova from train or master in a
VM/container/virtual environment and running it against the stein setup,
but that's maybe more work than they want to do.
The question I'm posing is if people would like to see those options
backported to stein and if so, would the stable team be OK with it? I'd
say this falls into a gray area: the options are opt-in, not used by
default, and are operational tooling, so they are lower risk to
backport, but not zero risk. It's also worth noting that when I
wrote those patches I did so with the intent that people could backport
them at least internally.
[1] https://review.opendev.org/#/c/691427/
[2] https://review.opendev.org/#/c/651932/
[3] https://review.opendev.org/#/c/651945/
--
Thanks,
Matt