[Openstack-operators] [Nova] Significance of Error Vs Failed status

Kekane, Abhishek Abhishek.Kekane at nttdata.com
Mon May 9 01:52:08 UTC 2016

Hi All,

In the Liberty release, we merged an upstream security fix [1] to clean up orphaned instance files left on compute nodes by resize operations. To fix this security issue, a new periodic task, '_cleanup_incomplete_migrations', was introduced. It runs on each compute node and queries for deleted instances whose migration is in 'error' status. If it finds any such instances, it cleans up their instance files on that compute node.
A similar issue is reported in LP bug [2] for the live-migration operation, and we would like to use the same periodic task to fix it. However, when a live migration fails for any reason, its migration status is set to 'failed' instead of 'error'. This behavior was introduced in patch [3], when migration object support was added for live migration. Because of this inconsistency, the periodic task does not pick up these instances, so their orphaned instance files are never cleaned up. To fix this, patch [4] simply sets the migration status to 'error', as is already done for resize, to make the code consistent.
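To illustrate why the periodic task misses failed live migrations, here is a minimal, hypothetical Python sketch of its selection step. The `Migration` and `Instance` classes and the `instances_to_clean` function are illustrative stand-ins, not Nova's actual objects or code; the only behavior carried over from the description above is that deleted instances are selected when their migration status is 'error'.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Migration:
    # Hypothetical stand-in for a migration record.
    instance_uuid: str
    status: str  # 'error' for failed resize, 'failed' for failed live migration (before [4])


@dataclass
class Instance:
    # Hypothetical stand-in for an instance record.
    uuid: str
    deleted: bool


def instances_to_clean(migrations: Iterable[Migration],
                       instances: Iterable[Instance],
                       statuses: Tuple[str, ...] = ("error",)) -> List[str]:
    """Return UUIDs of deleted instances whose migration status is in `statuses`."""
    matched = {m.instance_uuid for m in migrations if m.status in statuses}
    return [i.uuid for i in instances if i.deleted and i.uuid in matched]


if __name__ == "__main__":
    instances = [Instance("resize-vm", True), Instance("live-vm", True)]

    # Current behavior: the failed live migration is recorded as 'failed',
    # so only the resized instance is selected for cleanup.
    before = [Migration("resize-vm", "error"), Migration("live-vm", "failed")]
    print(instances_to_clean(before, instances))

    # With the status set to 'error' (the change proposed in [4]),
    # both instances are selected.
    after = [Migration("resize-vm", "error"), Migration("live-vm", "error")]
    print(instances_to_clean(after, instances))
```

The sketch only shows the status filter; it does not model host checks or the actual file deletion.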
We discussed this issue in the Nova meeting [5] and decided that, to the client, a migration status of 'error' vs. 'failed' should be considered the same thing: it's a failure. From an operator's point of view, is there any significance to a migration status of 'error' vs. 'failed'? If so, what is it, and what impact would changing the status from 'failed' to 'error' have? Please share your opinions.

[1] https://review.openstack.org/#/c/219299
[2] https://bugs.launchpad.net/nova/+bug/1470420
[3] https://review.openstack.org/#/c/183331
[4] https://review.openstack.org/#/c/215483
[5] http://eavesdrop.openstack.org/irclogs/%23openstack-meeting/%23openstack-meeting.2016-05-05.log.html#t2016-05-05T14:40:51

Thank you,


