[openstack-dev] [nova] Plans to fix numa_topology related issues with migration/resize/evacuate
Barton.Wensley at windriver.com
Wed Mar 4 15:17:24 UTC 2015
I have been exercising the numa topology related features in kilo (cpu
pinning, numa topology, huge pages) and have seen that there are issues
when an operation moves an instance between compute nodes. In summary,
the numa_topology is not recalculated for the destination node, which
results in the instance running with the wrong topology (or even
failing to run if the topology isn't supported on the destination).
This impacts live migration, cold migration, resize and evacuate.
I have spent some time over the last couple weeks and have a working
fix for these issues that I would like to push upstream. The fix for
cold migration and resize is the most straightforward, so I plan to
At a high level, here is what I have done to fix cold migrate and
resize:
- Add the source_numa_topology and dest_numa_topology to the migration
object and migrations table.
- When a resize_claim is done, store the claimed numa topology in the
dest_numa_topology in the migration record. Also store the current
numa topology as the source_numa_topology in the migration record.
- Use the source_numa_topology and dest_numa_topology from the
migration record in the resource accounting when referencing
migration claims as appropriate. This is done for claims, dropped
claims and the resource audit.
- Set the numa_topology in the instance after the cold migration/resize
is finished to the dest_numa_topology from the migration object -
done in finish_resize RPC on the destination compute to match where
the rest of the resources for the instance are updated (there is a
call to _set_instance_info here that sets the memory, vcpus, disk
space, etc... for the migrated instance).
- Set the numa_topology in the instance if the cold migration/resize is
reverted to the source_numa_topology from the migration object -
done in finish_revert_resize RPC on the source compute.
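
To make the steps above concrete, here is a rough sketch of the flow in
Python. It is a toy model only, not the actual nova code: the Instance
and Migration classes, the dict-based topology and the
topology_for_host helper are all assumptions made up for illustration.
Only the names source_numa_topology, dest_numa_topology, resize_claim,
finish_resize and finish_revert_resize come from the description above.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Toy stand-in for a NUMA topology (assumption): host NUMA node id -> vCPU count.
Topology = Dict[int, int]


@dataclass
class Instance:
    # Hypothetical, simplified instance record.
    uuid: str
    host: str
    numa_topology: Optional[Topology] = None


@dataclass
class Migration:
    # Hypothetical, simplified migration record with the two new fields.
    instance_uuid: str
    source_compute: str
    dest_compute: str
    source_numa_topology: Optional[Topology] = None
    dest_numa_topology: Optional[Topology] = None


def resize_claim(instance: Instance, dest_host: str,
                 claimed_topology: Topology) -> Migration:
    """Store the current and the claimed topology in the migration record."""
    return Migration(
        instance_uuid=instance.uuid,
        source_compute=instance.host,
        dest_compute=dest_host,
        source_numa_topology=instance.numa_topology,
        dest_numa_topology=claimed_topology,
    )


def topology_for_host(migration: Migration, host: str) -> Optional[Topology]:
    """Pick the topology that resource accounting should charge to a host."""
    if host == migration.dest_compute:
        return migration.dest_numa_topology
    if host == migration.source_compute:
        return migration.source_numa_topology
    return None


def finish_resize(instance: Instance, migration: Migration) -> None:
    """On the destination: the instance takes the claimed topology."""
    instance.host = migration.dest_compute
    instance.numa_topology = migration.dest_numa_topology


def finish_revert_resize(instance: Instance, migration: Migration) -> None:
    """On the source: the instance gets its original topology back."""
    instance.host = migration.source_compute
    instance.numa_topology = migration.source_numa_topology


inst = Instance("abc", "compute-1", {0: 4})
mig = resize_claim(inst, "compute-2", {1: 4})
finish_resize(inst, mig)          # inst now carries the destination topology
finish_revert_resize(inst, mig)   # a revert restores the source topology
```

The point of keeping both topologies in the migration record is that
either host can account for its side of an in-progress migration
without looking at the instance, which only ever holds one topology.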
I would appreciate any comments on my approach. I plan to start
submitting the code for this against bug 1417667 - I will split it
into several chunks to make it easier to review.
Fixing live migration was significantly more effort - I'll start a
different thread on that once I have feedback on the above approach.
Bart Wensley, Member of Technical Staff, Wind River