[openstack-dev] [Nova] os-migrateLive not working with neutron in Havana (or apparently Grizzly)

John Garbutt john at johngarbutt.com
Wed Feb 5 10:42:57 UTC 2014


On 4 February 2014 19:16, Jonathan Proulx <jon at jonproulx.com> wrote:
> Hi all,
>
> Trying to get a little love on bug https://bugs.launchpad.net/nova/+bug/1227836
>
> Short version is the instance migrates, but there's an RPC timeout
> that keeps nova thinking it's still on the old node mid-migration.
> Informal survey of operators seems to suggest this always happens when
> using neutron networking and never when using nova-networking (for
> small values of always and never)
>
> Feels like I could kludge in a longer timeout somewhere and it would
> work for now, so I'm sifting through unfamiliar code trying to find
> that and hoping someone here just knows where it is and can make my
> week a whole lot better by pointing it out.

It looks like this is the call that times out:
https://github.com/openstack/nova/blob/master/nova/conductor/rpcapi.py#L428
https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L4283

Because there is no error-handling wrapper on this manager method,
the instance remains in the "Migrating" task state when the call fails:
https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L4192
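
To illustrate what such a wrapper would buy us, here is a minimal
standalone sketch. It is not the actual Nova code: the decorator name,
the RPCTimeout class and the dict-based instance are all hypothetical
stand-ins. Note it only clears the task state; it would not by itself
correct which host nova thinks the instance is on.

```python
import functools


class RPCTimeout(Exception):
    """Hypothetical stand-in for an RPC timeout error."""


def reverts_task_state_sketch(func):
    """Clear instance['task_state'] if the wrapped call raises."""
    @functools.wraps(func)
    def wrapper(instance, *args, **kwargs):
        try:
            return func(instance, *args, **kwargs)
        except Exception:
            # Reset so the instance is not left stuck in "migrating".
            instance["task_state"] = None
            raise
    return wrapper


@reverts_task_state_sketch
def post_live_migration(instance):
    # Simulate the blocking call to the conductor timing out.
    raise RPCTimeout("conductor call timed out")


instance = {"uuid": "fake-uuid", "task_state": "migrating"}
try:
    post_live_migration(instance)
except RPCTimeout:
    pass  # the error still propagates; only the state was reset
```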

> Better less kludgy solutions also welcomed, but I need a kernel update
> on all my compute nodes so quick and dirty is all I need for right
> now.

I have some draft patches for a longer term fix as part of this:
https://blueprints.launchpad.net/nova/+spec/live-migration-to-conductor

In my current patches, I don't remove all the call operations, but
that seems like a good eventual goal.

The basic idea: imagine the current flow is:
* source compute node calls destination
* source compute node calls conductor to do stuff
* source compute node completes rest of work

Possible new flow, removing all calls:
* conductor casts to destination
* destination casts to conductor
* conductor does what it needs to do
* conductor casts to source
* source casts to conductor
* conductor finishes off
* maybe have a periodic task to spot when we get stuck waiting (to
  replace the RPC timeout)
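
A rough sketch of how the cast-only flow plus the periodic check could
hang together. Everything here is an assumption for illustration: the
step names, the MigrationRecord class and the find_stuck helper are
made up and do not come from my patches. Each received cast advances a
record, and the periodic task flags records that have not moved within
some maximum age.

```python
from dataclasses import dataclass

# Hypothetical step names mirroring the bullets above.
STEPS = [
    "cast_to_destination",
    "destination_replied",
    "conductor_work_done",
    "cast_to_source",
    "source_replied",
    "finished",
]


@dataclass
class MigrationRecord:
    instance_uuid: str
    step: str
    updated_at: float  # seconds, from any monotonic clock


def advance(record, now):
    """Move a record to the next step, as a received cast would."""
    idx = STEPS.index(record.step)
    if idx < len(STEPS) - 1:
        record.step = STEPS[idx + 1]
        record.updated_at = now


def find_stuck(records, now, max_age):
    """Return unfinished records with no progress within max_age seconds."""
    return [
        r for r in records
        if r.step != "finished" and now - r.updated_at > max_age
    ]


records = [
    MigrationRecord("inst-a", "cast_to_destination", updated_at=0.0),
    MigrationRecord("inst-b", "cast_to_source", updated_at=0.0),
]
advance(records[0], now=50.0)  # destination answered at t=50
# What the periodic task would run; inst-b never heard back from source.
stuck = find_stuck(records, now=100.0, max_age=60.0)
```

The point of the design is that nothing blocks waiting for a reply, so
a slow step shows up as a stale record rather than as an RPC timeout in
the middle of the migration.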

John


