Openstack nova live migration pain

Sean Mooney smooney at redhat.com
Mon Sep 13 10:30:05 UTC 2021


On Sun, 2021-09-12 at 21:12 +0200, Thomas Goirand wrote:
> On 9/11/21 8:17 PM, hai wu wrote:
> > Is there any way to check the current state without changing
> > anything (like issuing some sql query to see what's in place for
> > its db schema, and issuing some dpkg command to see what kind of
> > openstack related packages are currently in place)?
> 
> Someone more familiar than me with the nova db schema should be able to
> answer this question. However...
> 
> > I am hoping that we might already have db schema for Train release,
> > and we just need to upgrade some python3 packages from
> > buster-train-backports .. I understand that some db schema might come
> > from certain packages, and if those packages are not matching the ones
> > from buster-train-backports, then we might have to go the db-sync path
> > later.
> 
> ... it doesn't really matter, the db-sync thingy is supposed to be
> idempotent, so you can:
> - upgrade to stein
> - run the db-sync in stein
> - upgrade to train
> - run the db-sync in train
> 
> If you've already done the Train db-sync, then the above db-sync will do
> nothing and that's it... you'll have performed a working upgrade anyways.
Yep, running db sync should be safe if you have already run it.
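
To illustrate why re-running a sync is harmless, here is a minimal sketch of a version-tracked migration runner. It is not nova's implementation: the `schema_version` table, the `MIGRATIONS` list and the `db_sync` function are all hypothetical names made up for this example, and it uses an in-memory SQLite database rather than a real nova database.

```python
import sqlite3

# Hypothetical migrations for illustration only; these are not nova's real ones.
MIGRATIONS = [
    (1, "CREATE TABLE instances (id INTEGER PRIMARY KEY)"),
    (2, "ALTER TABLE instances ADD COLUMN numa_topology TEXT"),
]


def db_sync(conn):
    """Apply every migration newer than the recorded version; return how many ran."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] or 0  # MAX() over an empty table yields NULL
    applied = 0
    for version, statement in MIGRATIONS:
        if version > current:
            conn.execute(statement)
            conn.execute(
                "INSERT INTO schema_version (version) VALUES (?)", (version,)
            )
            applied += 1
    conn.commit()
    return applied


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(db_sync(conn))  # first run applies the pending migrations
    print(db_sync(conn))  # second run finds nothing newer to apply
```

Because the runner only applies migrations above the recorded version, a second call on an already-synced database changes nothing.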

Also, nova technically does not support mixed versions that are more than 1 upstream release apart,

e.g. running some nova Rocky components with others using Train.

You can, if you really know what you're doing, make this work in some cases, but it's not advised
and is entirely untested upstream. If the RPC versions are pinned correctly on all nodes, the controllers
are a newer version than the computes, and all controllers are the same version, it technically can function correctly.
But each controller must run exactly the same version, and some features, like the NUMA live migration code, will automatically
disable themselves until all compute services are on Train. Upgrading compute nodes before controllers is entirely unsupported and
not intended to work. So I would strongly suggest you try to align all hosts to Train and see if that resolves your issue.
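
The constraints above can be written down as a small check. This is only a sketch under stated assumptions: the `{host: (role, release)}` mapping, the role names and the `check_skew` function are hypothetical and do not correspond to any nova API, and only the three releases mentioned in this thread are listed.

```python
# Upstream release order, limited to the releases discussed in this thread.
RELEASES = ["rocky", "stein", "train"]


def check_skew(services):
    """Return a list of problems for a {host: (role, release)} mapping.

    The mapping shape and the "controller"/"compute" role names are
    assumptions made for this sketch, not something nova reports this way.
    """
    index = {name: i for i, name in enumerate(RELEASES)}
    controllers = {
        index[rel] for role, rel in services.values() if role == "controller"
    }
    computes = {
        index[rel] for role, rel in services.values() if role == "compute"
    }
    problems = []
    if len(controllers) > 1:
        problems.append("controllers run different releases")
    if controllers and computes and max(computes) > min(controllers):
        problems.append("a compute node is newer than a controller")
    all_versions = controllers | computes
    if all_versions and max(all_versions) - min(all_versions) > 1:
        problems.append("releases are more than one version apart")
    return problems


if __name__ == "__main__":
    deployment = {
        "ctl1": ("controller", "train"),
        "ctl2": ("controller", "train"),
        "cmp1": ("compute", "rocky"),
        "cmp2": ("compute", "train"),
    }
    for problem in check_skew(deployment):
        print(problem)
```

An empty result only means the layout passes these three rules; it says nothing about whether the RPC versions are actually pinned correctly.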

The compute agent locking up when a live migration happened had 2 causes that I know of in the past.
One was a long-running I/O operation that blocked the main thread, and the other was due to not properly proxying all libvirt objects,
which again caused the main thread to block on a call to libvirt. Both were fixed, so it's probable that your Rocky nodes are just missing
those fixes.
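
For anyone wondering why one unproxied call can freeze the whole agent, here is a minimal analogy. It uses Python's asyncio as a stand-in for a cooperative scheduler and is not nova's code or mechanism; `slow_libvirt_call`, `heartbeat` and `run` are hypothetical names, and the sleep just simulates a blocking native call.

```python
import asyncio
import time


def slow_libvirt_call():
    """Stand-in for a blocking native call; not a real libvirt API."""
    time.sleep(0.3)


async def heartbeat(ticks):
    # Represents the other work the service should keep doing.
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)


async def run(offload):
    """Return how many heartbeats fired while the slow call was in flight."""
    ticks = []
    task = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    before = len(ticks)
    if offload:
        # Hand the blocking call to a worker thread so the loop stays free.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, slow_libvirt_call)
    else:
        slow_libvirt_call()  # blocks the thread that runs the event loop
    during = len(ticks) - before
    task.cancel()
    return during


if __name__ == "__main__":
    print("blocking:", asyncio.run(run(offload=False)))
    print("offloaded:", asyncio.run(run(offload=True)))
```

In the blocking case no heartbeat fires while the call runs, which is the same kind of stall as a compute agent that stops responding during a migration. Handing the call to a worker thread keeps the heartbeats going.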
> 
> Cheers,
> 
> Thomas Goirand (zigo)
> 




