[Openstack-operators] How do you even test for that?

Jonathan Proulx jon at csail.mit.edu
Mon Oct 17 18:49:13 UTC 2016


Hi All,

Just on the other side of a Kilo->Mitaka upgrade (with a very brief
transit through Liberty in the middle).

As usual I've caught a few problems in production that I have no idea
how I could possibly have tested for, because they relate to older
running instances and to remnants of older package versions on the
production side.  Neither would have existed in test unless I'd
installed the test server with Havana and done incremental upgrades,
starting a fairly wide suite of test instances along the way.

First thing that bit me was neutron-db-manage being confused because
my production system still had migrations from Havana hanging around.
I'm calling this a packaging bug
(https://bugs.launchpad.net/ubuntu/+source/neutron/+bug/1633576), but I
also feel like remembering release names forever might be a good
thing.
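
One way to look for leftovers like this ahead of time is to compare
what is actually on disk against what the current package is supposed
to ship.  The following is only a rough sketch: the function name, the
directory, and the list of expected file names are all hypothetical
(on Ubuntu the expected list could perhaps be derived from `dpkg -L`
output), and none of it is part of neutron-db-manage.

```python
import os


def find_stale_files(directory, expected_names):
    """Return sorted names of regular files in `directory` that are
    not in `expected_names` (a hypothetical manifest of what the
    currently installed package should contain)."""
    on_disk = {
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    }
    # Anything on disk that the manifest does not know about is a
    # candidate leftover from an older package version.
    return sorted(on_disk - set(expected_names))
```

Anything this returns is only a candidate for inspection, not proof of
a problem; locally added files would show up too.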

Later I discovered that during the Juno release (maybe earlier ones
too), making a snapshot of a running instance populated the snapshot's
metadata with "instance_type_vcpu_weight: none".  Currently (Mitaka)
this value must be an integer if it is set, or boot fails.  This has
the interesting side effect of putting your instance into
shutdown/error state if you try a hard reboot of a formerly working
instance.  I 'fixed' this by manually frobbing the DB to mark the rows
where instance_type_vcpu_weight was set to none as deleted.
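
A check along these lines, run against exported metadata before the
upgrade, might have flagged the bad values.  This is a minimal sketch
only: the `images` list of dicts and its `id` / `properties` keys are
hypothetical stand-ins for however you dump the metadata, not a Glance
or Nova API.

```python
KEY = "instance_type_vcpu_weight"


def _is_integer(value):
    """True if `value` is an int or a string that parses as one."""
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    try:
        int(str(value).strip())
    except ValueError:
        return False
    return True


def find_bad_vcpu_weight(images):
    """Return IDs of records where the key is set but not an integer.

    `images` is a hypothetical list of dicts shaped like
    {"id": ..., "properties": {...}}.
    """
    bad = []
    for image in images:
        props = image.get("properties", {})
        if KEY not in props:
            continue  # unset is fine; only a set, non-integer value breaks boot
        if not _is_integer(props[KEY]):
            bad.append(image["id"])
    return bad
```

Records where the key is simply absent are left alone, since the
failure only occurs when the value is set to something that is not an
integer.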

Does anyone have strategies on how to actually test for problems with
"old" artifacts like these?

Yes, having things running from 18-24 month old snapshots is "bad", and
yes, it would be cleaner to install a fresh control plane at each
upgrade and cut over rather than doing an actual in-place upgrade.  But
neither of these sub-optimal patterns is going away entirely anytime
soon.

-Jon
