[Openstack-operators] How do you even test for that?

Matt Fischer matt at mattfischer.com
Mon Oct 17 23:45:07 UTC 2016


This doesn't cover all of your issues, but after hitting MySQL bugs in both
the Icehouse->Juno and the Juno->Kilo upgrades, we now export production
control plane data and restore it into a dev environment to test the
upgrades. If we hit issues, we destroy that environment and run the
upgrade again.
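Roughly, that looks like the sketch below. This is not our exact tooling;
the database list, hosts, and credential handling are all placeholders:

    # Sketch: copy production control plane databases into a dev
    # environment for upgrade testing. Names, hosts, and credentials are
    # placeholders; a real dump should probably stream to disk rather
    # than sit in memory.
    import subprocess

    DATABASES = ["nova", "neutron", "glance", "keystone", "cinder"]
    PROD_DB = "prod-db.example.com"
    DEV_DB = "dev-db.example.com"

    for db in DATABASES:
        # --single-transaction keeps the dump consistent without
        # locking InnoDB tables.
        dump = subprocess.run(
            ["mysqldump", "-h", PROD_DB, "--single-transaction", db],
            check=True, capture_output=True)
        # Recreate the database on the dev side, then load the dump.
        subprocess.run(
            ["mysql", "-h", DEV_DB, "-e",
             "DROP DATABASE IF EXISTS {0}; CREATE DATABASE {0};".format(db)],
            check=True)
        subprocess.run(["mysql", "-h", DEV_DB, db],
                       input=dump.stdout, check=True)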

For longer-running instances that's tough, but we try to catch those in
our shared dev environment or in staging with regression tests. That is
also where we catch issues with interactions with outside hardware, like
load balancers and storage.

For your other issue, was there a warning or deprecation notice in the
logs for that? Checking for those is always at the top of our upgrade
checklist.
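
A cheap way to sweep for those ahead of an upgrade is something like this
sketch (log paths vary by distro and deployment, so adjust the glob):

    # Sketch: grep service logs for deprecation warnings before an
    # upgrade. Assumes logs live under /var/log/<service>/*.log.
    import glob
    import re

    pattern = re.compile(r"deprecat", re.IGNORECASE)
    for path in glob.glob("/var/log/*/*.log"):
        with open(path, errors="replace") as f:
            for lineno, line in enumerate(f, 1):
                if pattern.search(line):
                    print("{}:{}: {}".format(path, lineno, line.rstrip()))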

On Oct 17, 2016 12:51 PM, "Jonathan Proulx" <jon at csail.mit.edu> wrote:

> Hi All,
>
> Just on the other side of a Kilo->Mitaka upgrade (with a very brief
> transit through Liberty in the middle).
>
> As usual, I've caught a few problems in production that I have no idea
> how I could possibly have tested for, because they relate to older
> running instances and to remnants of older package versions on the
> production side. Those wouldn't have existed in test unless I'd
> installed the test server with Havana, done incremental upgrades, and
> started a fairly wide suite of test instances along the way.
>
> The first thing that bit me was neutron-db-manage being confused
> because my production system still had migrations from Havana hanging
> around. I'm calling this a packaging bug
> (https://bugs.launchpad.net/ubuntu/+source/neutron/+bug/1633576), but I
> also feel like remembering release names forever might be a good thing.
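
One cheap guard against that class of problem is to record what alembic
thinks the DB's current revision is before starting the upgrade, so
ancient migrations stand out. A sketch, assuming the stock config path:

    # Sketch: print the neutron DB's current alembic revision(s) before
    # upgrading. Assumes the standard config location.
    import subprocess

    result = subprocess.run(
        ["neutron-db-manage", "--config-file",
         "/etc/neutron/neutron.conf", "current"],
        check=True, capture_output=True, text=True)
    print(result.stdout)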
>
> Later I discovered that during the Juno release (and maybe earlier
> ones too), making a snapshot of a running instance populated the
> snapshot's metadata with "instance_type_vcpu_weight: none". Currently
> (in Mitaka) this value must be an integer if it is set, or the boot
> fails. This has the interesting side effect of putting your instance
> into shutdown/error state if you try a hard reboot of a formerly
> working instance. I 'fixed' this by manually frobbing the DB, marking
> the rows where instance_type_vcpu_weight was set to none as deleted.
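
For anyone who hits the same thing, the manual fix Jon describes amounts
to roughly the following. The table and column names (nova's
instance_system_metadata) and the 'None' string are my assumptions from
memory, so verify against your own schema and back up the DB first:

    # Sketch of the manual fix: soft-delete nova system-metadata rows
    # whose instance_type_vcpu_weight value is the string 'None'.
    # Table/column names and the 'None' literal are assumptions --
    # check your schema and take a DB backup before running anything.
    import pymysql  # third-party driver: pip install pymysql

    conn = pymysql.connect(host="prod-db.example.com", user="nova",
                           password="secret", database="nova")
    with conn.cursor() as cur:
        cur.execute(
            "UPDATE instance_system_metadata "
            "SET deleted = id, deleted_at = NOW() "
            "WHERE `key` = 'instance_type_vcpu_weight' "
            "AND value = 'None' AND deleted = 0")
        print("soft-deleted {} rows".format(cur.rowcount))
    conn.commit()
    conn.close()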
>
> Does anyone have strategies on how to actually test for problems with
> "old" artifacts like these?
>
> Yes, having things running from 18-24-month-old snapshots is "bad",
> and yes, it would be cleaner to install a fresh control plane at each
> upgrade and cut over rather than doing an in-place upgrade. But
> neither of these sub-optimal patterns is going away anytime soon.
>
> -Jon
>