[Openstack-operators] Upgrade Strategy from Essex to Folsom+

Jonathan Proulx jon at jonproulx.com
Mon Jan 7 18:56:55 UTC 2013


On Mon, Jan 7, 2013 at 12:03 PM, Christian Parpart <trapni at gmail.com> wrote:

>
> Hey Jon,
> That actually sounds great, really.  So what would you suggest: is it
> possible to first _just_ upgrade the nova-compute nodes (which only have
> nova-compute on them, so no nova-api nor nova-network, just nova-compute) and
> let these then-soon-to-be-Folsom nova-compute nodes talk to the
> still-unchanged Essex controller and network node(s) -- or the other way
> around?
>
>
What I did was the other way around.  My setup is of similar size: a single
"controller" node running the database, RPC, API, etc., a single nova-volume
node, and about 45 compute nodes (at the time) running nova-compute and
nova-network (since in multi-host each node handles its own network for the
instances running on it).  My cloud at the time was "alpha" to a select
group of internal users, so I was willing and able to take some risks, which,
modulo some since-fixed bugs, worked out for me, though I do wish my notes
were better...

My definition of "live" was that instances stay running but the API can go
away.  If you can live with that limited version of "live" this should work,
but think hard about my logic; this is mostly from memory and I might be
forgetting something.  First some general notes, then my order of operations.

I am also using Puppet as a configuration management system (and hadn't
properly upgraded it to Folsom when I did the upgrade), so I'm not sure
whether the packages fix some of the issues I had with config file formats or
whether you'll need to manage them by hand as well.  If you're using a
configuration management system, I believe they've all caught up with Folsom
now, so this should be less of an issue.

Generically, for all services the paste.ini files have some new pieces in
them that you need; I found it best to accept the new package version and
then re-apply any changes made locally.  Also, the Essex nova.conf used the
old "--option-name" lines; with Folsom you need to lose the dashes and add
section headers, like so:

Old Style:
--multi_host=True
--state_path=/var/lib/nova
--public_interface=eth0

New Style:
[DEFAULT]
multi_host=True
state_path=/var/lib/nova
public_interface=eth0
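
For what it's worth, here is roughly how I compared my old configs against
the new packaged versions; the paths are just examples, not exactly what I
ran:

# keep a copy of the Essex configs before the upgrade
cp -a /etc/nova /etc/nova.essex
cp -a /etc/keystone /etc/keystone.essex

# after installing the Folsom packages, see what changed locally
diff -u /etc/nova.essex/api-paste.ini /etc/nova/api-paste.ini
diff -u /etc/nova.essex/nova.conf /etc/nova/nova.conf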


Since the database schema changes, I started by upgrading the controller
node, which hosts the database server in my world.  This was the most
nail-biting part for me.  I stopped all OpenStack and database services, made
an LVM snapshot so I could roll back if things went sour, and then updated
sources.list.d to include the cloud archive and installed the new Folsom
bits.
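
Roughly, that step looked something like the following; the volume names and
the archive line are examples for Ubuntu 12.04/precise, so adjust for your
own layout:

# stop the OpenStack services and the database
service nova-api stop && service nova-scheduler stop
service keystone stop && service mysql stop

# snapshot the volume holding the database so a rollback is possible
lvcreate --snapshot --size 10G --name pre-folsom /dev/vg0/mysql

# point apt at the Ubuntu Cloud Archive for Folsom and upgrade
echo "deb http://ubuntu-cloud.archive.canonical.com/ubuntu precise-updates/folsom main" \
    > /etc/apt/sources.list.d/cloud-archive.list
apt-get install ubuntu-cloud-keyring
apt-get update && apt-get dist-upgrade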

I would have thought that the packages would handle this on upgrade, but my
notes say I needed to update the databases to the newest format by running:

keystone-manage db_sync
nova-manage db sync

(love that slight syntax difference); obviously your DB server needs to be
running to do this, so perhaps that's why I needed to do it by hand.
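
If I were doing it again I'd also dump the databases right before running
those, something like this (the database names being whatever you used under
Essex):

# dump the nova and keystone databases before the schema migrations
mysqldump --single-transaction nova > nova.essex.sql
mysqldump --single-transaction keystone > keystone.essex.sql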

Checked the config files as noted above and made sure all the services would
start, then took down the API service and Dashboard so users couldn't make
state changes while we were in an indeterminate state.
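
Taking the API and Dashboard down was just a matter of something like this
(Ubuntu service names, yours may differ):

# keep instances running but stop accepting API calls
service nova-api stop
# the dashboard is served by Apache here
service apache2 stop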

Once I had the controller updated and had appropriately tweaked my Puppet
config to hand out Folsom-flavored configs, the compute and volume nodes just
needed the new packages installed and a configuration management run.
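
On each compute node that boiled down to roughly the following; treat these
as an example, since your package set and tooling will differ:

# upgrade the nova bits from the Folsom cloud archive
apt-get update && apt-get install nova-compute nova-network
# let puppet rewrite nova.conf and api-paste.ini in the new format
puppet agent --test
# restart so the services pick up the new config
service nova-compute restart
service nova-network restart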

Info on switching from nova-volume to Cinder (which must be done after the
Folsom upgrade in any case) is at http://wiki.openstack.org/MigrateToCinder
and it was much simpler than the overall Folsom upgrade.  What notes I did
need to make on that I added to that wiki page (or its predecessor in the
release notes).

Caveats: I did hit a few since-fixed upgrade/transition bugs because I
upgraded so soon after Folsom came out, so I'm not 100% sure I have 100% of
the steps needed.  If you are in a really live production environment I would
strongly recommend building a small test environment, with at least one node
of each type you'll be upgrading on the same Essex install as what you
currently have, and doing a dry run.  Even with the bugs, the debugging, and
a bit of hand-fixing of database tables muddled by those since-fixed bugs, I
didn't lose any running instances, though I had the API down for about 72
hours.  Without the bugs it probably would have been more like 4 hours,
though I'd schedule a 12-hour window if I were to do it again, and on a
production system I would definitely spin up some test systems, even if they
were virtual instances on the current system, to do a final step-through and
sanity test.

Hope that helps, do have current backups, and do rethink that plan through
yourself!

-Jon