Open Stack

Tue Aug 28 21:39:09 UTC 2012

Well said, Ryan. Agreed 100% on all points, both in the specific examples and the overarching theme of n+1 compatibility. Upgrade paths have got to be clean and well-documented, and deprecations must be done according to responsible, established timelines from here on out.

We're verifiably doing better between Essex and Folsom, but we still have a LONG way to go to call our upgrade process anything resembling great.

There was talk of trying to set up test infrastructure that would roll out Essex and then upgrade it to Folsom in some automated fashion so we could start learning where it breaks. Was there any forward momentum on that?

All the best,

    - Gabriel

> -----Original Message-----
> From: openstack-bounces+gabriel.hurley=nebula.com at lists.launchpad.net
> [mailto:openstack-bounces+gabriel.hurley=nebula.com at lists.launchpad.net]
> On Behalf Of Ryan Lane
> Sent: Tuesday, August 28, 2012 2:26 PM
> To: openstack at lists.launchpad.net
> Subject: [Openstack] A plea from an OpenStack user
> 
> Yesterday I spent the day finally upgrading my nova infrastructure from
> diablo to essex. I've upgraded from bexar to cactus, and cactus to diablo, and
> now diablo to essex. Every single upgrade is becoming more and more
> difficult. It's not getting easier, at all. Here are some of the issues I ran into:
> 
> 1. Glance changed from using image numbers to uuids for images. Nova's
> references to them weren't updated. There was no automated way to do so.
> I had to map the old values to the new values from glance's database then
> update them in nova.
> 
> 2. Instance hostnames are changed every single release. In bexar and cactus
> it was the ec2 style id. In diablo it was changed and hardcoded to
> instance-<ec2-style-id>. In essex it is hardcoded to the instance name; the
> instance's ID is configurable (with a default of instance-<ec2-style-id>),
> but it only affects the name used in virsh/the filesystem. I put a hack into
> diablo (thanks to Vish for that hack) to fix the naming convention so as not
> to break our production deployment, but it only affected the hostnames in
> the database; instances in virsh and on the filesystem were still named
> instance-<ec2-style-id>, so I had to fix all libvirt definitions and rename
> a ton of files during this upgrade, since our naming convention is the
> ec2-style format. The hostname change still affected our deployment, though.
> It's hardcoded. I decided to simply switch hostnames to the instance name in
> production, since our hostnames are required to be unique globally; however,
> that changes how our puppet infrastructure works too, since the certname is
> by default based on fqdn (I changed this to use the ec2-style id). Small
> changes like this have giant rippling effects in infrastructures.
> 
> 3. There used to be global groups in nova. In keystone there are no global
> groups. This makes performing actions on sets of instances across tenants
> incredibly difficult; for instance, I did an in-place ubuntu upgrade from lucid
> to precise on a compute node, and needed to reboot all instances on that
> host. There's no way to do that without database queries fed into a custom
> script. Also, I have to have a management user added to every single tenant
> and every single tenant-role.
> 
> 4. Keystone's LDAP implementation in stable was broken. It returned no
> roles, many values were hardcoded, etc. The LDAP implementation in nova
> worked, and it looks like its code was simply ignored when auth was moved
> into keystone.
> 
> My plea is for the developers to think about how their changes are going to
> affect production deployments when upgrade time comes.
> 
> It's fine that glance changed its id structure, but the upgrade should have
> handled that. If a user needs to go into the database in their deployment to
> fix your change, it's broken.
> 
> The constant hardcoded hostname changes are totally unacceptable; if you
> change something like this it *must* be configurable, and there should be a
> warning that the default is changing.
> 
> The removal of global groups was a major usability killer for users.
> The removal of the global groups wasn't necessarily the problem, though.
> The problem is that there were no alternative management methods added.
> There's currently no reasonable way to manage the infrastructure.
> 
> I understand that bugs will crop up when a stable branch is released, but the
> LDAP implementation in keystone was missing basic functionality. Keystone
> simply doesn't work without roles. I believe this was likely due to the fact
> that the LDAP backend has basically no tests and that Keystone light was
> rushed in for this release. It's imperative that new required services at least
> handle the functionality they are replacing, when released.
> 
> That said, excluding the above issues, my upgrade went fairly smoothly and
> this release is *way* more stable and performs *way* better, so kudos to
> the community for that. Keep up the good work!
> 
> - Ryan
> 
> _______________________________________________
> Mailing list: https://launchpad.net/~openstack
> Post to     : openstack at lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp
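
For anyone facing the manual fix described in item 1 of the quoted message (mapping old integer image IDs to the new UUIDs and updating the references), here is a minimal sketch of that kind of remapping. It is only an illustration: the `instances` table, the `image_ref` column, the `remap_image_refs` helper, and the sample UUID strings are all hypothetical and are not taken from the real glance or nova schemas. It runs against an in-memory SQLite database so it can be tried safely.

```python
import sqlite3


def remap_image_refs(conn, id_map):
    """Rewrite old integer image references to new UUID strings.

    `instances` and `image_ref` are hypothetical names used for
    illustration only; they are NOT the real nova schema.
    id_map maps old integer image id -> new UUID string.
    Returns the number of rows updated.
    """
    updated = 0
    # One transaction: commit if every update succeeds, roll back otherwise.
    with conn:
        for old_id, new_uuid in id_map.items():
            cur = conn.execute(
                "UPDATE instances SET image_ref = ? WHERE image_ref = ?",
                (new_uuid, str(old_id)),
            )
            updated += cur.rowcount
    return updated


def demo():
    """Build a throwaway in-memory table and remap its image references."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instances (id INTEGER PRIMARY KEY, image_ref TEXT)"
    )
    conn.executemany(
        "INSERT INTO instances (image_ref) VALUES (?)",
        [("1",), ("2",), ("1",)],
    )
    # Made-up mapping; in practice it would be read from the image service.
    id_map = {1: "uuid-a", 2: "uuid-b"}
    count = remap_image_refs(conn, id_map)
    rows = [
        r[0]
        for r in conn.execute("SELECT image_ref FROM instances ORDER BY id")
    ]
    return count, rows
```

Doing all the updates in a single transaction means a failure partway through leaves the old references intact rather than a half-converted table, which is the property an automated upgrade step would want.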
