[openstack-dev] [tripleo] Adding new roles after upgrade is broken.

Jiří Stránský jistr at redhat.com
Fri Aug 18 13:22:27 UTC 2017


On 18.8.2017 13:18, Sofer Athlan-Guyot wrote:
> Hi,
> 
> We may end up with missing packages when a user adds a new role to their
> roles_data file and the base image comes from a previous version.
> 
> The workflow would be this one:
>   - install newton
>   - upgrade to ocata
>   - add collectd to roles_data and redeploy the stack
> 
> For instance, if one adds OS::TripleO::Services::Collectd in an ocata
> env coming from an upgraded newton env, they won't have the necessary
> packages (for instance collectd-disk).  The puppet manifest will fail as
> the package is missing and puppet doesn't install packages.  The upgrade
> task[1] is useless as the new role was added not during the upgrade but
> after it.

Right, but the package could be added during the upgrade. The 
upgrade_tasks could/should make the set of installed overcloud RPMs on 
par with the overcloud-full image of the respective release, ideally. So 
you'd always have collectd RPMs installed, both on freshly deployed and 
upgraded envs, regardless of whether you actually use collectd. We 
already did some package installs/uninstalls as part of upgrades and 
updates, but probably didn't have 100% coverage.

> 
> I don't see any easy way to solve this.  Basically we need a way to keep
> the base image in sync between releases without using the upgrade_tasks,
> maybe in the tripleo-package one?

Given that released code is affected, we may treat it as a bug that 
requires a minor update, and in addition to upgrade_tasks, we can add 
all the necessary package installs into minor update code 
(yum_update.sh) too. Again, this shouldn't depend on which services are 
actually enabled; it should just unconditionally sync with the latest 
content of the overcloud-full image of the respective release.

I guess the time consuming part will be preparing the envs that will 
allow comparing a fresh deploy vs. an upgraded one to get the `rpm -qa | 
sort` difference. Or we could try a shortcut and see what changes went 
into tripleo-puppet-elements in each release.
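
To make the comparison concrete, here is a rough sketch (not tested 
against a real overcloud) of how the two package lists could be diffed. 
It assumes the lists were captured on each node with something like 
`rpm -qa --qf '%{NAME}\n'`, so that only package names are compared and 
version differences don't show up as noise. The function names and the 
sample data are made up for illustration.

```python
# Sketch: find packages present on a freshly deployed node but absent
# on an upgraded one. Input is assumed to be one package name per line,
# e.g. captured with: rpm -qa --qf '%{NAME}\n'


def package_names(rpm_output):
    """Return the set of non-empty, stripped lines in the captured output."""
    return {line.strip() for line in rpm_output.splitlines() if line.strip()}


def missing_packages(fresh_output, upgraded_output):
    """Sorted names installed on the fresh deploy but not on the upgraded node."""
    return sorted(package_names(fresh_output) - package_names(upgraded_output))


if __name__ == "__main__":
    # Hypothetical sample data, not output from any real environment.
    fresh = "collectd\ncollectd-disk\nopenssh\n"
    upgraded = "collectd\nopenssh\n"
    for name in missing_packages(fresh, upgraded):
        print(name)
```

The resulting list would be the candidate set of packages to install 
from upgrade_tasks and yum_update.sh; swapping the arguments would give 
the packages that only exist on the upgraded node.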

> 
> This shouldn't be a problem with containers, but everything before pike
> is affected.

Indeed. There will still be some basic baremetal host content management 
as long as we're not using Atomic, but the room for potential problems 
will be much smaller.

Jirka

> 
> Originally seen here[2]
> 
> [1] https://github.com/openstack/tripleo-heat-templates/blob/stable/ocata/puppet/services/metrics/collectd.yaml#L130..L134
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1455065
> 



