[openstack-dev] [qa] [ceilometer] [swift] tempests tests, grenade, old branches

Chris Dent chdent at redhat.com
Mon Sep 1 18:33:10 UTC 2014


I've got a review in progress for adding a telemetry scenario test:

     https://review.openstack.org/#/c/115971/

It can't pass the *-icehouse tests because ceilometer-api is not present
on the icehouse side of a havana->icehouse upgrade.

In the process of trying to figure out what's going on I discovered
so many confusing things that I'm no longer clear on:

* Whether this is a fixable problem.
* Whether it is worth fixing.
* How (or whether) the test in question can be disabled on older
   branches.
* Whether I should scrap the whole thing.[1]

The core problem is that older branches of grenade do not have an
upgrade-ceilometer script, so although some ceilometer services do run
on the havana side, they are never restarted across the upgrade.
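To make that concrete, here is a hedged sketch of what a minimal
upgrade-ceilometer could look like, following the general
stop / install-target / restart pattern the other upgrade-* scripts
use. The function names here are placeholders standing in for the real
work, not grenade's actual helpers:

```shell
# Placeholder: the real script would stop the running havana services.
stop_ceilometer() {
    echo "stopping ceilometer services"
}

# Placeholder: the real script would start services from the target tree.
start_ceilometer() {
    echo "starting ceilometer services from the icehouse tree"
}

# 1. Stop the old services before touching the code.
stop_ceilometer

# 2. Install the target branch (elided here; grenade does this from
#    the target devstack tree).

# 3. Restart the services so they pick up the upgraded code and config.
start_ceilometer
```

Backporting something of this shape to the relevant grenade branch is
presumably the "fix" referred to below, assuming the older branch's
plumbing will call it.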

Presumably that could be fixed by backporting some stuff to the
relevant branch. I admit, though, that at times it can be rather
hard to tell which branch is providing the configuration and
environment variables during a grenade run. In part this is due to an
apparent difference between default local behavior and gate behavior.
Suppose I wanted to replicate exactly, on a local setup, what happens
on a gate run: where do I go to figure that out?

That seems a bit fragile, though. Wouldn't it be better to upgrade
services based on what services are actually running, rather than
some lines in a shell script?

I looked into how this might be done and the mapping from
ENABLED_SERVICES to actually-running-processes to
some-generic-name-to-identify-an-upgrade is not at all
straightforward. I suspect this is a known problem that people would
like to fix, but I don't know where to look for more discussion on
the matter. Please help?
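To illustrate why the mapping is not straightforward, here is the
naive first attempt: collapse the per-process entries in
ENABLED_SERVICES down to project names and look for a matching
upgrade script. The service list is an illustrative devstack-style
value, not taken from any real gate job, and the collapse rule is
shown mostly to demonstrate where it breaks:

```shell
# Illustrative devstack-style value, not from a real gate job.
ENABLED_SERVICES="key,n-api,n-cpu,ceilometer-acompute,ceilometer-api,ceilometer-collector"

# Naive collapse: strip everything after the first hyphen to get
# something like a project name, the granularity an upgrade-* script
# would work at.
projects=$(echo "$ENABLED_SERVICES" | tr ',' '\n' | sed 's/-.*//' | sort -u)

echo "$projects"
# ceilometer-* collapses nicely to "ceilometer", but devstack's short
# names (n-api, n-cpu) collapse to "n", not "nova" -- exactly the
# not-at-all-straightforward mapping described above.
```

So any real solution needs a table (or some other source of truth)
mapping devstack service names to upgradeable projects, which is the
part I cannot find.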

[1] And finally, the other grenade runs, the ones that are currently
passing, only pass because a very long loop waits up to two minutes
for notification messages (from the middleware) to show up at the
ceilometer collector. Is this because the instance is so overloaded,
and process contention so high, that it really is going to take that
long? If so, is there much point in having a test that introduces this
kind of latency? A scenario test appears to be exactly what's needed
here, but at what cost?
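For reference, the shape of that wait loop, as a hedged sketch: the
check itself is stubbed out here (it succeeds immediately), whereas
the real test queries the ceilometer API for the expected samples.

```shell
# Stand-in stub: the real check would query the collector for the
# expected notification samples. Here it succeeds immediately so the
# sketch runs quickly.
check_for_sample() {
    return 0
}

timeout=120   # the "very long loop": up to two minutes
interval=5
elapsed=0

until check_for_sample; do
    if [ "$elapsed" -ge "$timeout" ]; then
        echo "no samples after ${timeout}s" >&2
        exit 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
done
echo "samples arrived after ${elapsed}s"
```

On a healthy instance this returns almost immediately; the worry is
that the gate only passes because the timeout is generous enough to
absorb two minutes of contention.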

What I'm after here is basically threefold:

* Pointers to written info on how I can resolve these issues, if it
   exists.
* If it doesn't, some discussion here on options to reach some
   resolution.
* A cup of tea or other beverage of your choice, and some sympathy
   and commiseration. A bit of "I too have suffered at the hands of
   grenade". Then we can all be friends.

From my side I can provide a promise to follow through on
improvements we discover.

-- 
Chris Dent tw:@anticdent freenode:cdent
https://tank.peermore.com/tanks/cdent
