[openstack-dev] [qa][all] Branchless Tempest beyond pure-API tests, impact on backporting policy

Matthew Treinish mtreinish at kortar.org
Wed Jul 9 22:10:31 UTC 2014


On Wed, Jul 09, 2014 at 01:44:33PM -0400, Eoghan Glynn wrote:
> 
> Thanks for the response Matt, some comments inline.
> 
> > > At the project/release status meeting yesterday[1], I raised the issue
> > > that featureful backports to stable are beginning to show up[2], purely
> > > to facilitate branchless Tempest. We had a useful exchange of views on
> > > IRC but ran out of time, so this thread is intended to capture and
> > > complete the discussion.
> > 
> > So, [2] is definitely not something that should be backported.
> 
> Agreed. It was the avoidance of such forced backports that motivated
> the thread.
> 
> > But, doesn't it mean that cinder snapshot notifications don't work
> > at all in icehouse?
> 
> The snapshot notifications work in the sense that cinder emits them
> at the appropriate points in time. What was missing in Icehouse is
> that ceilometer didn't consume those notifications and translate to
> metering data.

Yeah that's what I meant, sorry it wasn't worded clearly.

> 
> > Is this reflected in the release notes or docs somewhere
> 
> Yeah it should be clear from the list of meters in the icehouse docs:
> 
>   https://github.com/openstack/ceilometer/blob/stable/icehouse/doc/source/measurements.rst#volume-cinder
> 
> versus the Juno version:
> 
>   https://github.com/openstack/ceilometer/blob/master/doc/source/measurements.rst#volume-cinder
> 
> > because it seems like something that would be expected to work, which,
> > I think, is actually a bigger bug being exposed by branchless tempest.
> 
> The bigger bug being the lack of ceilometer support for consuming this
> notification, or the lack of discoverability for that feature?

Well if it's in the docs as a limitation for icehouse, then it's less severe,
but it's still something on the discoverability side I guess. I think my lack
of experience with ceilometer is showing here.

> 
> > As a user how do I know that with the cloud I'm using whether cinder
> > snapshot notifications are supported?
> 
> If you depend on this as a front-end user, then you'd have to read
> the documentation listing the meters being gathered.
> 
> But is this something that a front-end cloud user would actually be
> directly concerned about?

I'm not sure, probably not, but if it's exposed on the public api I think it's
totally fair to expect that someone will be depending on it.

>  
> > >  * Tempest has an implicit bent towards pure API tests, yet not all
> > >    interactions between OpenStack services that we want to test are
> > >    mediated by APIs
> > 
> > I think this is the bigger issue. If there is cross-service communication it
> > should have an API contract. (and probably be directly tested too) It doesn't
> > necessarily have to be a REST API, although in most cases that's easier. This
> > is probably something for the TC to discuss/mandate, though.
> 
> As I said at the PTLs meeting yesterday, I think we need to be wary
> of the temptation to bend the problem-space to fit the solution.
> 
> Short of the polling load imposed by ceilometer significantly increasing,
> in reality we will have to continue to depend on notifications as one
> of the main ways of detecting "phase-shifts" in resource state.

No, I agree notifications make a lot of sense, the load from frequent polling is
too high.

> 
> Note that the notifications that capture these resource state transitions
> are a long-standing mechanism in openstack that ceilometer has depended
> on from the very outset. I don't think it's realistic to envisage these
> interactions will be replaced by REST APIs any time soon.

I wasn't advocating doing everything over a REST API. (API is an overloaded term)
I just meant that if we're depending on notifications for communication between
projects then we should enforce a stability contract on them. Similar to what we
already have with the API stability guidelines for the REST APIs. The fact that
there is no direct enforcement on notifications, either through social policy or
testing, is what I was taking issue with.

I also think if we decide to have a policy of enforcing notification stability
then we should directly test the notifications from an external repo to block
slips. But, that's a discussion for later, if at all.
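
Just to illustrate what I mean by a stability contract (everything here is a
made-up sketch, not an existing oslo or tempest API; the event type and payload
keys are invented for the example), such a test could freeze the expected shape
of a notification and flag any drift:

```python
# Hypothetical sketch: freeze the expected shape of a notification payload
# so a project can't silently change it. All names are illustrative only.

EXPECTED_SNAPSHOT_CREATE_KEYS = {
    'tenant_id', 'user_id', 'snapshot_id',
    'volume_id', 'status', 'created_at',
}


def check_notification_contract(notification):
    """Return a list of contract violations for a captured notification."""
    violations = []
    if notification.get('event_type') != 'snapshot.create.end':
        violations.append('unexpected event_type: %s'
                          % notification.get('event_type'))
    payload_keys = set(notification.get('payload', {}))
    missing = EXPECTED_SNAPSHOT_CREATE_KEYS - payload_keys
    if missing:
        violations.append('missing payload keys: %s' % sorted(missing))
    return violations
```

An external repo full of checks like that, run as a gating job, would give us
the same kind of backstop for notifications that the API stability guidelines
give us for the REST APIs.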

> 
> > > The branchless Tempest spec envisages new features will be added and
> > > need to be skipped when testing stable/previous, but IIUC requires
> > > that the presence of new behaviors is externally discoverable[5].
> > 
> > I think the test case you proposed is fine. I know some people will
> > argue that it is expanding the scope of tempest to include more
> > whitebox like testing, because the notification are an internal
> > side-effect of the api call, but I don't see it that way. It feels
> > more like exactly what tempest is there to enable testing, a
> > cross-project interaction using the api.
> 
> In my example, APIs are only used to initiate the action in cinder
> and then to check the metering data in ceilometer.
> 
> But the middle-piece, i.e. the interaction between cinder & ceilometer,
> is not mediated by an API. Rather, its carried via an unversioned
> notification.

Yeah, exactly, that's why I feel it's a valid Tempest test case. 

What I was referring to as the counter-argument, and where the difference of
opinion was, is that the test will be making REST API calls both to trigger a
nominally internal mechanism (the notification) in the services and then to use
the ceilometer API to validate that the notification worked. But arguably the
real intent of these tests is to validate that internal mechanism, which makes
them basically whitebox tests. The argument was that by testing this in Tempest
we're testing notifications poorly: because of Tempest's black-box limitation,
the notifications will only be tested indirectly. Which I feel is a valid
point, but not a sufficient reason to exclude the notification tests from
Tempest.

I think the best way to move forward is to have functional whitebox tests for
the notifications as part of the individual projects generating them, so that
we can validate the notifications directly. But I also feel there should be
tempest tests on top of that which verify the ceilometer side of consuming the
notification and the api exposing that information.

> 
> > I'm pretty sure that most of the concerns around tests like this
> > were from the gate maintenance and debug side of things. In other
> > words when things go wrong how impossible will it be to debug that a
> > notification wasn't generated or not counted? Right now I think it
> > would be pretty difficult to debug a notification test failure,
> > which is where the problem is. While I think testing like this is
> > definitely valid, that doesn't mean we should rush in a bunch of
> > sloppy tests that are impossible to debug, because that'll just make
> > everyone sad panda.
> 
> It's a fair point that cross-service diagnosis is not necessarily easy,
> especially as there's pressure to reduce the volume of debug logging
> emitted. But notification-driven metering is an important part of what
> ceilometer does, so we need to figure out some way of integration-testing
> it, IMO.

I actually think Sean's proposal in his response to the OP addresses some
alternative approaches.

> 
> > But, there is also a slight misunderstanding here. Having a
> > feature be externally discoverable isn't a hard requirement for a
> > config option in tempest, it's just *strongly* recommended. Mostly,
> > because if there isn't a way to discover it how are end users
> > expected to know what will work.
> 
> A-ha, I missed the subtle distinction there and thought that this
> discoverability was a *strict* requirement. So how bad a citizen would
> a project be considered to be if it chose not to meet that strong
> recommendation?

You'd be far from the only ones who are doing that, for an existing example
look at anything on the nova driver feature matrix. Most of those aren't
discoverable from the API. So I think it would be ok to do that, but when we
have efforts like:

https://review.openstack.org/#/c/94473/

it'll make that more difficult. Which is why I think having discoverability
through the API is important. (it's the same public cloud question)

> 
> > For this specific case I think it's definitely fair to have an
> > option for which notifications services are expected to be
> > generated. That's something that is definitely a configurable option
> > when setting up a deployment, and is something that feels like a
> > valid tempest config option, so we know which tests will work. We
> > already have similar feature flags for config time options in the
> > services, and having options like that would also get you out of
> > that backport mess you have right now.
> 
> So would this test configuration option have a semantic like:
> 
>  "a wildcarded list of notification event types that ceilometer consumes"
> 
> then tests could be skipped on the basis of the notifications that
> they depend on being unavailable, in the manner of say:
> 
>   @testtools.skipUnless(
>       matchesAll(CONF.telemetry_consumed_notifications.volume,
>                  ['snapshot.exists',
>                   'snapshot.create.*',
>                   'snapshot.delete.*',
>                   'snapshot.resize.*',]
>                 )
>   )
>   @test.services('volume')
>   def test_check_volume_notification(self):
>       ...
> 
> Is something of that ilk what you envisaged above?

Yeah that was my idea more or less, but I'd probably move the logic into a
separate decorator to make it a bit cleaner. Like:

@test.consumed_notifications('volumes', 'snapshot.exists', 'snapshot.create.*',
                             'snapshot.delete.*', 'snapshot.resize.*')

and you can just double them up if the test requires notifications from other
services.
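
To make that concrete, here's a rough sketch of how such a decorator could be
implemented. None of this is existing tempest code: the CONSUMED_NOTIFICATIONS
dict is a stand-in for whatever CONF option we'd add, and the wildcard matching
is just one plausible way to compare the required event types against the
configured list:

```python
import fnmatch
import functools
import unittest

# Hypothetical stand-in for a tempest config option: service name mapped
# to the wildcarded notification event types the deployment consumes.
CONSUMED_NOTIFICATIONS = {
    'volume': ['snapshot.*'],
}


def consumed_notifications(service, *patterns):
    """Skip the test unless every required event type is consumed."""
    consumed = CONSUMED_NOTIFICATIONS.get(service, [])

    def covered(required):
        # A required event type is covered if any configured wildcard
        # matches it (e.g. 'snapshot.*' covers 'snapshot.exists').
        return any(fnmatch.fnmatch(required, c) for c in consumed)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            unmatched = [p for p in patterns if not covered(p)]
            if unmatched:
                raise unittest.SkipTest(
                    'notifications not consumed: %s' % unmatched)
            return func(*args, **kwargs)
        return wrapper
    return decorator
```

Doubling the decorator up for a second service would then just mean stacking
another `@consumed_notifications(...)` line on the test.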

>
> 
> > However, it does raise the question of being an end user how am I
> > expected to know which notifications get counted? Which is why having
> > the feature discoverability is generally a really good idea.
> 
> So certain things we could potentially make discoverable through the
> ceilometer capabilities API. But there's a limit to how fine-grained
> we can make that. Also it was primarily intended to surface lack of
> feature-parity in the storage driver layer (e.g. one driver supports
> stddev, but another doesn't) as opposed to the notification-handling
> layer.

I think we should probably decouple the discoverability question from the
discussion about testing, because to a certain degree it's a separate problem.
It might be worth splitting the API discoverability discussion off as a separate
thread so we don't cloud either topic. (pun intended :) )

I guess it really depends on how we expect people to be consuming the API, and
whether we're presenting that expectation to people. I'll have to admit I'm not
as familiar with ceilometer as I should be, which is probably where some of my
confusion is coming from.

The naive real-world example I have in my head is: someone wants to build a
tool that generates pretty pictures from data queried from the ceilometer API.
If they want to run this tool against two or more distinct OpenStack
deployments, how will they know, through the API alone, which notifications are
getting counted, etc.? If it's discoverable through the API then the tool can
adjust what gets generated based on an API call, either up front or by handling
an expected error response like "unsupported request" (of course, a new
response like that would probably mean a new API rev). Without that
discoverability, manual intervention would be required each time the tool is
used with a new cloud. Maybe this isn't a valid use case and I'm missing
something critical here, in which case the discoverability discussion is moot.
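
To sketch what I mean (the capabilities response shape here is invented purely
for illustration, not the real ceilometer capabilities API), the tool's logic
would boil down to something like:

```python
# Hypothetical sketch: given a capability listing fetched from a cloud's
# API, decide which requested meters can actually be charted there.
# The {'meters': [...]} response shape is invented for this example.

def plan_charts(capabilities, wanted_meters):
    """Split requested meters into chartable ones and ones to omit."""
    available = set(capabilities.get('meters', []))
    chartable = [m for m in wanted_meters if m in available]
    skipped = [m for m in wanted_meters if m not in available]
    return chartable, skipped
```

The point is just that the branch on `available` happens per cloud, from an API
call, rather than from a human reading each deployment's docs.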

>  
> > > One approach mooted for allowing these kind of scenarios to be tested
> > > was to split off the pure-API aspects of Tempest so that it can be used
> > > for probing public-cloud-capabilities as well as upstream CI, and then
> > > build project-specific mini-Tempests to test integration with other
> > > projects.
> > > 
> > > Personally, I'm not a fan of that approach as it would require a lot
> > > of QA expertise in each project, lead to inefficient use of CI
> > > nodepool resources to run all the mini-Tempests, and probably lead to
> > > a divergent hotchpotch of per-project approaches.
> > 
> > I think the proposal here was for people interested in doing
> > whitebox testing, where there is a desire to test an internal
> > project mechanism. I could see the argument for testing
> > notifications this way, but that would have to be for every project
> > individually. There are already several projects that have
> > functional testing like this in tree and run them as a gating
> > job. There are definitely certain classes of testing where doing
> > this makes sense.
> 
> I'm not sure that this would be realistic to test individually (if by
> that you meant just with the ceilometer agents running alone) as it
> depends on a notification emitted from cinder.

I wasn't actually referring to using it for this specific case, just making a
general point. Although maybe it would be useful to add notification functional
tests to Cinder. I think Sean's response on the OP outlined some interesting
alternative testing strategies which would definitely fit this use case. I
still need to finish forming my opinion about that before I respond to it.

<snip> 

> > I think what we're
> > hitting here is more a matter of projects trying to map out exactly
> > how to test things for real in the gate with tempest. While at the
> > same time coming to understand that things don't quite work as well
> > as we expected. I think that we have to remember that this is the
> > first cycle with branchless tempest it's still new for everyone and
> > what we're hitting here are just some of the growing pains around
> > it.  Having discussions like this and mapping out the requirements
> > more completely is th
> 
> Yep, we're all learning and beginning to see the for-real implications
> of branchless Tempest.
> 

Heh, I guess I forgot to finish that sentence. :) 

> > I recognize that for projects that didn't have any real testing
> > before we started branchless tempest it's harder to get things going
> > with it.  Especially because in my experience the adage "if it isn't
> > tested it's broken" tends to hold true. So I expect there will be a
> > lot of non-backportable fixes just to enable testing. What this
> > friction with branchless tempest is showing us is that these fixes,
> > besides fixing the bug, will also have implications for people using
> > OpenStack clouds. Which I feel is invaluable information to collect,
> > and definitely something we should gate on. The open question is how
> > do we make it easier the to enable testing for new things.
> 
> Yes, ceilometer unfortunately falls somewhat into that category of not
> having much pre-existing Tempest coverage. We had a lot of Tempest tests
> proposed during Icehouse, but much of it stalled around performance
> issues in our sql-alchemy driver.

Yeah that's what I was referring to. The fact that all the ramp-up is happening
now means you guys are among the first to hit a lot of these issues.


So I wrote this in one pass, sorry if bits of it are incoherent. :)

-Matt Treinish