[openstack-dev] [qa][all] Branchless Tempest beyond pure-API tests, impact on backporting policy

Eoghan Glynn eglynn at redhat.com
Wed Jul 9 17:44:33 UTC 2014


Thanks for the response Matt, some comments inline.

> > At the project/release status meeting yesterday[1], I raised the issue
> > that featureful backports to stable are beginning to show up[2], purely
> > to facilitate branchless Tempest. We had a useful exchange of views on
> > IRC but ran out of time, so this thread is intended to capture and
> > complete the discussion.
> 
> So, [2] is definitely not something that should be backported.

Agreed. It was the avoidance of such forced backports that motivated
the thread.

> But, doesn't it mean that cinder snapshot notifications don't work
> at all in icehouse?

The snapshot notifications work in the sense that cinder emits them
at the appropriate points in time. What was missing in Icehouse is
that ceilometer didn't consume those notifications and translate them
into metering data.
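
To make that concrete, here's a rough sketch of what I mean by consuming
and translating; every name in it is illustrative rather than actual
ceilometer code:

  # Purely illustrative; not the real ceilometer plugin interface.
  SNAPSHOT_EVENTS = ('snapshot.exists',
                     'snapshot.create.end',
                     'snapshot.delete.end')

  def snapshot_notification_to_sample(event_type, payload):
      """Translate a cinder snapshot notification into a metering sample."""
      if event_type not in SNAPSHOT_EVENTS:
          return None
      return {'name': 'snapshot',          # meter name
              'type': 'gauge',
              'unit': 'snapshot',
              'volume': 1,                  # one snapshot resource
              'resource_id': payload.get('snapshot_id'),
              'project_id': payload.get('tenant_id')}

  # What Icehouse ceilometer lacked was, in effect, a handler like the
  # above wired up to cinder's notification topic.
  sample = snapshot_notification_to_sample(
      'snapshot.create.end',
      {'snapshot_id': 'abc-123', 'tenant_id': 'def-456', 'volume_size': 1})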

> Is this reflected in the release notes or docs somewhere

Yeah, it should be clear from the list of meters in the Icehouse docs:

  https://github.com/openstack/ceilometer/blob/stable/icehouse/doc/source/measurements.rst#volume-cinder

versus the Juno version:

  https://github.com/openstack/ceilometer/blob/master/doc/source/measurements.rst#volume-cinder

> because it seems like something that would be expected to work, which,
> I think, is actually a bigger bug being exposed by branchless tempest.

The bigger bug being the lack of ceilometer support for consuming this
notification, or the lack of discoverability for that feature?

> As a user, how do I know whether cinder snapshot notifications are
> supported by the cloud I'm using?

If you depend on this as a front-end user, then you'd have to read
the documentation listing the meters being gathered.

But is this something that a front-end cloud user would actually be
directly concerned about?
 
> >  * Tempest has an implicit bent towards pure API tests, yet not all
> >    interactions between OpenStack services that we want to test are
> >    mediated by APIs
> 
> I think this is the bigger issue. If there is cross-service communication it
> should have an API contract. (and probably be directly tested too) It doesn't
> necessarily have to be a REST API, although in most cases that's easier. This
> is probably something for the TC to discuss/mandate, though.

As I said at the PTLs meeting yesterday, I think we need to be wary
of the temptation to bend the problem-space to fit the solution.

Short of significantly increasing the polling load imposed by ceilometer,
in reality we will have to continue to depend on notifications as one
of the main ways of detecting "phase-shifts" in resource state.

Note that the notifications that capture these resource state transitions
are a long-standing mechanism in OpenStack that ceilometer has depended
on from the very outset. I don't think it's realistic to envisage these
interactions being replaced by REST APIs any time soon.

> > The branchless Tempest spec envisages new features will be added and
> > need to be skipped when testing stable/previous, but IIUC requires
> > that the presence of new behaviors is externally discoverable[5].
> 
> I think the test case you proposed is fine. I know some people will
> argue that it is expanding the scope of tempest to include more
> whitebox like testing, because the notifications are an internal
> side-effect of the api call, but I don't see it that way. It feels
> more like exactly what tempest is there to enable testing, a
> cross-project interaction using the api.

In my example, APIs are only used to initiate the action in cinder
and then to check the metering data in ceilometer.

But the middle piece, i.e. the interaction between cinder & ceilometer,
is not mediated by an API. Rather, it's carried via an unversioned
notification.
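
To illustrate, the shape of the scenario is roughly the following; the
pre-authenticated cinder/ceilometer client objects and the volume_id are
assumptions made for the sake of the sketch, not tempest code:

  # Sketch of the two API-mediated ends of the scenario; it assumes
  # pre-authenticated 'cinder' (python-cinderclient) and 'ceilometer'
  # (python-ceilometerclient) instances and an existing volume_id.
  import time

  snap = cinder.volume_snapshots.create(volume_id)   # API call #1 (cinder)

  # ... in between, cinder emits snapshot.create.* notifications that
  # ceilometer is expected to consume; no API mediates this hop ...

  for _ in range(30):                                # API call #2 (ceilometer)
      samples = ceilometer.samples.list(
          meter_name='snapshot',
          q=[{'field': 'resource_id', 'op': 'eq', 'value': snap.id}])
      if samples:
          break
      time.sleep(2)
  else:
      raise AssertionError('no snapshot samples recorded by ceilometer')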

> I'm pretty sure that most of the concerns around tests like this
> were from the gate maintenance and debug side of things. In other
> words when things go wrong how impossible will it be to debug that a
> notification wasn't generated or not counted? Right now I think it
> would be pretty difficult to debug a notification test failure,
> which is where the problem is. While I think testing like this is
> definitely valid, that doesn't mean we should rush in a bunch of
> sloppy tests that are impossible to debug, because that'll just make
> everyone sad panda.

It's a fair point that cross-service diagnosis is not necessarily easy,
especially as there's pressure to reduce the volume of debug logging
emitted. But notification-driven metering is an important part of what
ceilometer does, so we need to figure out some way of integration-testing
it, IMO.

> But, there is also a slight misunderstanding here. Having a
> feature be externally discoverable isn't a hard requirement for a
> config option in tempest, it's just *strongly* recommended. Mostly
> because if there isn't a way to discover it, how are end users
> expected to know what will work?

A-ha, I missed the subtle distinction there and thought that this
discoverability was a *strict* requirement. So how bad a citizen would
a project be considered to be if it chose not to meet that strong
recommendation?

> For this specific case I think it's definitely fair to have an
> option for which notifications services are expected to be
> generated. That's something that is definitely a configurable option
> when setting up a deployment, and is something that feels like a
> valid tempest config option, so we know which tests will work. We
> already have similar feature flags for config time options in the
> services, and having options like that would also get you out of
> that backport mess you have right now.

So would this test configuration option have a semantic like:

 "a wildcarded list of notification event types that ceilometer consumes"

then tests could be skipped when the notifications they depend on are
unavailable, in the manner of, say:

  @testtools.skipUnless(
      matchesAll(CONF.telemetry_consumed_notifications.volume,
                 ['snapshot.exists',
                  'snapshot.create.*',
                  'snapshot.delete.*',
                  'snapshot.resize.*'])
  )
  @test.services('volume')
  def test_check_volume_notification(self):
      ...

Is something of that ilk what you envisaged above?
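
For what it's worth, a minimal sketch of what the tempest side might look
like, assuming oslo.config (which tempest already uses for its feature
flags); the group, the option and the matchesAll() helper are all made-up
names rather than anything that exists today:

  import fnmatch

  from oslo_config import cfg

  # Hypothetical option group; neither the group nor the option exists
  # in tempest today.
  telemetry_notifications_group = cfg.OptGroup(
      name='telemetry_consumed_notifications',
      title='Notification event types consumed by ceilometer')

  TelemetryNotificationsOpts = [
      cfg.ListOpt('volume',
                  default=[],
                  help='Wildcarded cinder notification event types that '
                       'the deployed ceilometer consumes'),
  ]

  CONF = cfg.CONF
  CONF.register_group(telemetry_notifications_group)
  CONF.register_opts(TelemetryNotificationsOpts,
                     group=telemetry_notifications_group)

  def matchesAll(configured, required):
      """True if every required event-type pattern is covered by at
      least one configured entry (exact match or wildcard)."""
      return all(any(req == conf or fnmatch.fnmatch(req, conf)
                     for conf in configured)
                 for req in required)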

> However, it does raise the question: as an end user, how am I
> expected to know which notifications get counted? Which is why having
> feature discoverability is generally a really good idea.

So certain things we could potentially make discoverable through the
ceilometer capabilities API, but there's a limit to how fine-grained
we can make that. Also, that API was primarily intended to surface lack
of feature-parity in the storage driver layer (e.g. one driver supports
stddev, but another doesn't) as opposed to the notification-handling
layer.
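
For example, a client-side capability check might look roughly like this;
the endpoint, token handling and key names are illustrative assumptions
rather than the actual response format:

  import requests

  # Endpoint, token and capability key names below are illustrative
  # assumptions, not the authoritative ceilometer response format.
  CEILOMETER_URL = 'http://ceilometer.example.com:8777'
  TOKEN = '<keystone token>'

  resp = requests.get(CEILOMETER_URL + '/v2/capabilities',
                      headers={'X-Auth-Token': TOKEN})
  caps = resp.json()

  # The response is a nested dict of capability flags, largely describing
  # what the configured storage driver can do (e.g. stddev aggregation);
  # inspect the deployed service for the real key names.
  print(caps)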
 
> > One approach mooted for allowing these kind of scenarios to be tested
> > was to split off the pure-API aspects of Tempest so that it can be used
> > for probing public-cloud-capabilities as well as upstream CI, and then
> > build project-specific mini-Tempests to test integration with other
> > projects.
> > 
> > Personally, I'm not a fan of that approach as it would require a lot
> > of QA expertise in each project, lead to inefficient use of CI
> > nodepool resources to run all the mini-Tempests, and probably lead to
> > a divergent hotchpotch of per-project approaches.
> 
> I think the proposal here was for people interested in doing
> whitebox testing, where there is a desire to test an internal
> project mechanism. I could see the argument for testing
> notifications this way, but that would have to be for every project
> individually. There are already several projects that have
> functional testing like this in tree and run them as a gating
> job. There are definitely certain classes of testing where doing
> this makes sense.

I'm not sure that this would be realistic to test individually (if by
that you meant just with the ceilometer agents running alone) as it
depends on a notification emitted from cinder. 

> > Another idea would be to keep all tests in Tempest, while also
> > micro-versioning the services such that tests can be skipped on the
> > basis of whether a particular feature-adding commit is present.
> > 
> > When this micro-versioning can't be discovered by the test (as in the
> > public cloud capabilities probing case), those tests would be skipped
> > anyway.
> 
> Yeah, I'm not a fan of this approach at all. It is just a bad way of
> reimplementing a temporally-aware tempest. But, instead of using
> branches we have arbitrary service versions. It sacrifices all the
> real advantages of having branchless tempest, but adds more
> complexity around the version discovery to all the projects and
> tempest. If we want to revert back to a temporally aware tempest,
> which I don't think we should, then going back to the branched model is
> what we should do.

Fair point about it giving us the worst of both worlds. Yeap, scratch
that suggestion.

> > The final, less palatable, approach that occurs to me would be to
> > revert to branchful Tempest.
> 
> So I don't think we're anywhere near this.

Agreed.

> I think what we're hitting here is more a matter of projects trying
> to map out exactly how to test things for real in the gate with
> tempest, while at the same time coming to understand that things
> don't quite work as well as we expected. I think we have to remember
> that this is the first cycle with branchless tempest; it's still new
> for everyone, and what we're hitting here are just some of the
> growing pains around it. Having discussions like this and mapping
> out the requirements more completely is the best way to work through
> them.

Yep, we're all learning and beginning to see the for-real implications
of branchless Tempest.

> I recognize that for projects that didn't have any real testing
> before we started branchless tempest it's harder to get things going
> with it.  Especially because in my experience the adage "if it isn't
> tested it's broken" tends to hold true. So I expect there will be a
> lot of non-backportable fixes just to enable testing. What this
> friction with branchless tempest is showing us is that these fixes,
> besides fixing the bug, will also have implications for people using
> OpenStack clouds. Which I feel is invaluable information to collect,
> and definitely something we should gate on. The open question is how
> do we make it easier to enable testing for new things.

Yes, ceilometer unfortunately falls somewhat into that category of not
having much pre-existing Tempest coverage. We had a lot of Tempest tests
proposed during Icehouse, but much of that effort stalled on performance
issues in our sqlalchemy driver.

Thanks in any case for the feedback.

Cheers,
Eoghan


