On Wed, 12 Dec 2018, Matt Riedemann wrote:
> I wanted to send this separate from the latest gate status update [1] since it's primarily about latent cinder bugs causing failures in the gate which no one is really investigating.
Thanks for writing this up, and the other message [1]. It provides much more visibility into, and context for, the situation, and hopefully it can stimulate people to think about making fixes, and perhaps changes to some of the ways we do things that aren't always working. In that spirit...
Regarding the backup tests specifically, I don't see a reason why they need to be run in the integrated gate jobs, e.g. tempest-full(-py3). They don't involve other services, so in my opinion we should move the backup tests to a separate job which only runs on cinder changes, to avoid these latent bugs failing jobs for unrelated changes and resetting the entire gate.
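For anyone unfamiliar with how that split would look, here's a rough sketch of the kind of Zuul configuration involved. The job name, parent, and test regex below are invented for illustration; they are not taken from cinder's actual .zuul.yaml:

```yaml
# Hypothetical fragment for cinder's .zuul.yaml. All names here are
# assumptions made for the sake of the example, not real job definitions.
- job:
    name: cinder-tempest-backup
    parent: devstack-tempest
    description: Run only the volume backup tempest tests.
    vars:
      # Restrict tempest to backup tests; exact regex would need checking.
      tempest_test_regex: (^tempest\.api\.volume.*backup)

- project:
    check:
      jobs:
        - cinder-tempest-backup
    gate:
      jobs:
        - cinder-tempest-backup
```

Because the job would be defined in cinder's own repo config, it runs only on cinder changes, so a flaky backup test can no longer reset the integrated gate for everyone else.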
I guess in this case these tests were exposed by their failing, and it was only once you investigated that you realized they weren't truly integration tests? Matt, do you have any ideas on how to find other non-integration tests that are being treated as integration tests, which we could move to their own jobs? De-tangling the spaghetti is likely to reveal plenty of improvements, but also plenty of areas that need more attention.

A couple of things I've been working on lately might be useful for refining tests, such that we catch failures in check before the integrated gate:

* In nova, we're in the process of removing placement, but continuing to use a real, instead of fake, placement fixture in the functional tests [2][3]. This is relatively straightforward for placement, since it is just a simple WSGI app, but it might be possible to create similar fixtures for Cinder and other services, so that functional tests which currently use a faked-out stub of an alien service can use something a bit more robust. If people are interested in trying to make that happen, I'd be happy to help make it go, but I wouldn't be able to until next year.

* In placement we wanted to do some very simple live performance testing but didn't want to pay the time cost of setting up a devstack or tempest node, so we did something much more lightweight [4] which takes about a third or less of the time. This may be a repeatable pattern for other kinds of testing. Devstack and tempest are often overkill, but we default to them because they are there.

And finally, the use of wsgi-intercept [5] (usually with gabbi [6]) has made it possible to have reasonably high confidence in the behavior of the placement and nova APIs via functional tests, catching issues before anything as costly as tempest gets involved. Any service that presents its API as a WSGI app should be able to use the same tooling if it wants.

If anyone is curious about any of this stuff, please feel free to ask.
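To make the wsgi-intercept idea concrete: the trick is that HTTP requests are handed straight to the WSGI application callable in-process, with no server, socket, or devstack involved. The sketch below is not the wsgi-intercept API itself; it's a stdlib-only illustration of the same mechanism, with a toy app standing in for a real service:

```python
from io import BytesIO


def app(environ, start_response):
    # Toy WSGI app standing in for a service API endpoint.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello']


def wsgi_get(app, path):
    """Issue a GET against a WSGI app entirely in-process.

    No network, no server: we build a minimal WSGI environ, call the
    application directly, and capture what it hands to start_response.
    This is the core idea wsgi-intercept wraps up for real HTTP clients.
    """
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = dict(headers)

    environ = {
        'REQUEST_METHOD': 'GET',
        'PATH_INFO': path,
        'SERVER_NAME': 'test',
        'SERVER_PORT': '80',
        'wsgi.input': BytesIO(b''),
        'wsgi.errors': BytesIO(),
        'wsgi.url_scheme': 'http',
        'wsgi.version': (1, 0),
    }
    body = b''.join(app(environ, start_response))
    return captured['status'], captured['headers'], body


status, headers, body = wsgi_get(app, '/')
print(status, body.decode())
```

In practice wsgi-intercept hooks this into real client libraries (requests, httplib2, urllib), and gabbi drives it from YAML test files, so the tests read like HTTP conversations while running at functional-test speed.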
[1] http://lists.openstack.org/pipermail/openstack-discuss/2018-December/000867....
[2] https://review.openstack.org/#/c/617941/
[3] https://git.openstack.org/cgit/openstack/placement/tree/placement/tests/func...
[4] https://review.openstack.org/#/c/619248/
[5] https://pypi.org/project/wsgi_intercept/
[6] https://gabbi.readthedocs.io

--
Chris Dent                 ٩◔̯◔۶              https://anticdent.org/
freenode: cdent                                       tw: @anticdent