[openstack-dev] [all] fix latency on requirements breakage

Sean Dague sean at dague.net
Mon Nov 17 23:02:54 UTC 2014


While we deal with the fact that testtools 1.4.0 apparently broke
something in how attributes are added to tests (which tempest needs for
filtering), an interesting problem has come up.

Our current policy on requirements is to leave them open ended, which
lets us take upstream fixes. It also breaks us a lot, because moving to
a new maximum version of a dependency happens with zero code review or
testing.

However, fixing these breaks takes a lot of debugging, code review, and
test time, as seen by the fact that the testtools 1.2.0 block didn't even
manage to fully merge this weekend.

This is an asymmetric break/fix path, and I think we need a better plan
for it. If fixing is more expensive than breaking, you'll tend to be in
a broken state quite a bit. What we actually want is the opposite
asymmetry, if we can get it.

There are a couple of things we could try here:

== Cap all requirements, require code reviews to bump maximums ==

Benefits: we're protected from upstream breaks.

Downsides: it requires active energy to move forward. The SQLA 0.8
transition took forever.
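To make the trade-off concrete, here is a minimal Python sketch of how an open-ended requirement and a capped one pick a release when upstream publishes something new. It assumes plain dotted numeric versions; the `resolve` helper, the release list and the bounds are hypothetical illustrations, not pip's actual resolver. In pip's requirement syntax the capped case would be an illustrative specifier like `testtools>=1.1.0,<1.4.0`.

```python
# A minimal sketch, assuming plain dotted numeric versions (no PEP 440
# pre-releases or local tags). Release list and bounds are hypothetical.

def parse(version):
    """Turn '1.4.0' into (1, 4, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(releases, minimum, cap=None):
    """Return the newest release >= minimum and, if cap is given, < cap."""
    candidates = [
        v for v in releases
        if parse(v) >= parse(minimum) and (cap is None or parse(v) < parse(cap))
    ]
    return max(candidates, key=parse) if candidates else None


releases = ["1.1.0", "1.2.0", "1.3.0"]
print(resolve(releases, "1.1.0"))           # open ended  -> 1.3.0
print(resolve(releases, "1.1.0", "1.4.0"))  # capped      -> 1.3.0

# Upstream publishes a new release that nobody has reviewed or tested.
releases.append("1.4.0")
print(resolve(releases, "1.1.0"))           # open ended  -> 1.4.0
print(resolve(releases, "1.1.0", "1.4.0"))  # capped      -> 1.3.0
```

The capped requirement ignores the new release until someone lands a reviewed change that raises the cap, which is exactly where the "active energy" cost comes from.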

== Provide Requirements core push authority ==

For blocks on bad versions, if we had a fast path to just merge known
breaks, we could right ourselves quicker. It would have reasonably
strict rules, e.g. it could only be used to block individual versions.
It should probably also come with an email to the dev list any time it
is used.

Benefits: fast to fix.

Downsides: it bypasses our testing infrastructure, though realistically
the break bypassed it as well.
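A block on an individual version, the only kind of change this fast path would allow, can be sketched in a few lines of Python. In pip's requirement syntax such a block is an exclusion like `!=1.4.0`. This is again a hypothetical, stdlib-only illustration assuming plain dotted numeric versions; the `resolve` helper and the release list are made up for the example and are not pip's resolver.

```python
# A minimal sketch, assuming plain dotted numeric versions. The release
# list and the blocked versions are hypothetical examples.

def parse(version):
    """Turn '1.4.0' into (1, 4, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(releases, minimum, blocked=()):
    """Return the newest release >= minimum that is not explicitly blocked.

    `blocked` plays the role of pip's `!=X.Y.Z` exclusions.
    """
    banned = {parse(v) for v in blocked}
    candidates = [
        v for v in releases
        if parse(v) >= parse(minimum) and parse(v) not in banned
    ]
    return max(candidates, key=parse) if candidates else None


releases = ["1.1.0", "1.2.0", "1.3.0", "1.4.0"]
print(resolve(releases, "1.1.0"))                      # -> 1.4.0 (the bad one)
print(resolve(releases, "1.1.0", blocked=["1.4.0"]))   # -> 1.3.0

# A later upstream fix is picked up with no further requirements change.
releases.append("1.4.1")
print(resolve(releases, "1.1.0", blocked=["1.4.0"]))   # -> 1.4.1
```

Unlike a cap, an exclusion removes only the known-bad release, which is why it is a natural candidate for a narrowly scoped fast path.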

...

There are probably other ways to make this more symmetric. I once had a
grand vision of building a system that more or less automated the
requirements bump, but I have other problems in OpenStack that I think
need to be addressed.


The reason I think it's important to come up with a better way here is
that having our whole code gating system lock up for 12+ hrs because of
an external dependency, which we are pretty sure is the crux of our
break, is very discouraging for developers. They can't get their code
merged. They can't get accurate test results. Once the fix is done,
everyone rechecks their code, so now everyone waits extra long for valid
test results. People who don't realize their code can't pass just keep
pushing patches up, consuming resources, which means that parts of the
project that could pass tests are backed up behind parts that are 100%
guaranteed to fail. All in all, not a great system.

	-Sean

-- 
Sean Dague
http://dague.net