[openstack-dev] mock 1.3 breaking all of Kilo in Sid (and other cases of this kind)

Robert Collins robertc at robertcollins.net
Tue Aug 25 21:20:17 UTC 2015


I feel your pain and where you are coming from. We're all of us trying
to make things better in the world - OpenStack, Debian, Python and so
on. And I understand the limitations that Debian has vs the upstream
Python ecosystem, and the challenges you face working in that
environment.

But having said that, the tone and content of your email felt quite
accusatory to me and that's unnecessary - it is, to be blunt, nasty.

We can do much better than that when resolving conflicts. I would
greatly appreciate it if, should future conflicts come up, you would
do so.

I have seen your follow-up email, but since you've raised the points
here I feel compelled to proffer a different explanation of the issues
you're reporting.

On 26 August 2015 at 01:42, Thomas Goirand <zigo at debian.org> wrote:
> Hi,
>
> This is a special message for Robert Collins, as I believe he's the one
> responsible for the breakage. If it's not your fault, then I'm sorry,
> and whoever does the breakage should read what's below carefully, so
> that it doesn't happen again.

meta:
----
I find it hard to read emails 'written specially to me because I broke
something' in a calm and dispassionate manner. Whether I did it or
not, it predisposes me to defensiveness and raises my heart rate and
blood pressure. I.e. adrenaline.

I think for my own health I'm going to add a rule to killfile such
mailers in the future: life is too short. If someone wants to do a
postmortem on something I'm involved in - great. If they decide to
open with such a biased, blame based approach, then I'm not
interested.
-----

Ok, so onto the body.

The mock API was broken by changes to the copy in the CPython standard
library during 3.4 and 3.5. I and others worked hard to remediate the
feature limits introduced by those changes when they were discovered
by the backporting process to 'mock' in 1.1 and above.

mock 1.1 was a minor version change rather than a major version change
because at the time of the initial sync I did not realise how
widespread the impact of the changes from the stdlib would be. I had
personally reviewed them all, including test changes, and none seemed
contentious. I was wrong. The Python users of the internet have
already told me this in technicolour, with diagrams. However, having
incurred that cost, we haven't had any /new/ gratuitous
incompatibilities added, and we've rolled back the really big ones -
by fixing the stdlib library to make it better. Except for the bad
assert detection one - see below.

> Robert, while I do appreciate all of your work, and your technically
> sound contributions, I am having a hard time with your habit to
> regularly break backward AND forward API compatibility. Yes, sometimes
> we unfortunately must do it. But this should be a very rare exception,
> and you've been doing it over and over again, making package
> maintainer's life miserable.

Mock has been an exceptional case in my experience. But where else
have I done this?

Backwards compat is 'deal with older inputs', which pbr does just fine
for *defined inputs*. Mock 1.3 is also backwards compatible with
*defined older inputs*. The one case where it's not, assert methods on
mocks that were not defined, has been hugely contentious, possibly
burnt out a CPython core dev (who quite literally said 'I'm outta
here' in the > 100 message mailing thread about it), and has been
widely *welcomed* by users because it finds actual genuine bugs in
their test suite.
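
To make that one case concrete, here is a minimal sketch using the
stdlib unittest.mock (Python 3.5 or later), which carries the same
check. The helper name misspelled_assert_is_caught is made up for
illustration, and unsafe=True is used only to approximate the older,
permissive behaviour where any attribute access hands back a child
mock.

```python
from unittest import mock


def misspelled_assert_is_caught(unsafe=False):
    """Return True if a misspelled assert method raises AttributeError."""
    m = mock.Mock(unsafe=unsafe)
    m(1)
    try:
        # Typo on purpose: the real method is assert_called_once_with.
        # The arguments are also wrong (2 instead of 1), so a working
        # assertion would have failed here.
        m.assert_called_once_wiht(2)
    except AttributeError:
        return True
    # Nothing was checked at all: the "assertion" was just another
    # auto-created child mock being called.
    return False
```

With the default settings the typo is reported; with unsafe=True the
call goes through without checking anything, which is how a test can
look green while asserting nothing.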

Forwards compat is 'deal with newer inputs gracefully'. Both pbr and
mock accept newer inputs gracefully: they error and callers can use
that to detect an old version and provide whatsoever fallback they
like. Just like handling of epoll on Linux versions that don't have
it.
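
A small sketch of that detect-and-fall-back pattern, using the epoll
case as the example. The function name pick_poller is hypothetical;
the only assumption is that select.epoll is missing on platforms that
do not provide it, so the attribute lookup raises AttributeError.

```python
import select


def pick_poller():
    """Return the name of the readiness API to use on this platform."""
    try:
        # Looking the attribute up is enough: where epoll is not
        # available the name does not exist and AttributeError is raised.
        select.epoll
    except AttributeError:
        # The error is the signal that this is an "older" environment,
        # so fall back to the API that is available everywhere.
        return "select"
    return "epoll"
```

The same shape works for a library: try the newer call, treat the
error as "this version is too old", and take whatever fallback path
the caller prefers.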

> This first happened with PBR. Kilo can't use >= 1.x

This is due to Kilo having *inappropriate* version caps on its
dependencies. Which we've been busy unwinding and fixing
infrastructure this cycle to avoid having it happen again on Liberty.
The errors from projects in kilo running with pbr >= 1.x are due to
pkg_resources entry_points validating the declared dependencies from
the package, and the packages having a pbr<1 *defensive dependency*.
This is now recognised as a mistaken pattern - see the requirements
management spec where we're trying to avoid it.
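
A rough sketch of why a defensive cap trips at runtime. This is not
the pkg_resources implementation: parse_version and satisfies are
simplified, hypothetical helpers that only handle plain dotted numeric
versions, and the ">=1.0" requirement is an illustrative stand-in for
a dependency declared by newer code.

```python
import operator

_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}


def parse_version(text):
    """Turn '1.3' into (1, 3, 0); numeric dotted versions only."""
    parts = [int(p) for p in text.strip().split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(installed, spec):
    """Check an installed version against a spec such as '>=0.6,<1'."""
    have = parse_version(installed)
    for clause in spec.split(","):
        clause = clause.strip()
        # Two-character operators are tried first so '<=' is not
        # mistaken for '<'.
        for symbol in ("<=", ">=", "==", "!=", "<", ">"):
            if clause.startswith(symbol):
                want = parse_version(clause[len(symbol):])
                if not _OPS[symbol](have, want):
                    return False
                break
        else:
            raise ValueError("unrecognised clause: %r" % clause)
    return True
```

Under these assumptions satisfies("1.3.0", "<1") is False even though
nothing in the installed code is incompatible: the failure comes
purely from the declared cap, not from any behaviour of the library.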

pbr's Python API and packaging behaviour are entirely compatible with
all of kilo. The set of things that pbr 0.11 accepts and pbr 1.0+
doesn't is empty, to the best of my knowledge.

> , and Liberty can't
> use <= 1.x.

This is because pbr 1.x offers features that Liberty needs. That's how
software moves forward: you add the feature, and someone else uses it
and declares a dependency on your version.

> So I can't upload PBR 1.3.0 to Sid. This has been dealt with
> because I am the maintainer of PBR, but really, it shouldn't have
> happen. How come for years, upgrading PBR always worked, and suddenly,
> when you start contributing to it, it breaks backward compat? I'm having
> a hard time to understand what's the need to break something which
> worked perfectly for so long. I'd appreciate more details.

More of the ad hominems. Wow. Here are the details. I've been
contributing to it for half the life of the project: since August
2013. When did you notice this break? Oh right, mid 2015. So I've been
contributing to this for a full half of the life of the project
without this drama occurring until just now.

As I say above, it's not a PBR problem. It's badly expressed defensive
dependencies in kilo's runtime requirements. Fix that, and kilo will
be happy with newer pbr. The exact same issue will arise by the way
when you start updating /all/ of the oslo libraries for Liberty. There
is no version overlap between Kilo servers and Liberty oslo libraries
that satisfies the Python dependencies. And this is why we've spent
the last cycle overhauling our supporting infrastructure around this.
To solve the exact same problem with library versions, of which the
pbr cap is just one example.

> But for mock, that's another story. I'm not the maintainer, and the one
> who is, decided it was a good moment to upload to Sid. The result is 9
> FTBFS (failures to build from source) so far, because mock >= 1.1 is
> incompatible with Kilo (but does work well with Liberty, which
> *requires* it).

Yes, Liberty requires it because we're porting to Python 3.4 and up,
and mock < 1.1 is incompatible with Python 3.4.
...
> Clearly, we're not alone using mock. And we should always consider that
> we aren't alone. So the usual "yeah, but we have pinned the versions, so
> it's Debian's fault to have uploaded version 1.3 in Sid" would be very
> naive in this case, and absolutely not valid. This is an ok-ish answer
> for OpenStack only components like Oslo libraries. And even so, I'm
> convince that we shouldn't break APIs there either.

Yes, mock is a widely used library. Most of the rdepends will be build
time, so your report is showing fewer packages than are actually
affected.

> So the issue here, really, is backward and forward compatibility
> breakage in mock. Robert, you're a DD and you've been working for
> Canonical, so you must know about these. You just need to care more for
> this type of things. In the Linux kernel development space, they *never*
> break userland as a rule. Why are Python developers allowing themselves
> to do so?

The incompatible changes to mock were originated by other people, in
the CPython repository. I'm not going to second guess their work -
they're all good competent people. The net result was a few small and
easily fixed incompatibilities, a couple that weren't so easy, and one
big glaring one which we've kept because it massively improves the
API (erroring on non-existing assert methods).

As I described above, I ported those changes to mock, with Michael
Foord's blessing, to get them out for folk to use, and I found out
afterwards about the impact. Sean Dague was /much/ more polite about
suggesting I do more integration testing for mock - and I agree that
we should - I'd love a patch to .travis.yml to test run Nova's unit
tests on mock backports (or better still one for pybots to test stdlib
commits!) There have been no backwards incompatible changes that I've
intentionally created in mock, and when I review code I look for
backward incompatibility all the time.

> Worse case if we really want to break things: isn't there ways to keep
> the old API and write a new one, let everyone migrate, then eventually
> deprecate the old one?

Often yes. If you know you're breaking things.

> Anyway, the result is that mock 1.3 broke 9 packages at least in Kilo,
> currently in Sid [1]. Maybe, as packages gets rebuilt, I'll get more bug
> reports. This really, is a depressing situation. Now, as the package
> maintainer for the failed packages, I have 4 solutions:
>
> 1/ Reassign these bugs to python-mock.
> 2/ Remove all of the unit tests which are currently failing because of
> the new python-mock version. This isn't great, but as I already ran
> these tests with mock 1.0.1, it should be ok.
> 3/ Completely remove unit tests for these Kilo packages (or at least
> allow them to fail).
> 4/ See what's been done in Liberty to fix these tests with the newer
> version of mock, and backport that to Kilo.

5/ Update OpenStack in unstable to be Liberty

6/ Build something in Debian to deal with conflicting APIs of Python
packages - we can do it with C ABIs (and do, all the time), but
there's no infrastructure for doing it with Python. If we had that
then Debian Python maintainers could treat this as a graceful
transition rather than an awkward big-bang.

> In the case of 1/, I don't think the python-mock package maintainer will
> be able to do anything about it, and eventually, python-mock will get
> AUTORM from Debian testing, which doesn't help me at all.
>
> Unfortunately, 4/ isn't practical, because I'm also maintaining
> backports to Jessie, which means I'd have to write fixes so that the
> packages would work for both mock 1.0.1 and 1.3, plus it would take a
> very large amount of my time in a non-useful way (I know the package
> works as it passed unit tests with 1.0.1, so just fixing the tests is
> useless).

One can't actually know that, because one of the bugs in 1.0.1 is that
many assertions appear to work even though they don't exist: tests are
silently broken with mock 1.0.1.

> So I'm left with either option 2/ and 3/. But really, I'd have preferred
> if mock didn't break things... :/
>
> Now, the most annoying one is with testtools (ie: #796542). I'd
> appreciate having help on that one.

Twisted's latest releases moved a private symbol that testtools
unfortunately depends on.
https://github.com/testing-cabal/testtools/pull/149 - and I just
noticed now that Colin has added the test matrix we need, so we can
merge this and get a release out this week.

> I hope the message is heard and that it wont happen again.

I certainly hope we won't have an email thread like this again :)

-Rob


-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud


