[openstack-dev] Please do *NOT* use "vendorized" versions of anything (here: glanceclient using requests.packages.urllib3)

Thomas Goirand zigo at debian.org
Sun Sep 21 12:21:57 UTC 2014


Hi Ian and Donald,

I've read the full thread, and couldn't help but reply to it, even
though I previously thought I shouldn't, as what I care about is
OpenStack, not really requests, and more broadly, the wrong reasons
why upstreams embed copies of foreign library code. I completely
agree with someone else who wrote that this thread is nearly
uninteresting for OpenStack itself. However, it is IMO my role, as a
package maintainer, to let you know my view on your arguments.

If you ignore my arguments, then at least I'll have tried! :)

On 09/18/2014 03:42 AM, Ian Cordasco wrote:
> Circling back to the issue of vendoring though: it’s a conscious decision
> to do this, and in the last two years there have been 2 CVEs reported for
> requests. There have been none for urllib3 and none for chardet. (Frankly
> I don’t think either urllib3 or chardet have had any CVEs reported against
> them, but let’s ignore that for now.) While security is typically the
> chief concern with vendoring, none of the libraries we use have had
> security issues rendering it a moot point in my opinion. The benefits of
> vendoring for us as a team have been numerous and we will likely continue
> to do it until it stops benefiting us and our users.

Could you please list the benefits *for end users*? I really mean
users, as in, not developers. Because I don't see any benefit at all for
the end users. I don't think any of them would like to see many versions
of the same thing on their system.

Also, the issue is not only security. Let me give you an example. Simply
run this on a Debian sid machine:

apt-file search six.py | grep -v python3 | grep -v pyshared | wc -l

In Debian, we have about 50 versions of six.py around, embedded in
packages. And this doesn't even count those where only bits of six are
embedded in a file which isn't called six.py.
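
For anyone without apt-file at hand, the same kind of count can be
approximated on an installed system. This is only a minimal sketch: the
helper name find_embedded_copies and the example path in the note below
are assumptions for illustration, and it looks at files on disk rather
than at the Debian archive index.

```python
import os


def find_embedded_copies(root, filename="six.py"):
    """Return the sorted paths of every file called `filename` below `root`.

    `find_embedded_copies` is a hypothetical helper, not part of any
    Debian or OpenStack tool.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # One hit per directory that carries its own copy of the file.
        if filename in filenames:
            matches.append(os.path.join(dirpath, filename))
    return sorted(matches)
```

Calling len(find_embedded_copies("/usr/lib/python2.7/dist-packages"))
would give the number of copies under that (assumed) directory.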

Of course, we (in Debian) would like them to be removed. Why? Because
it's useless complexity, with so many different versions, some with
embedded bugs which have been fixed upstream, and the like. That's not
even about security at this point (I can hardly see how six.py would
have a security issue).

There is also a waste of server resources (install time, size of
packages in the Debian archive, increased download time, RAM footprint
of everything, etc.). We don't need to install (and compile to .pyc at
install time) 50 versions of six.py; a single one is enough. We're
trying to address this as much as we can, and you'll see lots of
packages where we did, but it's not always easy for various reasons,
like upstream code not being up-to-date with the latest version, or lack
of time from the Debian package maintainer.

Also, consider the fact that six is small: a single, quite small file.
It's still unacceptable from a Unix distribution standpoint, but this
makes the "vendorizing" less of an absurdity. Now, urllib3 is WAY
bigger. There are about 25 Python files. So multiply the resulting
waste and issues...

This was a simple example with six. Now just generalize to everything.
There are numerous upstream authors who also think that it's OK, and
that they can be one of the few exceptions. But really, every upstream
who does this thinks that they are "special". That's not the case.
Requests isn't more special than any other Python module.

On 09/18/2014 04:31 AM, Ian Cordasco wrote:
> Isn’t the whole point of distributing a library to let people use it
> as they see fit?

The point of a library is that it is shared among multiple consumers.
Oh, maybe not if you're using Windows, but that's probably out of the
scope of this debate. Maintaining a coherent distribution with a single
version of every library is what distributions do as much as possible.
It is unfortunately not always possible, but we do it as much as we can.

On 09/18/2014 04:31 AM, Ian Cordasco wrote:
> Project X pins a version of requests. Alice doesn’t know anything
> about requests and does pip install X. Until Alice takes a more
> active role in the development of Project X and looks into requests,
> she will never know she’s installed software that has exposures in
> it. In all likelihood, any person who just uses something that pins
> requests will never check for it. If they just use pip and Project X
> never updates, it’s not our responsibility for anything that happens
> to the user.

This is exactly why we should, at all costs, avoid pinning. It is very
dangerous, and leads to all sorts of issues. We should make sure that we
stay current with absolutely all libraries and, when possible, support
both the old and the new version of an incompatible API.
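
Supporting both the old and the new version of an API usually comes
down to feature detection rather than a pin. A minimal sketch, in which
every name (fetch_timeout, get_timeout, timeout, OldClient, NewClient)
is made up purely for illustration and does not come from any real
library:

```python
def fetch_timeout(client):
    """Read a timeout from `client`, whichever API version it exposes.

    `get_timeout()` (newer) and `timeout` (older) are hypothetical
    names, used only to show the pattern.
    """
    getter = getattr(client, "get_timeout", None)
    if callable(getter):
        return getter()      # newer, hypothetical API
    return client.timeout    # older, hypothetical API


class OldClient(object):
    """Stand-in for a library release that only has the attribute."""
    timeout = 30


class NewClient(object):
    """Stand-in for a later release that replaced it with a method."""
    def get_timeout(self):
        return 60
```

The calling code then works against either release, so nothing has to
be pinned just to keep it running.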

On 09/18/2014 04:31 AM, Ian Cordasco wrote:
> I think more applications bundle it than you realize. You’re likely
> using one daily that does it.

Stupidity doesn't become smartness just because it is widespread.
(nothing personal, just making a general statement...)

On 09/18/2014 04:31 AM, Ian Cordasco wrote:
> The reality is that by vendoring its dependencies, requests allows
> its users more flexibility than other projects.

How should I put it in simple words ... hum ... Oh, I know:

*NO* !!! THAT'S WRONG !!!

This adds complexity, because for someone who wants sanity and wants to
de-vendorize urllib3 from requests, it's very annoying. And that's not
counting the issues caused by urllib3 being removed from the
Debian/Ubuntu requests package. Also, because of what we do in
distributions (e.g. removing embedded copies), you get all sorts of
errors and mistakes like this one:
https://github.com/cdent/python3-wsgi-intercept/issues/24

This can happen again with glanceclient.

> Even if we didn’t,
> users would still likely find ways to vendor requests and its
> dependencies for their own use and in doing so would have to modify
> requests to rewrite the import statements to point at those vendored
> dependencies.

Let me put this straight: you're vendorizing because you want to make
it easy for others to vendorize. This is circular / recursive thinking,
and isn't a valid argument.

Now, let's generalize anyway. How about we completely remove the concept
of Python modules, Unix distributions and packages, or even pip install
stuff, and just ask everyone to embed copies of everything? Wouldn't
that make things easier? Obviously, since you're a smart person, you
will agree it won't, and that it will increase complexity. Why then do
you think you're different from the general use case, and think you need
to be an exception? It doesn't make sense.

> The fact is that vendoring is a real solution and it’s deployed more
> often than you likely realize. It benefits our project and it
> benefits our users.

I can make bold statements too, but it doesn't help us understand each
other. For example:

Vendoring is a real problem, and it's deployed way more than I would
like. It poisons projects, adds bugs and security issues, and in the
end, our end user is the one who will be the loser.

Please reconsider your arguments. It is my view that the only problem
that embedding code copies solves is making *your* life simpler, and
probably that of *some minority* of project authors, but that's not
the big picture at all.

On 09/18/2014 07:58 PM, Donald Stufft wrote:
> Distributions are not the only place that people get their software
> from, unless you think that the ~3 million downloads requests has
> received on PyPI in the last 30 days are distributions downloading
> requests to package in their OSs.

Out of which how many are test runs for OpenStack? I'd be very careful
with this kind of statistic.

On 09/18/2014 09:10 PM, Donald Stufft wrote:
> If distributions are going to modify one upstream project they
> should expect to need to modify things that depend on that project in
> ways that are sensitive to what they've modified.

We do expect things to break, yes. This doesn't mean we are happy to
deal with breakage. Of course, we prefer things that are simple, with
less risk of breakage.

> The only real sane thing IMO is for openstack to consider requests as
> it is on PyPI. If openstack wants to make it easier for downstream to
> de-vendor urllib3 from requests then when openstack wants to
> import from requests.packages.* it can instead do:
>
>     try:
>         from requests.packages import urllib3
>     except ImportError:
>         import urllib3

This is a *very bad* idea. Why? Because the system version of urllib3
and the version embedded in requests may be different. This means that
we may see a bug only in some cases. The way you did it, the
distribution will be the loser, as the unit tests in the OpenStack gate
will run against the "wrong" one.
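
One way to at least make that mismatch visible is to record which copy
actually got imported. This is a rough sketch only: import_first is a
made-up helper, and relying on a __version__ attribute is an assumption
(not every module defines one).

```python
import importlib


def import_first(candidates):
    """Import the first importable name; return (name, reported version).

    `import_first` is a hypothetical helper. A module without a
    `__version__` attribute is reported as "unknown".
    """
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # this candidate isn't available, try the next one
        return name, getattr(module, "__version__", "unknown")
    return None, None
```

Logging import_first(["requests.packages.urllib3", "urllib3"]) in both
the gate and a distribution build would show whether the two are
exercising the same code at all.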

Do you want a concrete example of breakage? Here's one:
https://github.com/cdent/python3-wsgi-intercept/issues/24

So by all means, if we have to do something of that sort, it should
rather be to never use requests.packages.* at all in our OpenStack
code.
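
Such a rule can be checked mechanically. Below is a minimal sketch
using Python's standard ast module; the function name
find_vendored_imports is an assumption, and this is not an existing
OpenStack hacking check. It only sees static import statements, not
imports done through importlib or __import__.

```python
import ast


def find_vendored_imports(source, prefix="requests.packages"):
    """Return sorted (line, dotted name) pairs for imports under `prefix`.

    Hypothetical helper: it parses `source` and flags both
    `import a.b` and `from a import b` forms.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Join "from X import Y" into "X.Y" so both forms compare alike.
            base = node.module or ""
            names = [base + "." + alias.name for alias in node.names]
        else:
            continue
        for name in names:
            if name == prefix or name.startswith(prefix + "."):
                hits.append((node.lineno, name))
    return sorted(hits)
```

Run over each source file of a project, an empty result means no static
import reaches into requests.packages.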

By the way, what makes you think we should prefer the requests version
of urllib3 rather than the one upstream?

On 09/18/2014 10:30 PM, Donald Stufft wrote:
> This leaves Openstack with a few reasonable/sane options:
>
> 1) Decide that vendoring in requests is unacceptable to what
>    Openstack as a project is willing to support, and cease the use of
>    requests.
> 2) Decide that what requests offers is good enough that it outweighs
>    the fact that it vendors urllib3 and continue using it.

3) Convince upstream to restore sanity and stop embedding foreign libs.
This would be the best outcome ever for everyone!

4) As Clint said, we can always fork, even though it's preferable to
collaborate. Forks are nearly always a waste for everyone.

On 09/19/2014 01:33 AM, Ian Cordasco wrote:
> Given requests’ download count, I have to doubt that OpenStack users
> constitute the masses in this case.

How many of these are due to unit tests? 3 million downloads in one
month really looks like a lot compared to the Debian popcon stats:

https://qa.debian.org/popcon.php?package=requests

You may as well consider that there are 76 reverse dependencies of
python-requests in Debian (as per "apt-rdepends -r python-requests"),
which is also a big count of users who aren't using the embedded version
of urllib3. Or maybe you will make the point that you don't care about
these? Or that it's not your responsibility? I hope not. I hope that you
do care about every consumer of your code (yes, really all of them)!

On 09/18/2014 10:35 PM, Ian Cordasco wrote:
> Except that even OpenStack doesn’t pin requests because of how
> extraordinarily stable our API is. While you can argue that Kenneth
> has non-standard opinions about his library, Cory and I take backwards
> compatibility and stability very seriously. This means anyone can
> upgrade to a newer version of requests without worrying that it will
> be backwards incompatible.

That's very good, thanks for taking care of this. However, there's
nothing "extraordinary" here. Think about this fact: the Linux kernel
is *always* backward compatible. I don't think you'll argue that
requests is more complex than the kernel.

On 09/18/2014 10:20 PM, Ian Cordasco wrote:
>Thomas Goirand wrote:
>> The main issue is that urllib3 in requests, as other pointed out, is
>> not up-to-date, and will not be updated. In fact, that's the main
>> reason why the upstream authors of requests are vendorizing: it's
>> because they don't want to carry the burden of staying up-to-date.
> How involved are you with requests’ development process? You must not
> be very involved because this is the exact opposite reason of why we
> vendor. More often than not we pull in urllib3 to get unreleased
> features that we build upon for our newest release.

I'm only involved in the packaging of OpenStack, not upstream code. The
only thing I do in upstream code is fix issues like the one we're
talking about (embedding other libs which are already in the Debian
archive).

I don't really care if you update "often"; the issue is that your
version may just be "different", which potentially creates issues. It
simply should not. If urllib3 doesn't have the features you need, then
work with upstream to fix it. Otherwise, declare that you're forking
the library and rename it to something like urllib3-request-fork.
There's nothing bad about forking if upstream isn't collaborative
enough to accept your needs.

I hope I've been convincing,
Cheers,

Thomas Goirand (zigo)



