[openstack-dev] Please do *NOT* use "vendorized" versions of anything (here: glanceclient using requests.packages.urllib3)

Ian Cordasco ian.cordasco at RACKSPACE.COM
Sun Sep 21 15:30:56 UTC 2014


Hi Thomas,

Several people, many of whom are core contributors to other projects, have
asked that this discussion not be continued in this venue. Discussion of
the decisions of the core developers of requests is not appropriate for
this list. All three of us have email addresses that you can retrieve from
anywhere you please. There’s a mailing list for requests, albeit very
lightly trafficked, and there’s Twitter. Further, I’m disappointed that
you felt it appropriate or necessary to resort to personal attacks on this
list. At the very least you could have contained those to Twitter like
others in this thread have done. I expected a more civil response on the
openstack-dev mailing list.

Cheers,
Ian

On 9/21/14, 7:21 AM, "Thomas Goirand" <zigo at debian.org> wrote:

>Hi Ian and Donald,
>
>I've read the full thread and couldn't help but reply to it, even
>though I previously thought I shouldn't, as what I care about is
>OpenStack, not really requests, and more broadly the wrong reasons
>why upstreams embed copies of foreign library code. I completely
>agree with someone else who wrote that this thread is nearly
>uninteresting for OpenStack itself. However, it is IMO my role, as a
>package maintainer, to let you know my view of your argumentation.
>
>If you ignore my argumentation, then at least I'll have tried! :)
>
>On 09/18/2014 03:42 AM, Ian Cordasco wrote:
>> Circling back to the issue of vendoring though: it’s a conscious
>> decision to do this, and in the last two years there have been 2 CVEs
>> reported for requests. There have been none for urllib3 and none for
>> chardet. (Frankly I don’t think either urllib3 or chardet have had any
>> CVEs reported against them, but let’s ignore that for now.) While
>> security is typically the chief concern with vendoring, none of the
>> libraries we use have had security issues rendering it a moot point in
>> my opinion. The benefits of vendoring for us as a team have been
>> numerous and we will likely continue to do it until it stops
>> benefiting us and our users.
>
>Could you please list the benefits *for end users*? I really do mean
>users, as in, not developers. Because I don't see any benefit at all for
>the end users. I don't think any of them would like to see many versions
>of the same thing on their system.
>
>Also, the issue is not only security. Let me give you an example. Simply
>do this on a Debian sid machine:
>
>apt-file search six.py | grep -v python3 | grep -v pyshared | wc -l
>
>We have in Debian about 50 versions of six.py around, embedded in
>packages. And this doesn't even count those where only bits of six are
>embedded in a file which isn't called six.py.
>
>Of course, we (in Debian) would like them to be removed. Why? Because
>it's useless complexity, with so many different versions, some with
>embedded bugs which have been fixed upstream, and the like. That's not
>even about security at this point (I can hardly see how six.py would
>have a security issue).
>
>There is also a waste of server resources (install time, size of
>packages in the Debian archive, increased download time, RAM footprint
>of everything, etc.). We don't need to install (and compile as .pyc at
>install time) 50 versions of six.py; a single one is enough. We're
>trying to address this as much as we can, and you'll see lots of
>packages where we did, but it's not always easy for various reasons, like
>upstream code not being up-to-date with the latest version, or lack of
>time from the Debian package maintainer.
>
>Also, consider the fact that six is small: a single, quite small file.
>It's still unacceptable from a Unix distribution standpoint, but this
>makes the "vendorizing" less of an absurdity. Now, urllib3 is WAY
>bigger. There are about 25 Python files. So multiply the resulting
>waste and issues...
>
>This was a simple example for six. Now just generalize to all. There
>are numerous upstream authors who also think that it's ok, and that
>they can be one of the few exceptions. But really, every upstream who
>does this thinks that he's "special". That's not the case. Requests
>isn't more special than any other Python module.
>
>On 09/18/2014 04:31 AM, Ian Cordasco wrote:
>> Isn’t the whole point of distributing a library to let people use it
>> as they see fit?
>
>The point of a library is that it is shared among multiple consumers.
>Oh, maybe not if you're using Windows, but that's maybe out of the scope
>of this debate. Maintaining a coherent distribution with a single
>version of every library is what distributions do as much as possible.
>It is unfortunately not always possible, but we do it as much as we can.
>
>On 09/18/2014 04:31 AM, Ian Cordasco wrote:
>> Project X pins a version of requests. Alice doesn’t know anything
>> about requests and does pip install X. Until Alice takes a more
>> active role in the development of Project X and looks into requests,
>> she will never know she’s installed software that has exposures in
>> it. In all likelihood, any person who just uses something that pins
>> requests will never check for it. If they just use pip and Project X
>> never updates, it’s not our responsibility for anything that happens
>> to the user.
>
>This is exactly why we should, at all costs, avoid pinning. It
>is very dangerous, and leads to all sorts of issues. We should make sure
>that we stay current with absolutely all libraries and, when possible,
>support both the old and the new version of any incompatible API.
>
>On 09/18/2014 04:31 AM, Ian Cordasco wrote:
>> I think more applications bundle it than you realize. You’re likely
>> using one daily that does it.
>
>It's not because stupidity is widespread that it becomes smartness.
>(nothing personal, just making a general statement...)
>
>On 09/18/2014 04:31 AM, Ian Cordasco wrote:
>> The reality is that by vendoring its dependencies, requests allows
>> its users more flexibility than other projects.
>
>How should I put it in simple words ... hum ... Oh, I know:
>
>*NO* !!! THAT'S WRONG !!!
>
>This adds complexity, because it is very annoying for anyone who wants
>sanity and de-vendorizes urllib3 from requests. And that's not counting
>the issues arising from urllib3 being removed from the Debian/Ubuntu
>requests package. Also, because of what we do in distributions (eg:
>removing embedded copies), you get all sorts of errors and mistakes
>like this one:
>https://github.com/cdent/python3-wsgi-intercept/issues/24
>
>This can happen again with glanceclient.
>
>> Even if we didn’t,
>> users would still likely find ways to vendor requests and its
>> dependencies for their own use and in doing so would have to modify
>> requests to rewrite the import statements to point at those vendored
>> dependencies.
>
>Let me put this straight: you're vendorizing because you want to make it
>easy for others to vendorize. This is circular / recursive thinking, and
>isn't a valid argument.
>
>Now, let's generalize anyway. How about we completely remove the concept
>of Python modules, Unix distributions and packages, or even pip install
>stuff, and just ask everyone to embed copies of everything? Wouldn't it
>make things easier? Obviously, since you're a smart person, you will
>agree that it won't, and that it will increase complexity. Why then do
>you think you're different from the general use case, and think you need
>to be an exception? It doesn't make sense.
>
>> The fact is that vendoring is a real solution and it’s deployed more
>> often than you likely realize. It benefits our project and it
>> benefits our users.
>
>I can make bold statements too, but it doesn't help understanding each
>other. For example:
>
>Vendoring is a real problem, and it's deployed way more than I would
>like. It poisons projects, adds bugs and security issues, and in the
>end, our end user is the one who will be the loser.
>
>Please reconsider your argumentation. It is my view that the only
>problem that embedding code copies solves is making *your* life simpler,
>and probably that of *a minority* of project authors, but that's not
>the big picture at all.
>
>On 09/18/2014 07:58 PM, Donald Stufft wrote:
>> Distributions are not the only place that people get their software
>> from, unless you think that the ~3 million downloads requests has
>> received on PyPI in the last 30 days are distributions downloading
>> requests to package in their OSs.
>
>Out of which how many are test runs for OpenStack? I'd be very careful
>with this kind of statistic.
>
>On 09/18/2014 09:10 PM, Donald Stufft wrote:
>> If distributions are going to modify one upstream project they
>> should expect to need to modify things that depend on that project in
>> ways that are sensitive to what they've modified.
>
>We do expect things to break, yes. This doesn't mean we are happy to deal
>with breakage. Of course, we prefer things that are simple, with less
>risk of breakage.
>
>> The only real sane thing IMO is for openstack to consider requests as
>> it is on PyPI. If openstack wants to make it easier for downstream to
>> de-vendor urllib3 from requests then when openstack wants to
>> import from requests.packages.* it can instead do:
>>
>>     try:
>>         from requests.packages import urllib3
>>     except ImportError:
>>         import urllib3
>
>This is a *very bad* idea. Why? Because the system version and the
>version embedded in requests may be different. This means that we may
>see a bug only in some cases. Done your way, the distribution will be
>the loser, as the unit tests in the OpenStack gate will run against the
>"wrong" one.
>
>Do you want a concrete example of breakage? Here's one:
>https://github.com/cdent/python3-wsgi-intercept/issues/24
>
>So by all means, if we have to do something of that sort, it would
>rather be to try to never use requests.packages.* at all in our
>OpenStack code.
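
The version-mismatch concern above can be made concrete with a minimal
sketch. It assumes both copies of urllib3 expose a __version__ attribute;
the helper names load_optional and version_mismatch are hypothetical and
appear nowhere in the thread.

```python
import importlib
from types import ModuleType
from typing import Optional


def load_optional(name: str) -> Optional[ModuleType]:
    """Return the named module, or None if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def version_mismatch(bundled: Optional[ModuleType],
                     system: Optional[ModuleType]) -> bool:
    """True only when two distinct copies exist and report different versions."""
    if bundled is None or system is None:
        return False  # only one copy (or none) is importable
    if bundled is system:
        return False  # de-vendored: both names point at the same module
    # Assumption: each copy exposes __version__; a missing one reads as None.
    return (getattr(bundled, "__version__", None)
            != getattr(system, "__version__", None))


if __name__ == "__main__":
    bundled = load_optional("requests.packages.urllib3")
    system = load_optional("urllib3")
    print("version mismatch:", version_mismatch(bundled, system))
```

A True result means code tested against one copy may behave differently
against the other, which is the situation the try/except fallback hides.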
>
>By the way, what makes you think we should prefer the requests version
>of urllib3 rather than the one upstream?
>
>On 09/18/2014 10:30 PM, Donald Stufft wrote:
>> This leaves Openstack with a few reasonable/sane options:
>>
>> 1) Decide that vendoring in requests is unacceptable to what
>>    Openstack as a project is willing to support, and cease the use of
>>    requests.
>> 2) Decide that what requests offers is good enough that it outweighs
>>    the fact that it vendors urllib3 and continue using it.
>
>3) Convince upstream to restore sanity and stop embedding foreign libs.
>This would be the best outcome ever for everyone!
>
>4) As Clint said, we can always fork, even though it's preferable to
>collaborate. Forks are nearly always a waste for everyone.
>
>On 09/19/2014 01:33 AM, Ian Cordasco wrote:
>> Given requests’ download count, I have to doubt that OpenStack users
>> constitute the masses in this case.
>
>How many of these are due to unit tests? 3 million downloads in one
>month really looks like a lot compared to Debian popcon stats:
>
>https://qa.debian.org/popcon.php?package=requests
>
>You may as well consider that there are 76 reverse dependencies of
>python-requests in Debian (as per "apt-rdepends -r python-requests"),
>which is also a big count of users who aren't using the embedded version
>of urllib3. Or maybe you will make the point that you don't care about
>these? Or that it's not your responsibility? I hope not, I hope that you
>do care about every consumer of your code, (yes, really all of them)!
>
>On 09/18/2014 10:35 PM, Ian Cordasco wrote:
>> Except that even OpenStack doesn’t pin requests because of how
>> extraordinarily stable our API is. While you can argue that Kenneth
>> has non-standard opinions about his library, Cory and I take backwards
>> compatibility and stability very seriously. This means anyone can
>> upgrade to a newer version of requests without worrying that it will
>> be backwards incompatible.
>
>That's very good, thanks for taking care of this. However, there's
>nothing "extraordinary" here. Think about this fact: the Linux kernel
>is *always* backward compatible. I don't think you'll argue that
>requests is more complex than the kernel.
>
>On 09/18/2014 10:20 PM, Ian Cordasco wrote:
>>Thomas Goirand wrote:
>>> The main issue is that urllib3 in requests, as others pointed out,
>>> is not up-to-date, and will not be updated. In fact, that's the main
>>> reason why the upstream authors of requests are vendorizing: it's
>>> because they don't want to carry the burden of staying up-to-date.
>> How involved are you with requests’ development process? You must not
>> be very involved because this is the exact opposite of why we
>> vendor. More often than not we pull in urllib3 to get unreleased
>> features that we build upon for our newest release.
>
>I'm only involved in the packaging of OpenStack, not upstream code. The
>only thing I do in upstream code, is fixing issues like the one we're
>talking about (embedding other libs which are already in the Debian
>archive).
>
>I don't really care if you update "often"; the issue is that your
>version may just be "different", which potentially creates issues. It
>simply should not. If urllib3 doesn't have the features you need, then
>work with upstream to fix it. Otherwise declare that you're forking the
>library and rename it as something like urllib3-request-fork or
>something. There's nothing bad with forking if upstream isn't
>collaborative enough to accept your needs.
>
>I hope I've been convincing,
>Cheers,
>
>Thomas Goirand (zigo)
>
>
>_______________________________________________
>OpenStack-dev mailing list
>OpenStack-dev at lists.openstack.org
>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


