[openstack-dev] [all][python3] use of six.iteritems()

Robert Collins robertc at robertcollins.net
Thu Jun 11 05:16:54 UTC 2015


On 11 June 2015 at 15:48, Dolph Mathews <dolph.mathews at gmail.com> wrote:
> tl;dr .iteritems() is faster and more memory efficient than .items() in
> python2
>
>
> Using xrange() in python2 instead of range() because it's more memory
> efficient and consistent between python 2 and 3...
>
> # xrange() + .items()
> python -m timeit -n 20 for\ i\ in\
> dict(enumerate(xrange(1000000))).items():\ pass
> 20 loops, best of 3: 729 msec per loop
> peak memory usage: 203 megabytes

This test conflates setup and execution. Structure it like my earlier
example instead: otherwise you're not measuring iteritems vs items,
you're measuring dictionary creation time, and the same goes for the
memory figures. As it stands, the timings are meaningless.
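
To make the separation concrete, here is a sketch of what I mean (not the
exact commands from my earlier mail): use timeit's -s flag so the dict is
built once in setup and only the iteration itself is timed.

$ python -m timeit -s 'd = dict(enumerate(xrange(1000000)))' 'for i in d.items(): pass'
$ python -m timeit -s 'd = dict(enumerate(xrange(1000000)))' 'for i in d.iteritems(): pass'

Run that way, the two invocations differ only in the call being compared.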

To test memory pressure, don't use timeit. Just use the interpreter.
$ python
Python 2.7.8 (default, Oct 20 2014, 15:05:19)
[GCC 4.9.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> d = dict(enumerate(range(1000000)))
>>> import os
>>> os.getpid()
28345
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28345 robertc   20   0  127260 104568   4744 S   0.0  0.6   0:00.17 python

>>> i = d.items()

28345 robertc   20   0  206524 183560   4744 S   0.0  1.1   0:00.59 python

183560 - 104568 = 78992 KB, roughly 80 MB, to hold references to all 1
million items, which indeed is not as efficient as Python 3. So *IF* we
had a million-item dict, and absolutely nothing else around, we should
care.
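
For contrast, a lazy traversal of the same dict in that session never
materialises the intermediate list, so RES should stay roughly where it
started (a sketch; I haven't pasted fresh top output for it):

>>> for pair in d.iteritems():   # iterator on py2, no intermediate list built
...     pass
>>> v = d.viewitems()            # 2.7's view type, same semantics as py3's items()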

But again - where in OpenStack does this matter in the slightest?

No one has disputed that the two are different. What's out of line with
our reality is the assertion that the difference matters.
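
For anyone skimming the thread, the two spellings being argued over look
like this (a minimal sketch, assuming six is importable):

import six

d = dict(enumerate(range(10)))
for k, v in six.iteritems(d):   # lazy on both py2 and py3 (via six)
    pass
for k, v in d.items():          # builds a list on py2, dict view on py3
    pass

Both run fine on 2 and 3; the whole argument is about what the second
spelling costs on py2.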

The same exercise with a 10,000-item dict (top before and after d.items()):
28399 robertc   20   0   31404   8480   4612 S   0.0  0.1   0:00.01 python
28399 robertc   20   0   32172   9268   4612 S   0.0  0.1   0:00.01 python
9268 - 8480 = 788 KB, roughly 0.8 MB, which is indeed two orders of
magnitude less. And I'd STILL challenge anyone to find a place where
10,000 items are being passed around within OpenStack's components
without it being a bug today.

Optimising away under a megabyte of data, when we shouldn't have that
many rows/items/whatever in memory in the first place, entirely misses
the point of programming in Python.

-Rob

-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud


